What issues -- both technical and cultural -- do we need to consider before building a data warehouse for our...
How long have you got?
General issues (not really technical or cultural, but very important) include, but are not limited to:
- What kind of analysis do the business users want to perform?
- Do you currently collect the data required to support that analysis?
- Where is the data?
- How clean is your data?
- Are there multiple sources for similar data?
- What structure is best for the core data warehouse (i.e., dimensional or relational)?
Technical issues include, but are not limited to:
- How much data are you going to ship around your network, and will it be able to cope?
- How much disk space will be needed?
- How fast does the disk storage need to be?
- Are you going to use solid-state drives (SSDs) to store “hot” data (i.e., frequently accessed information)?
- What database and data management technology expertise already exists within the company?
Cultural issues include, but are not limited to:
- How do data definitions differ between your operational systems? Different departments and business units often use their own definitions of terms like “customer,” “sale” and “order” within systems. So you’ll need to standardize the definitions and add prefixes such as “all sales,” “recent sales,” “commercial sales” and so on. But be warned: Business managers generally don’t like changing the established semantics.
- What’s the process for gathering business requirements? Some people will not want to spend time defining their data warehousing requirements for you. Instead, they will expect you to use your telepathic powers to divine their warehousing and data analysis needs.
If the last item sounds like a cynical description of other people’s stupidity that I’m using to indirectly highlight my own razor-sharp intellect, it really isn’t. In the past, I’ve done exactly the same thing when specifying jobs. For example, I might say to a builder, “Look, I just want a shed, OK? It can’t be too difficult to build a shed! You must have built dozens in the past.” So, the builder builds one, and I say, “How can you have put such a small door on a woodshed?!?!”
The experience of gathering user requirements has (hopefully) made me much more sympathetic in dealing with, say, builders. I now try hard to devote the time needed to specify exactly what I want. But I see this facet of human behavior as perfectly normal: People in general (including me) don’t want to be tied down, they just want something created that magically matches their requirements.
As a whole, the above list of issues has only just scratched the surface -- which, of course, gives the impression that building a data warehouse is very difficult and a pain. It is challenging, but it is a fabulous project to be involved in, because when data warehouses work properly, they are magnificently useful, huge fun and unbelievably rewarding. And the fact that you are asking about the cultural issues as well as the technical ones is very encouraging. It is my experience that the technical issues can almost always be overcome; the cultural ones are what often cause data warehousing projects to fail.
I’ll leave you with one more issue to consider (one of the more important, in my opinion): politics. By that, I mean when people lie about their actions and/or motivations.
For example, the head of finance might see a data warehouse as a threat. Without one, they control the flow of information from the finance department, and control is power. A data warehouse threatens their power base. But such a person is highly unlikely to stand up and say that in a board meeting. Instead, they might say, “This data warehouse is too expensive.” Worse, they might say, “Data warehouse? It’s a great idea; I fully support it!” -- and then spend the next six months quietly trying to torpedo it.
Beware the politics of a data warehousing project. After all, a data warehouse is a system designed to share information across the entire organization. If it does that in a satisfactory way, it will inevitably tread on some toes.
Dig Deeper on Data warehouse software
Related Q&A from Mark Whitehorn
Here's a guide to primary, super, foreign and candidate keys, what they're used for in relational database management systems and the differences ... Continue Reading
The unstructured data types common in big data systems are often better managed by a NoSQL database than relational software, Mark Whitehorn says. Continue Reading
Analytics expert Mark Whitehorn explains the strengths of R and how to determine if the open source programming language fits your analytics purposes. Continue Reading