This article originally appeared on the BeyeNETWORK.
The other day, I was visiting with clients and I ran across one who had a real problem. The client – a consulting company – had contracted to develop a data warehouse for a consumer at a fixed price. And it was the first data warehouse that the consulting company had ever developed.
The money had run out, as had the time allotted to develop the data warehouse, and the data warehouse was nowhere near complete. And the consumer was demanding that the consulting company finish the contract.
The developer/consulting company was between the proverbial rock and a hard place. To finish the data warehouse would require the consulting firm to spend a great deal of its own money. The work would ultimately be done at a loss. To abandon the consumer would be a breach of contract, and the consumer would never give a good reference to the developer/consulting company.
So the consulting company was caught between two very nasty choices.
There is a point here. NO ONE – REPEAT NO ONE – SHOULD BUILD THEIR FIRST DATA WAREHOUSE ON A FIXED-PRICE BASIS. If there ever were a case begging for problems, it is this circumstance. It is almost axiomatic that there will be unexpected problems in the building of a data warehouse, especially the first data warehouse that an organization builds.
So why is it so normal to have unexpected problems with the first data warehouse build?
The problem is that there are many, many unknowns in building a data warehouse, and each unknown can shake the buggy and send the wheels flying. What are some of these problems that can derail even the best laid plans for data warehouse development?
Historical data. The client never knows how much historical data they want. In many cases, the amount of historical data that will be needed is not decided until after the development process is complete. The problem is that the amount of historical data that is needed is one of the most important parameters for the developer. It is like building a car engine then deciding how much horsepower there will be after the engine is built.
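To see why the amount of history matters so much to the developer, a rough sizing sketch can help. The function names and every figure below (rows per day, bytes per row, years retained) are hypothetical assumptions chosen for illustration only, not numbers from any real project.

```python
def estimate_fact_rows(rows_per_day: int, years_of_history: float) -> int:
    """Rough row count for a table that keeps the given span of history."""
    return int(rows_per_day * 365 * years_of_history)


def estimate_storage_gb(rows: int, bytes_per_row: int) -> float:
    """Raw storage in gigabytes, ignoring indexes and compression."""
    return rows * bytes_per_row / 1e9


# Hypothetical workload: 500,000 new rows per day at 200 bytes per row.
for years in (1, 3, 7):
    rows = estimate_fact_rows(rows_per_day=500_000, years_of_history=years)
    size_gb = estimate_storage_gb(rows, bytes_per_row=200)
    print(f"{years} year(s): {rows:,} rows, about {size_gb:.1f} GB")
```

Under these assumptions, moving from one year of history to seven multiplies the volume sevenfold, which is the kind of change that is painful to absorb after the design is finished.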
The development methodology. A data warehouse is built under what is called a spiral development methodology. The problem is that most developers have only read about a spiral development methodology. They really and truly have no idea what to expect. This is roughly akin to reading a book about golf then going out to play a round of golf. Until you have a white ball on the ground and a club in your hand, you have no real idea what problems there are in hitting a little white ball into a cup the size of your fist. All the books in the world do not describe the challenge. The only real way to understand the challenge is to take a club in hand and head for the course. The same is true of the spiral development methodology. You really don’t understand what it is about until you are in the middle of the execution of an iteration of development.
Requirements. Far and away, the biggest problem consulting organizations have in building a data warehouse is establishing end user requirements. The problem with requirements for data warehouse projects is that they are NEVER complete. This is because end users of a data warehouse operate in a mode of discovery. They cannot tell you what they want until such time as they understand what the possibilities are. Only after they understand what the possibilities are can they tell you what the requirements are. Doing JAD sessions with end users of data warehouses is often an exercise in futility because of this fact.
So what happens is that the neophyte developer builds a data warehouse to the client’s specs. Upon delivering the data warehouse, the developer is mortified to find out that the requirements have changed. The data warehouse is redeveloped, and history repeats itself. Upon the completion of the redevelopment effort, the developer is mortified (once again) to find out that the requirements have changed.
The problem is that the funds for the project have now run out. And the client is not happy with the results.
This scenario is NORMAL and is not an exaggeration. It is what is expected.
So how do developers build fixed-price data warehouses? A developer can confidently set a fixed price for a data warehouse when it will be the 50th data warehouse they have developed, or maybe the 100th. After having built enough data warehouses, developers know what to expect. But with the first data warehouse, the developer has no real idea what to expect.
So what are some approaches to addressing the problems of shifting requirements? There are several approaches:
- Design the data warehouse to answer exactly two questions and no more. Typical questions usually center around corporate data. Typical questions might be:
  - How many customers does the corporation have?
  - What have corporate revenues been for the past quarter?
- Design the data warehouse to answer a fixed and specific set of queries. The queries might look like:
  - What has been the largest transaction in the quarter?
  - How many late payments over all accounts have there been for the past month?
  - How many customers have multiple accounts?
- “Time box” the requirements for queries. When queries are time boxed, this means that as many requirements as possible can be identified in a specific period of time. Then no more requirements are allowed as system specifications until the first iteration of development is complete.
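The time-box approach can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a prescribed tool: the class `RequirementsTimeBox`, its method names, and the dates are all hypothetical. Requirements submitted before the cutoff enter the current iteration; anything later is held until the iteration is complete.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class RequirementsTimeBox:
    """Accepts requirements until `closes_on`; later ones wait for the next iteration."""
    closes_on: date
    accepted: list[str] = field(default_factory=list)
    deferred: list[str] = field(default_factory=list)
    iteration_complete: bool = False

    def submit(self, requirement: str, today: date) -> bool:
        """Return True if the requirement enters the current iteration's spec."""
        if today <= self.closes_on and not self.iteration_complete:
            self.accepted.append(requirement)
            return True
        self.deferred.append(requirement)  # held back, not lost
        return False

    def complete_iteration(self, next_closes_on: date) -> "RequirementsTimeBox":
        """Close this iteration; deferred requirements seed the next time box."""
        self.iteration_complete = True
        return RequirementsTimeBox(closes_on=next_closes_on,
                                   accepted=list(self.deferred))


# Hypothetical dates for illustration only.
box = RequirementsTimeBox(closes_on=date(2024, 3, 31))
box.submit("How many customers have multiple accounts?", today=date(2024, 3, 15))
box.submit("What has been the largest transaction in the quarter?", today=date(2024, 4, 10))
next_box = box.complete_iteration(next_closes_on=date(2024, 6, 30))
print(box.accepted)
print(next_box.accepted)
```

The design choice worth noting is that late requirements are deferred rather than rejected: the end user's discovery process continues, but it no longer changes the specification of an iteration that is already under way.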
Only after the never-ending nature of requirements has been addressed should the developer even consider building the data warehouse under a fixed-price contract.
Bill is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.