This article originally appeared on the BeyeNETWORK.
Certain bad pennies just keep coming back. I don’t know how they do it or why they do it, but they just do. You look in your change as you put it on your desk when you go to bed at night, and the same old rotten penny just keeps coming back.
One of those rotten pennies is the big-bang approach to data warehouse development. The big-bang approach involves going out, gathering all the requirements, synthesizing those requirements, and then designing and programming – just like the structured analysis and programming boys told us to do in the mid-1960s.
That approach to development may have been just the ticket in the 1960s and the 1970s, for that matter. But back then, we were building operational systems. Back then, we had a shot at knowing who the users were and what their expectations were. But in the world of analytical systems, that approach does not stand a prayer. The problem with the big-bang approach to development – the SDLC (systems development life cycle) – is that it is associated with disastrous attempts to build data warehouses.
Stated differently, if you want to greatly raise the risk of failure for a data warehouse development, then go ahead, make my day – go for the big-bang approach. The big-bang approach raises the risk factors for a data warehouse failure by an order of magnitude.
So why does this bad penny keep coming back? There probably are a multitude of reasons, each of which is a contributing factor. Some of the reasons probably are:
Arthur Andersen’s Method 1. A long time ago, the consulting company Arthur Andersen developed a famed methodology – Method 1. In a way, this was the granddaddy of all methodologies. Certainly no other company built and promulgated a development methodology like Arthur Andersen. And Method 1 worked. At least it worked for operational systems, where requirements can be gathered before the system is built. But Arthur Andersen either didn’t know or didn’t care that its methodology had some inherent limitations. Method 1 does not work for analytical systems.
What your professors taught you in school. Most universities that have a computer science curriculum teach methodology. And the methodology that they teach is an offshoot or derivative of the old structured programming SDLC. So you merely take what you learned in school and apply it to the systems that you are building. What could be more natural than that? The answer is that it may seem natural, up to the point that you realize that it is a failure.
A hangover from the Y2K days. What seems like a long time ago now was not so long ago. Once there were computer consulting firms that had a massive number of bodies engaged in a Y2K conversion. When the year 2000 came and went, these consulting firms were looking for the next big project. It seemed natural to dump all of these bodies into data warehouse development. And the only way to justify all these bodies on a project was to put out a really big proposal. And what better vehicle for a large proposal than a big-bang data warehouse development approach? Consulting companies love the big-bang approach because it uses so many bodies, for which the consulting firms are well paid.
And there probably are many more reasons why the big-bang approach has been used, and has failed, over and over again.
Why does the big-bang approach fail? There are lots of reasons for the failure of the big-bang approach, but the primary reason for failure is that end users cannot tell you what the actual system requirements are before the system is built. End users of analytical systems are constantly changing their minds. End users of analytical systems operate in a mode of discovery. The end user of analytical systems has the attitude of, “Give me what I say I want, then – and only then – can I tell you what I really want.” End users of analytical systems need to know what the possibilities are before they can articulate the requirements.
Furthermore, once the data warehouse has been built, there is a whole new set of end users who find the data warehouse useful. These users were never consulted about their requirements when the data warehouse was being built.
The end user is not a stupid or uncooperative person. It is just that the end user of analytical systems operates in a fundamentally different mode than the user of operational systems. And the mode of thinking of the end user of analytical systems is just not a fit with the gathering of system requirements before the system is ever built.
There probably are other reasons why the big-bang approach to data warehouse development doesn’t work. But at the heart of those reasons is the inability of the development analyst to gather requirements in the manner prescribed by the SDLC.
So if the SDLC doesn’t work, what does work? The answer is that there is an entirely different approach to the development process. In this approach, sometimes called the iterative process and sometimes called the spiral development approach, data warehouses are built in short, fast increments. First, one small part of the data warehouse is designed, programmed and populated. Then, another small part of the data warehouse is designed, programmed and populated, and so forth. A small, fast piece at a time is the way to build a data warehouse.
Where does one find out about a spiral development approach? Unquestionably, the best source for such material is a book written by Larissa Moss and Sid Adelman, Data Warehouse Project Management. Another book you might enjoy by Moss and Adelman is Data Strategy.
These books belong on the desk of every data warehousing professional. It is unusual for a problem as pervasive and as profound as the use of the wrong development methodology to have an answer readily available in a book. Simply stated, the advice in the books by Adelman and Moss is cheap – especially when you compare it to the cost of a data warehouse failure.
Bill is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.