This article originally appeared on the BeyeNETWORK.
Everyone believes that data warehouses are expensive. We attend conferences and swap stories about multi-terabyte data warehouses and the conclusion that we reach is that data warehouses are expensive. But does it have to be that way? Stated differently, what is the most expensive, consuming part of the data warehouse? What is the largest determining factor for the expense of the data warehouse? You may be surprised at the answer. The answer is that the largest determining factor for the data warehouse is—you!
So how does the designer/architect shape the cost of the data warehouse? The designer/architect has several ways to impact the amount of money spent on the data warehouse.
Consider the choice of methodology. There is one way to dramatically increase the organization’s cost, and that is to choose the wrong development methodology. There are essentially two methodologies for constructing the data warehouse. There is the Ed Yourdon/Tom DeMarco waterfall SDLC methodology and there is the spiral methodology. The classic waterfall methodology—which most people have been taught—is just fine for operational systems where the requirements of the system are known before the system is developed. However, in an environment where the end user operates in a mode of discovery—where requirements ARE NOT KNOWN before development begins—using the classical SDLC is disastrous and costly.
Then, there is the issue of integration and transformation. To build the data warehouse, you need to take raw application data and turn it into corporate data. If your data warehouse is large you will definitely need an ETL tool for this task. The ETL tool will generate the code automatically. Choosing to build this interface manually where there are more than a few programs involved is asking for budgetary woes. Choosing to build a manual ETL interface can be compared to walking into a boxing ring and sticking out your chin.
Another way to spend a lot of money carelessly is to build a data warehouse and then not monitor it. It is amazing how many companies have built large data warehouses and have no idea what takes place in them. They don’t know what data is being used. They don’t know how data is being used. They also don’t know who is using the data. Surprisingly they don’t know what kind of SQL is being generated from their front-end tools. They sit and complain about their ever-expanding hardware bill. They are driving blindfolded and complaining that they are hitting walls. They need to take the blindfold off in order to see what is in front of them.
But perhaps the largest waste of all is in the type of media on which data warehouses are placed. Approximately 99.99 percent of all data warehouses are placed on disk storage. People just assume that the proper storage medium for data warehouses is disk storage. For small- to medium-sized data warehouses, disk storage works just fine. But for large data warehouses, placing all data on disk storage is a really bad idea. This is because, for a large company, the vast majority of data is used infrequently. It is also a bad choice, because a large amount of disk storage is expensive. So large data warehouses placed on disk storage can waste a colossal amount of your budget.
Try this quiz to measure your efficiency. Start yourself with zero points.
- Did I use a spiral development methodology? If yes, add 15 points.
- Did I use an ETL tool? If yes, add 10 points.
- Did I use a data warehouse monitoring tool? If yes, add 10 points.
- Did I place older, less frequently used parts of my data warehouse on non-disk based storage? If yes, add 25 points.
60 points—Your data warehouse environment is well run and cost efficient.
50 points—You are doing OK. A little improvement is necessary.
40 points—You could do better. You are costing your company money.
30 points—Touch up your resume. You are a liability to your company.
20 points—Take some courses in basic data warehouse design and management.
10 points—Don’t let management find out about you.
0 points—Have you considered another profession?
Bill Inmon is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.