This article originally appeared on the BeyeNETWORK.
Okay, put your books away. Take out a sheet of paper and a pencil. No looking at anyone else’s answers. You are on your own.
Read the following two definitions:
- A data warehouse is a subject-oriented, integrated, nonvolatile and time-variant collection of data gathered in support of management’s decisions.
- An active data warehouse has 24x7 availability, sub-second response time and update of data, giving the organization the ability to incorporate intelligence into transactions.
Now here’s the test. Is an active data warehouse really a data warehouse?
The first definition was taken from the first book ever written about data warehouses. It is well established and has remained unchanged for almost twenty years.
The second definition was taken from a vendor’s description of an active data warehouse. There is something terribly out of phase between these two statements. Both purport to describe a data warehouse, but the first declares that a data warehouse exists to support management’s decisions. That was (and still is) the original definition of a data warehouse.
Now look at the description of an active data warehouse. It calls for 24x7 availability and sub-second response time. It calls for update of data. Who needs these characteristics in a system? Certainly not management. It is the clerical community that needs them. So a data warehouse and an active data warehouse are designed to serve entirely different communities. They are not the same thing at all.
To better understand the differences, consider the kinds of decisions each of these architectural structures supports. What kinds of decisions does management make? Management makes strategic decisions such as:
- Should we expand into Northern California next year?
- Should we discontinue one of our product lines?
- Should we try a different packaging of our products?
These are simple examples of strategic decisions. Now, how important is response time in doing analysis in support of these decisions? How important is 24x7 availability? How important is it that management be able to update data?
The answer is that these capabilities are not important to management at all. If anything, they get in the way of good decision making. Management’s decisions about which direction to take the company are completely divorced from sub-second query performance and from 24x7 availability.
And what kind of decision is the clerk making? The clerk is making decisions about such things as:
- Is there enough money in the account to honor a check?
- Is there a seat available on flight 562 next Wednesday?
- How should I price this SKU given that the store closes in two hours?
The clerk makes an entirely different kind of decision than the manager does. Clerical transactions serve an entirely different purpose. An active data warehouse serves the clerical community, not the managerial community.
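To make the contrast concrete, here is a minimal sketch of the two kinds of workload using Python's built-in `sqlite3` module. The table names, column names, and figures are all hypothetical, invented for illustration; they do not come from any vendor's product or from the original definition. The point is only that the clerical question is a fast lookup and in-place update of current data, while the managerial question is a read-only aggregate over accumulated history.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical operational table: one current row per account,
    -- updated in place as transactions arrive.
    CREATE TABLE account (
        account_id    INTEGER PRIMARY KEY,
        balance_cents INTEGER NOT NULL
    );
    -- Hypothetical warehouse-style table: history that is appended to,
    -- never updated, and always stamped with a time period.
    CREATE TABLE sales_history (
        region  TEXT NOT NULL,
        month   TEXT NOT NULL,
        revenue INTEGER NOT NULL
    );
""")
conn.execute("INSERT INTO account VALUES (1001, 25000)")
conn.executemany(
    "INSERT INTO sales_history VALUES (?, ?, ?)",
    [
        ("Northern California", "2006-01", 100),
        ("Northern California", "2006-02", 120),
        ("Southern California", "2006-01", 300),
        ("Southern California", "2006-02", 310),
    ],
)


def can_honor_check(conn, account_id, amount_cents):
    """Clerical decision: look up one current row and update it if funds allow."""
    row = conn.execute(
        "SELECT balance_cents FROM account WHERE account_id = ?",
        (account_id,),
    ).fetchone()
    if row is None or row[0] < amount_cents:
        return False
    conn.execute(
        "UPDATE account SET balance_cents = balance_cents - ? WHERE account_id = ?",
        (amount_cents, account_id),
    )
    return True


def revenue_by_region(conn):
    """Managerial question: read-only aggregate over the accumulated history."""
    return conn.execute(
        "SELECT region, SUM(revenue) FROM sales_history "
        "GROUP BY region ORDER BY region"
    ).fetchall()
```

Calling `can_honor_check(conn, 1001, 10000)` touches a single row and changes it, which is where response time and availability matter. Calling `revenue_by_region(conn)` changes nothing and scans history, which is the kind of analysis a strategic decision draws on.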
So what is the problem here? The problem is that the definition of a data warehouse long predates the definition of an active data warehouse. There is simply no question about which came first. So what have the vendors of active data warehousing and real-time data warehousing done? Quite frankly, they have taken a popular concept and told people that they are buying a data warehouse when, in fact, what is being sold is not a data warehouse at all. They have used the term to sell their technology. Don’t get me wrong: the technology they are selling has its place. It is just not a data warehouse. It is something else.
And what has the effect been? A lot of confusion among the consumer community. If, in a fit of candor, the vendors had named their technology real-time integrated data or active integrated data, the world would not be as confused as it is today.
One needs to ask the question, “Do the vendors of real-time integrated data not know that they are not building a data warehouse?” That is a scary thought, because their own architects should have enough savvy to know better. An even scarier thought is that their architects do know better but have decided to cash in on a popular concept at the expense of a gullible consuming public.
Bill is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.