This article originally appeared on the BeyeNETWORK.
When designing the data warehouse, one of the most basic concepts is that of storing data at the lowest level of granularity. By storing data at the lowest level
And why does data need to be broken into low levels of granularity? The answer is that most data warehouse data comes from transactions. And typically, transactions contain data that is very denormalized. Denormalized data is at a high level of granularity.
Let’s take a look at a typical transaction. The typical transaction may have data such as:
- the date of the transaction,
- the item being purchased,
- the terms of the purchase,
- the person making the purchase,
- the location where the transaction was made,
- the price of the transaction, and
- the salesperson.
All of the data that has been brought to bear on the transaction is natural and normal. Naturally enough, the data in the transaction focuses on the transaction itself. At the same time, the data in the transaction is very denormalized.
In order to be stored usefully in a data warehouse, the data coming from the transaction is best broken apart. In breaking the data apart, a lower level of granularity is achieved.
As the transaction data is broken apart, it starts to fall into separate subject areas. Typical subject areas might be customer information, item information (SKU) and inventory, financial information, sales location information and the like.
It is because transactions are of necessity highly denormalized that breaking data into a finer granularity is the norm for the building of a data warehouse. But is there such a thing as breaking data down to too low a level of granularity? The answer is that yes, there is such a thing as breaking data down to too low a level of granularity.
Consider clickstream data. Clickstream data is that data that is generated and tracked as you enter a Web site and interact with it. Clickstream data tracks every move, every new page you go to, every time you move the cursor and so forth. There is almost an infinite level of detail in clickstream data. Clickstream data is highly granular.
Yet, the truth is that most clickstream data is at too low a level of granularity to be useful. It has been estimated that 90% or more of clickstream data is worthless. There is no business value whatsoever in tracking the fact that you moved your cursor because you spilled your coffee. So there is such a thing as data at too low of a level of granularity.
Therefore, data in the data warehouse should not be stored at the lowest level of granularity. Instead, data in the data warehouse should be stored at the lowest level of usefulness. The level of granularity must be taken down to the point where it is useful to the largest number of people and no further.
Bill Inmon is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.