This article originally appeared on the BeyeNETWORK.
It is a tenet of data warehousing that historical data has business value. But no one ever stops and asks—“What business value and for how long?” So let’s examine the business worth of historical data and how long it holds business value.
At the almost trivial level, historical data has value for backup and recovery. If a business loses data or has it destroyed, a historical record can be quite useful. But this use of historical data serves the system level (the technology) more than the business level.
The most obvious case for historical data becoming useful to the business person is in complementing customer relationship management. People talk about getting to know the customer, and it is simply true that customers around the world are creatures of habit. Knowing your customers' historical habits is very powerful, because once you know a person's habits, you can predict much of his or her future from his or her past. But how much of a person's past do you have to know?
The sales and marketing department is particularly well positioned to use historical data once it has been organized around the customer.
Another perceived use of historical data is to develop trend lines. It is sometimes said that historical data lets you see the forest as well as the trees. Going back in time allows you to discover long-term trends. But how far back do trend lines remain useful? Finance and accounting departments are especially well positioned to use such long-term trend lines.
When historical data is stored at a granular level (as is the case for the data warehouse), the analyst has a lot of flexibility. Not only can the analyst go back in time, but the analyst can also use the granularity of the data to develop historical views that have never been explored. This perspective on data, and this use of it, can be valuable to many people: management, litigation support, marketing, sales, and so forth. But how far back do granular trend lines make sense?
How long should one keep historical data? Is historical data useful ad infinitum or is there a point where historical data loses its business value?
Consider historical data that is 100 years old. Who could use historical data that old? Certainly not sales, because the customers from 100 years ago are no longer alive. Nor the typical business person, because business has changed so fundamentally over a century that any conclusions would rest on a basis very different from today's.
About the only business user of 100-year-old data is the economist. With 100-year-old data, the economist can look at such things as inflation, prices, bubbles, crashes, prosperity and depression. The problem is that there are very few business economists. Only the very largest companies can afford the services of an economist; most economists work for the government. So the argument for the business value of 100-year-old data is weak at best.
If 100 years of history is too much, what about 20 years of history? Twenty years of history is within the memory and lifespan of most people, so it is not completely out of range. What business uses are there for 20 years of history?
The CRM contingent can argue that 20-year-old data may be useful, though twenty years is probably the upper limit of such usefulness. Management might also develop trends that go back that far. There is a case for 20-year-old data, but the case is sketchy at best.
Now let's turn to 10-year-old data. Certainly the CRM contingent can use 10-year-old data; ten years of data really starts to form a description of a customer. Trend lines spanning 10 years are not questionable at all. And litigation support going back 10 years would be welcomed by most corporate lawyers involved in a lawsuit.
So 10 years marks the threshold of usefulness.
Now what about five-year-old data? Five-year-old data suits everyone’s business needs. Five-year-old data is highly useful.
From five-year-old data to one-year-old data, usefulness only increases as the data becomes more recent.
Plotted against the age of the data, business value therefore forms a curve shaped like a Poisson distribution: high for recent data and tailing off toward zero as the data ages. That curve describes the worth of business data over time to the business person.
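The article gives no parameters for this curve, so what follows is only a minimal sketch under stated assumptions, not a method from the article. It assumes data age is bucketed into whole decades and that the Poisson rate λ (`lam`) is a hypothetical 1.0; the function names are likewise illustrative. Value is reported relative to the newest data. Under these assumptions the curve stays flat through the first decade, halves at 20 years and is effectively zero at 100 years, which roughly matches the shape argued above.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Probability mass of a Poisson(lam) distribution at non-negative integer k."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def relative_value(age_years: int, lam: float = 1.0, years_per_unit: int = 10) -> float:
    """Hypothetical business value of data of a given age, scaled so the newest data scores 1.0.

    Assumptions (not from the article): age is bucketed into whole units
    (decades by default) and lam = 1.0 is an illustrative rate.
    """
    k = age_years // years_per_unit  # age expressed in whole buckets
    return poisson_pmf(k, lam) / poisson_pmf(0, lam)


if __name__ == "__main__":
    # Print the relative value for the ages discussed in the article.
    for age in (1, 5, 10, 20, 100):
        print(f"{age:>3}-year-old data: relative value {relative_value(age):.3g}")
```

Bucketing by decade is a deliberate simplification: it hides the year-by-year rise in value described above between five-year-old and one-year-old data. A finer unit or a different `lam` would change the numbers, and neither choice comes from the article.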
Bill is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations. Bill can be reached at 303-681-6772.