This article originally appeared on the BeyeNETWORK.
There is an accepted belief in the market that data needs to be as current as possible in the data warehouse. The advocates of that belief certainly are entitled to their opinion and have been extremely vocal about it.
There is, however, another view, one that does not require all data in a data warehouse to be current. This view holds that a transaction executed in the operational environment should remain there for anywhere from 24 hours to a month. Only after that time should the transaction be moved to the data warehouse.
There is a very good reason for this “delay” or “hiccup” of time: it gives the transaction time to “settle.” To explain this settling of data, consider the following case.
A telephone company sends out a bill at the end of the month to its customers for charges accrued during the month. The bill arrives at the customer’s house. The customer examines the bill and notices a charge for a phone call to India. The customer does not know anyone in India. The customer calls the telephone company to dispute the charge. The telephone company investigates and finds that an operator-assisted call was made and that the calling party was recorded incorrectly. The telephone company then makes an adjustment to the customer’s bill.
If the customer’s data had been rushed to the data warehouse, then this adjustment would have to be made in the data warehouse as well. It can be handled, but either way it is messy. Either the original bill must be updated, at which point any report previously written from the data is incorrect, or an adjustment record must be written. If an adjustment record is written, then the warehouse holds more records, and queries against it require more calculations and more complicated processing.
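The adjustment-record option can be illustrated with a minimal Python sketch. The row layout, the bill identifier, and the amounts are all hypothetical and chosen only for illustration; nothing here is taken from an actual billing system. The sketch shows that once a bill has been loaded early, every query must net the original charge against any later adjustment rows, and that a report run before the adjustment no longer matches one run after it.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class WarehouseRow:
    """Hypothetical warehouse record; field names are illustrative only."""
    bill_id: str
    kind: str        # "charge" or "adjustment"
    amount: Decimal


def billed_total(rows, bill_id):
    """Net amount for one bill: the original charge plus any adjustment rows."""
    return sum((r.amount for r in rows if r.bill_id == bill_id), Decimal("0"))


# The bill is rushed to the warehouse before the customer has seen it.
rows = [WarehouseRow("B-1001", "charge", Decimal("84.50"))]
report_before = billed_total(rows, "B-1001")

# The disputed call is later reversed, so an adjustment row is appended.
rows.append(WarehouseRow("B-1001", "adjustment", Decimal("-32.00")))
report_after = billed_total(rows, "B-1001")

print(report_before, report_after)  # 84.50 52.50
```

The warehouse now carries two rows for one bill, and the earlier report disagrees with the later one even though no row was overwritten.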
It’s much better to wait for a month or so and let whatever customer adjustments occur be reflected in the operational system. Once the data in the operational environment has had a reasonable period of time to settle, it is moved to the data warehouse. As a result, data warehouse data is more stable and streamlined.
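One way to picture the settling period is as a cutoff applied when selecting transactions for the warehouse load. The following Python sketch assumes a 30-day window and a hypothetical transaction layout with a `posted_at` timestamp; both are assumptions made for illustration, and a real load process would pick a window suited to the business.

```python
from datetime import datetime, timedelta

# Assumed settling window; the article suggests anywhere from 24 hours to a month.
SETTLE_PERIOD = timedelta(days=30)


def split_by_settlement(transactions, now, settle_period=SETTLE_PERIOD):
    """Separate transactions old enough to load from those still settling."""
    cutoff = now - settle_period
    ready = [t for t in transactions if t["posted_at"] <= cutoff]
    waiting = [t for t in transactions if t["posted_at"] > cutoff]
    return ready, waiting


# Hypothetical operational data.
now = datetime(2024, 3, 31)
transactions = [
    {"id": "T-1", "posted_at": datetime(2024, 2, 10), "amount": 52.50},
    {"id": "T-2", "posted_at": datetime(2024, 3, 25), "amount": 19.99},
]

ready, waiting = split_by_settlement(transactions, now)
print([t["id"] for t in ready])    # ['T-1']
print([t["id"] for t in waiting])  # ['T-2']
```

Only `T-1` is loaded. `T-2` stays in the operational system, where any dispute can still be resolved before the record reaches the warehouse.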
By moving data to the data warehouse as quickly as possible, the organization forces corrections and adjustments to be done in the data warehouse. And forcing this issue results in a data warehouse that is less than optimal.
The most accepted school of thought should be that when transactions occur, they should not be rushed to the data warehouse but should be allowed to “settle” in the operational environment.
Bill Inmon is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.