This article originally appeared on the BeyeNETWORK.
One of the complaints about the Corporate Information Factory (CIF) is that there is data found everywhere. There is data in the legacy environment. There is data in the data warehouse. There is data in the data mart. There is data in the archival environment. It simply is true that there is data everywhere.
And with the complaint of data being everywhere, there is the complaint that the data is all the same. The name Bill Inmon is found in the legacy environment. The name Bill Inmon is found in the data warehouse. The name Bill Inmon is found in the data mart. So with all this redundant data, aren’t we spending an exorbitant and unnecessary amount of money?
If data really were redundant across all the parts of the CIF, the complaint would be well founded. But the reality is that the data IS NOT the same across the different parts of the CIF. There is a fundamental transformation of data as it passes from one part of the CIF to another.
Let’s examine these rites of passage.
When data first enters the system, it enters by way of an application. The application has its own peculiarities and idiosyncrasies. As a rule, each application has its own transactions, its own definitions, its own code. The data is captured entirely in the context of the application.
Then the data passes into the data warehouse, and there is a very significant transformation that occurs. Data passes from application-oriented data to corporate data. Data is integrated as it enters the data warehouse. This integration can take many forms – reformatting of data, recalculation of data, summarization of data, alteration of the key structure, and so forth. While the data may or may not change, the context of the data changes dramatically.
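To make the idea of integration concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the two applications ("billing" and "shipping"), their field names, and the corporate record layout are invented for illustration and do not come from any particular system. The point is only that the same order arrives with different keys, units, and date formats, and leaves in one corporate format.

```python
from datetime import date, datetime


def from_billing_app(rec: dict) -> dict:
    """Hypothetical billing application: numeric customer number,
    amount in cents, date as MM/DD/YYYY."""
    return {
        # alteration of the key structure: one corporate key format
        "customer_key": f"CUST-{int(rec['cust_no']):06d}",
        # recalculation: cents to dollars
        "amount_usd": rec["amt_cents"] / 100,
        # reformatting: text date to a real date value
        "order_date": datetime.strptime(rec["dt"], "%m/%d/%Y").date(),
    }


def from_shipping_app(rec: dict) -> dict:
    """Hypothetical shipping application: customer id prefixed with 'c',
    amount as a dollar string, date in ISO format."""
    return {
        "customer_key": f"CUST-{int(rec['customer_id'].lstrip('c')):06d}",
        "amount_usd": float(rec["amount"]),
        "order_date": date.fromisoformat(rec["date"]),
    }


# The same order as two applications might record it (invented values).
billing_rec = {"cust_no": "42", "amt_cents": 1_000_000, "dt": "03/15/2024"}
shipping_rec = {"customer_id": "c42", "amount": "10000.00", "date": "2024-03-15"}

corporate_a = from_billing_app(billing_rec)
corporate_b = from_shipping_app(shipping_rec)
```

Under these assumptions, both application records map to the same corporate record, even though neither application stored the data in that form.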
Then data passes into the data mart environment and another fundamental change occurs. Here, the data is reshaped according to the needs and desires of the department for which the data mart is being built. Detailed data is aggregated, summarized, and otherwise prepared for departmental analytical processing.
In order for these transitions to make sense, let’s look at an example.
At the transaction level, an order is placed for lots of goods. At the order level, the amount of the order and the goods are stored, just as they were entered into the transaction. The total order amount is $10,000. Next, the data is placed in the data warehouse. Some of the items are to be delivered as soon as possible and other items are to be delivered in a few months. Because of accounting rules, the order is split in two – one part of the order is for goods to be delivered this month, and the other part of the order is for goods to be delivered three months from now. The $10,000 on the order is split into two parts – one for $6,000 and another for $4,000.
Now data is to be sent to the data mart. The engineers only want to see a certain kind of good that has been ordered – a widget. The original order for widgets was to be made for delivery this month. The month’s order for widgets is for $1,500.
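The arithmetic of this example can be sketched in a few lines of Python. Only the totals come from the example above ($10,000 for the order, $6,000 and $4,000 for the split, $1,500 for widgets this month). The individual line items other than widgets, the product names "gasket" and "bracket", and the function names are assumptions made up so that the totals add up.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class OrderLine:
    product: str
    amount: Decimal
    delivery: str  # "this_month" or "in_three_months"


# The order as entered. Only the widget line and the totals are taken
# from the example; the other lines are invented for illustration.
order = [
    OrderLine("widget", Decimal("1500"), "this_month"),
    OrderLine("gasket", Decimal("4500"), "this_month"),
    OrderLine("bracket", Decimal("4000"), "in_three_months"),
]


def application_view(lines):
    """Application context: one order, one total."""
    return sum((line.amount for line in lines), Decimal("0"))


def warehouse_view(lines):
    """Corporate context: the order split by delivery period."""
    totals = {}
    for line in lines:
        totals[line.delivery] = totals.get(line.delivery, Decimal("0")) + line.amount
    return totals


def engineering_mart_view(lines, product="widget", delivery="this_month"):
    """Departmental context: one kind of good, one delivery period."""
    return sum(
        (l.amount for l in lines if l.product == product and l.delivery == delivery),
        Decimal("0"),
    )


app_total = application_view(order)
warehouse_totals = warehouse_view(order)
widget_total = engineering_mart_view(order)
```

With these invented line items, the three views yield 10000, a 6000/4000 split, and 1500 respectively: three different numbers derived from the same order, each correct in its own context.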
While there is certainly integrity of data across the architecture, it should be clear from this example that there is different data in different places, each with a completely different context.
People who complain that there should be only one occurrence of data have a hard time accounting for the different contexts of information.