This article originally appeared on the BeyeNETWORK.
For anyone who has been around the IT industry for any length of time, it is intuitively obvious that data goes through a life cycle as it passes through the corporation. From the time that data enters the corporation until the time that it is discarded, data goes through several stages of life. Data is initially entered and used for high-performance processing. Then, data is integrated and used for managerial decision making. When data grows old and its frequency of access diminishes, it is archived; finally, it is discarded.
This life cycle applies to almost every organization. And people everywhere acknowledge it, either explicitly or implicitly. However, when it comes to technology, something gets lost in translation. Vendors pay lip service to archival data, but do practically nothing about it.
One time-honored tradition is to place data on bulk storage, such as tape. Over the years, the archived data ages, the oxide cracks and the data becomes unusable. (In this form, the data appears as dust in a canister, which is eminently unreadable.) Or so much archival data accumulates that nothing can be found; organizations drown in the volume of it. In a hundred ways, the tradition of merely placing archival data on tape and then forgetting about it is tantamount to throwing the data away. For anyone with real archival needs, such an approach is simply not acceptable.
So what needs to be done with archival data in a modern and effective world?
Archival data needs to be tied up into small, easily accessible “time vaults.” The time vaults must be completely self-contained and include:
- all the data that is needed, both detailed data and summary data;
- all the metadata that describes the archival data;
- the origins of the data in the time vault;
- the formulas used in making calculations;
- the immediate source of the data;
- the count of the volume of the data; and so forth.
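One way to picture such a self-contained time vault is as a single structure in which the data and everything needed to interpret it travel together. The sketch below is purely illustrative; all field and class names are assumptions, not anything prescribed by the article.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class TimeVault:
    """A self-contained archival unit: data plus everything needed to read it."""
    detailed_data: list[dict]   # the detailed records being archived
    summary_data: dict          # precomputed summaries of the detail
    metadata: dict              # field names, types, units, encodings
    origins: str                # where the data in the vault originally came from
    formulas: dict              # formulas used for any calculated values
    immediate_source: str       # the system the data was extracted from
    created: date = field(default_factory=date.today)

    @property
    def record_count(self) -> int:
        # the "count of the volume of the data"
        return len(self.detailed_data)

vault = TimeVault(
    detailed_data=[{"customer": "A-101", "sales": 1200.0},
                   {"customer": "A-102", "sales": 800.0}],
    summary_data={"total_sales": 2000.0},
    metadata={"sales": "monthly sales in USD, float"},
    origins="order-entry system, monthly extracts",
    formulas={"total_sales": "sum of sales over all customers"},
    immediate_source="ODS nightly feed",
)
print(vault.record_count)  # → 2
```

The point of the structure is that nothing is external: a future reader who has the vault has the data, the metadata, the provenance and the calculation rules all at once.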
In addition, time vaults should be created in small, discrete units. Stuffing one or two huge tables into a time vault and sending it off to archival processing is almost always a mistake.
The idea behind the time vault is to make the researcher twenty or more years from now independent of any other data or metadata. The time vault will be opened in the future by unknown people, for unknown reasons, at an unknown time, in a technological environment that has yet to be invented. If any of these constraints is relaxed, the time vault loses some or all of its value.
It is important that each time vault exist as one physical unit of data, such as a single table. If the data in the time vault is spread out over multiple physical locations, it is a safe assumption that one or more of those physical volumes will be misplaced or corrupted over time. Therefore, it is very important that time vaults be created as one physical unit of storage.
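The single-physical-unit requirement can be sketched as writing the entire vault, detail, summary, metadata and provenance together, into one self-describing file. The JSON format and file name below are assumptions chosen only for illustration.

```python
import json
import os
import tempfile

# A time vault as one self-describing file: data, metadata and provenance
# travel together, so no piece can be separately misplaced or corrupted.
vault = {
    "metadata": {"sales": "monthly sales in USD"},       # describes the data
    "origins": "order-entry system, monthly extracts",   # provenance
    "formulas": {"total_sales": "sum of sales"},         # calculation rules
    "record_count": 2,
    "detail": [{"customer": "A-101", "sales": 1200.0},
               {"customer": "A-102", "sales": 800.0}],
    "summary": {"total_sales": 2000.0},
}

# Write everything as a single physical unit of storage.
path = os.path.join(tempfile.gettempdir(), "time_vault_example.json")
with open(path, "w") as f:
    json.dump(vault, f, indent=2)

# Decades later: this one file is all a reader needs to interpret the data.
with open(path) as f:
    restored = json.load(f)
print(restored["record_count"])  # → 2
```

Because the file is plain text and carries its own metadata, it does not depend on any external catalog, schema or system surviving alongside it.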
And, of course, as time vaults are created, they should be stored on a medium that can physically last a long time. It makes no sense to store archival data on a fragile, delicate storage medium.