This article originally appeared on the BeyeNETWORK.
There is an apocryphal story about metadata and archive processing that illustrates the unique relationship between the two environments. It seems that there was once a government store of space exploration information. As rockets entered outer space, metrics (i.e., telemetry) were sent back to the ground and recorded. This telemetry yielded a wealth of scientific information about space exploration and was of great value to the scientific community. Indeed, the government built an impressive archive of the telemetric data that had been captured. The archive used state-of-the-art archival technology. The archival storage was protected. The telemetry data was classified. The metrics were protected and stored in excruciating detail.
Then, one day, someone with clearance decided to look into this archive. After searching fruitlessly through archive after archive, that person discovered that an important element of archiving had been omitted: there was no metadata describing the different telemetry measurements that had been so faithfully recorded and stored. Since no one knew what any of the metrics meant, it was said that the government had created the largest and most protected collection of random numbers on earth.
It is simply true: archival data without metadata is, for the most part, meaningless. To make your archival data truly useful, you need to know what that data means.
Today, metadata is collected and stored for almost every technology and software package in existence. But in almost every case, the metadata is stored separately from the data it describes; that is, the metadata is physically separated from the data. While this is a normal and accepted architectural decision for nonarchival technology, it is unthinkable in an archival environment. Put plainly, in the archival environment, it is mandatory that the metadata be physically stored with the data it describes.
There is a good reason for tightly marrying the metadata to the data it describes: archival data should be thought of as a time traveler. No one knows when the archival data will be used or for what purposes. No one knows who will need to use it or on what technology. The odds of two or more physically separate units of data traveling together over time are almost nil. Data sets are lost. Someone doesn't understand that data sets need to be kept together. One data set becomes corrupted. There are a hundred reasons why two physical units of data traveling apart stand very little chance of arriving together, intact, in an unknown future. Therefore, when it comes to archival data, metadata needs to be physically attached to the content it describes.
The metadata that needs to be attached to archival stores of data includes both traditional forms of metadata and nontraditional forms. Some of the types of metadata that need to be attached to data content include:
- table descriptions,
- attribute descriptions,
- physical characteristics,
- immediate source of data,
- the date the data was archived,
- the number of records and the number of bytes,
- indexes for the data,
- who did the archiving, and so forth.
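To make the idea of a self-contained unit more concrete, here is a minimal Python sketch. It assumes a hypothetical format in which a single JSON document carries both a metadata header (covering the kinds of metadata listed above) and the data itself as CSV text. The function names `build_unit` and `read_unit`, the field names, and the telemetry rows are illustrative inventions, not a standard archival format. On reading, the unit checks its own declared byte and record counts, so a damaged payload is detected rather than silently trusted.

```python
import csv
import io
import json
from datetime import date


def build_unit(rows, table_description, attribute_descriptions,
               source, archived_by, index_key, archived_on=None):
    """Bundle data rows and their metadata into ONE self-describing JSON text.

    Hypothetical format (an assumption, not a standard):
    {"metadata": {...}, "data": "<csv text>"}
    """
    fieldnames = list(attribute_descriptions)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    payload = buf.getvalue()

    # Simple index: key value -> zero-based positions of matching rows.
    index = {}
    for pos, row in enumerate(rows):
        index.setdefault(str(row[index_key]), []).append(pos)

    metadata = {
        "table_description": table_description,
        "attribute_descriptions": attribute_descriptions,
        "physical_characteristics": {"format": "csv", "encoding": "utf-8",
                                     "delimiter": ","},
        "immediate_source": source,
        "archived_on": (archived_on or date.today()).isoformat(),
        "record_count": len(rows),
        "byte_count": len(payload.encode("utf-8")),
        "indexes": {index_key: index},
        "archived_by": archived_by,
    }
    # Metadata and data are serialized together as a single unit.
    return json.dumps({"metadata": metadata, "data": payload}, indent=2)


def read_unit(text):
    """Parse a unit and verify its self-declared counts before trusting it."""
    unit = json.loads(text)
    meta, payload = unit["metadata"], unit["data"]
    encoding = meta["physical_characteristics"]["encoding"]
    if len(payload.encode(encoding)) != meta["byte_count"]:
        raise ValueError("byte count mismatch: payload may be corrupted")
    rows = list(csv.DictReader(io.StringIO(payload)))
    if len(rows) != meta["record_count"]:
        raise ValueError("record count mismatch: payload may be corrupted")
    return meta, rows


# Usage with made-up telemetry readings (hypothetical values).
telemetry = [
    {"flight_id": "F1", "channel": "T07", "value": "212.4"},
    {"flight_id": "F1", "channel": "T08", "value": "0.93"},
    {"flight_id": "F2", "channel": "T07", "value": "198.0"},
]
text = build_unit(
    telemetry,
    table_description="Raw telemetry readings, one row per channel sample",
    attribute_descriptions={
        "flight_id": "Identifier of the flight that produced the reading",
        "channel": "Telemetry channel code",
        "value": "Measured value, in the units defined for the channel",
    },
    source="ground station capture",
    archived_by="archive operator",
    index_key="flight_id",
)
meta, rows = read_unit(text)
print(meta["record_count"])          # 3
print(meta["indexes"]["flight_id"])  # {'F1': [0, 1], 'F2': [2]}
```

In practice the returned text would be written out as one file, so the metadata cannot drift apart from the data it describes. A JSON envelope is used here only because it is human-readable and widely supported; a production archive might prefer a container format with stronger integrity checks such as checksums.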
In short, the designer building and managing the archival environment must build self-contained units in which everything a future user needs is present. Nothing can be left to chance if the archival data is to stand the test of time.