This article originally appeared on the BeyeNETWORK.
For years, the definition of metadata has been “data about data.” This definition is as widely accepted as any in our industry, but it has always been difficult to understand.
In college philosophy, when we studied definitions (admittedly, I made a gentleman’s C in the course), we found that a definition needed to have several characteristics. The word being defined could not be used in its own definition (for example, “a wild flower is a flower that grows in the wild” fails this test). A definition was required to separate what was being defined from everything else. And a definition told you how to identify whatever was being defined.
When this test is applied, the definition “metadata is data about data” seems awfully shallow and doesn’t tell you what you need to know.
Perhaps another way of describing or defining metadata is to think of metadata as the shadow of data. This metaphor seems to make more sense than the simplistic definition we have been using.
So what is appealing about this way of thinking about metadata?
First, a shadow cannot exist without some tangible object, and metadata cannot exist without some data. Second, wherever the object goes, the shadow follows. The same is true of metadata: wherever the data goes, so goes the metadata.
But the metaphor fails when it comes to mass. An object has mass; a shadow does not. Metadata, unlike a shadow, does have mass.
Even so, metadata typically has much less mass than the data it describes, so the analogy, while not quite perfect, is still roughly applicable.
There is another place where the analogy breaks down: metadata can be used to track and describe data well after the data is gone. In other words, we can use metadata to create a historical journal of data long after the data has disappeared. This is roughly the equivalent of the special-effects image of an object moving quickly through time, leaving a small footprint everywhere it has been. (This special effect is really hard to describe, but it is seen on television every day.)
A shadow also requires a sun or other source of light. You can’t have a shadow unless you have a source of light. The same is true of metadata: someone has to be interested in looking at the metadata before it becomes useful and apparent. (If a shadow falls in a forest and no one is there to see it, did the shadow really appear? If metadata exists in a technological environment and no one is there to use or look at it, is it really metadata?) If no one is there, does it matter?
These are some of the ways of thinking about metadata that are a little bit different, admittedly imperfect, but certainly more descriptive than “metadata is data about data.”
Bill Inmon is universally recognized as the father of the data warehouse. He has more than 36 years of database technology management experience and data warehouse design expertise. He has published more than 40 books and 1,000 articles on data warehousing and data management, and his books have been translated into nine languages. He is known globally for his data warehouse development seminars and has been a keynote speaker for many major computing associations.