This article originally appeared on the BeyeNETWORK.
Even in the early days of centralized mainframe systems, whose backbone was transaction processing, there was a need for metadata. In that centralized environment, people needed to determine where data came from, what it meant, how it was calculated and so forth.
Architecturally, today’s world is decidedly different. Processing is distributed across departmental servers, personal computers, mainframe application processors and analytical decision support servers.
This distributed architectural landscape brings a different need for metadata. In addition to the basic needs of yesterday’s centralized architecture, metadata must now serve as a “glue” or “fabric” that ties together the different parts of the distributed architecture.
Today’s distributed architecture does not operate on stagnant data. Data constantly flows throughout the corporation: from operational systems to ETL, from ETL to a data warehouse, from a data warehouse to a data mart and so forth. It flows through the entire distributed environment. In a sense, data is the blood that pumps through the distributed architecture, keeping it alive.
Metadata describes the data coming from any one distributed processing source as it goes into another distributed processing module. Through metadata, the receiving module knows what data has just arrived and what that data means.
Metadata, then, plays a profound role in today’s distributed processing environment: it defines the interface from one distributed processing module to another.
This simplistic view of the role of metadata, however, is fraught with problems. Some of the issues that arise when interfacing two distributed processing modules are:
- naming conventions,
- physical characteristic compatibility,
- attribute definitions,
- the heritage of the data,
- calculations made in the sourcing module,
- structure of data, and
- relationships of data.
These issues are at the heart of the successful usage of metadata as an interface between two distributed processing modules.
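To make the list above more concrete, the following Python sketch records, for each data element crossing an interface, one field per issue, and then compares what a sending module describes with what a receiving module expects. The names `ElementMetadata` and `check_interface`, the field names and the sample elements are hypothetical illustrations, not a product API or the article’s own method; a real metadata repository would be far richer.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ElementMetadata:
    """Hypothetical metadata for one data element crossing an interface.
    Each field loosely corresponds to one interfacing issue in the list above."""
    name: str                          # naming convention
    physical_type: str                 # physical characteristics, e.g. "DECIMAL(9,2)"
    definition: str                    # attribute definition
    heritage: str                      # where the data came from
    calculation: str = ""              # calculation made in the sourcing module
    structure: str = "scalar"          # structure of the data
    related_to: tuple[str, ...] = ()   # relationships to other elements


def check_interface(sent: dict[str, ElementMetadata],
                    expected: dict[str, ElementMetadata]) -> list[str]:
    """Compare the sender's metadata with the receiver's expectations and
    return one human-readable message per mismatch."""
    problems = []
    for key, exp in expected.items():
        got = sent.get(key)
        if got is None:
            # Often a naming-convention problem: the element exists under another name.
            problems.append(f"{key}: not supplied by sending module")
            continue
        # Compare every metadata field except the name itself.
        for f in fields(ElementMetadata):
            if f.name == "name":
                continue
            a, b = getattr(got, f.name), getattr(exp, f.name)
            if a != b:
                problems.append(f"{key}.{f.name}: sender={a!r} receiver={b!r}")
    return problems


if __name__ == "__main__":
    etl_out = {
        "cust_rev": ElementMetadata("cust_rev", "DECIMAL(9,2)", "monthly revenue per customer",
                                    "billing", "sum(invoice_amt)"),
        "cust_id": ElementMetadata("cust_id", "CHAR(10)", "customer identifier", "crm"),
    }
    dw_in = {
        "cust_rev": ElementMetadata("cust_rev", "DECIMAL(11,2)", "monthly revenue per customer",
                                    "billing", "sum(invoice_amt)"),
        "customer_id": ElementMetadata("customer_id", "CHAR(10)", "customer identifier", "crm"),
    }
    for message in check_interface(etl_out, dw_in):
        print(message)
```

Run as a script, this reports two problems: a physical-characteristic mismatch on `cust_rev`, and a `customer_id` that the receiver expects but the sender supplies only as `cust_id`, which is a naming-convention problem.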
Yet, there is another related but deeper problem.
Defining and managing metadata for a single interface is a fairly basic problem. But there really is no such thing as just one interface in a distributed architecture; in reality, there are many. This raises the next metadata issue: keeping data and metadata consistent across multiple interfaces. That consistency becomes the central architectural challenge.
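One simple way to picture the consistency problem is to gather the metadata declared at every interface and look for any data element that is described differently at different interfaces. The Python sketch below assumes a hypothetical representation in which each interface is a (sender, receiver) pair mapped to element definitions; `find_inconsistencies` and the module names are illustrative only. It compares only the definition text; a fuller check would compare every kind of metadata from the list of interfacing issues.

```python
from collections import defaultdict

# Hypothetical representation: an interface is a (sending module, receiving module) pair.
Interface = tuple[str, str]


def find_inconsistencies(
    interfaces: dict[Interface, dict[str, str]],
) -> dict[str, dict[str, list[Interface]]]:
    """Group the interfaces by the definition each one uses for an element and
    return only the elements that are defined in more than one way."""
    # element -> definition -> list of interfaces using that definition
    seen = defaultdict(lambda: defaultdict(list))
    for iface, elements in interfaces.items():
        for element, definition in elements.items():
            seen[element][definition].append(iface)
    return {element: dict(defs) for element, defs in seen.items() if len(defs) > 1}


if __name__ == "__main__":
    interfaces = {
        ("operational", "etl"): {"revenue": "gross invoice amount", "customer_id": "CRM key"},
        ("etl", "warehouse"): {"revenue": "gross invoice amount", "customer_id": "CRM key"},
        ("warehouse", "sales_mart"): {"revenue": "net of returns", "customer_id": "CRM key"},
    }
    for element, variants in find_inconsistencies(interfaces).items():
        print(element)
        for definition, ifaces in variants.items():
            print(f"  {definition!r}: {ifaces}")
```

Here `customer_id` is described the same way at every interface and is not reported, while `revenue` is reported because the warehouse-to-mart interface defines it differently from the upstream interfaces. Each interface may look correct in isolation; the inconsistency only appears when the interfaces are examined together.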