This article originally appeared on the BeyeNETWORK.
Assembling a master data management program necessarily requires consolidating data into a data management framework that provides a consistent view of the uniquely identifiable objects used across the application architecture. However, because there are different approaches to managing the unique representation, maintaining the synchronization and consistency of master data may impose constraints on how master data consolidation and integration services are implemented.
For example, in a full master repository in which the data sources are combined into a single copy (such as the transaction hub style), all the data is synchronized for every application by definition, since there is only one copy. On the other hand, in a thin registry architecture, records holding bits and pieces of master data are sprinkled across a federated data environment, so there will be situations in which the local copies of master data associated with specific application silos are inconsistent. In fact, under the hood, the actual implementation of any master data management (MDM) architecture along the spectrum between registry and full repository may distribute and replicate master copies, in which case it is also subject to inconsistency at some point during operations.
To determine which architectural style is appropriate, the MDM architect must assess the enterprise applications’ requirements for master data synchronization. Some operational environments may be very tolerant of inconsistency, allowing for batch consolidation on a periodic (e.g., nightly) basis, while others may require a high degree of consistency and therefore immediate synchronization. Assessing the application environment means reviewing each application's synchronization requirements along these dimensions:
- Timeliness – ensuring the timely availability of master data, or specifying the enterprise-wide expectation for how quickly newly introduced data is fully integrated and available within the master environment;
- Latency – modulating the time it takes to deliver requested master data, as a way of monitoring application performance;
- Currency – ensuring “freshness” of master data;
- Consistency – the degree to which each application’s view is not different from any other application’s view;
- Coherence – maintaining synchronization of the views of master data managed within local copies;
- Determinism/Idempotence – asserting that issuing the same request for data results in the same answer each time.
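One way to make this assessment concrete is to record each application's needs along the six dimensions and then combine them. The following is a minimal sketch only: the `Level` scale, the `SyncRequirements` type, and the `strictest` rule (take the most demanding rating per dimension) are illustrative assumptions, not a prescribed method.

```python
from dataclasses import dataclass, fields
from enum import IntEnum


class Level(IntEnum):
    """Assumed coarse scale for how demanding an application is on a dimension."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class SyncRequirements:
    """One application's needs along the six synchronization dimensions."""
    timeliness: Level
    latency: Level       # HIGH here means the application needs fast delivery
    currency: Level
    consistency: Level
    coherence: Level
    determinism: Level


def strictest(profiles):
    """Combine profiles by taking the most demanding rating per dimension."""
    return SyncRequirements(**{
        f.name: max(getattr(p, f.name) for p in profiles)
        for f in fields(SyncRequirements)
    })


# Hypothetical applications: billing needs tight consistency, reporting does not.
billing = SyncRequirements(Level.LOW, Level.LOW, Level.MEDIUM,
                           Level.HIGH, Level.HIGH, Level.HIGH)
reporting = SyncRequirements(Level.MEDIUM, Level.LOW, Level.LOW,
                             Level.LOW, Level.LOW, Level.MEDIUM)
enterprise = strictest([billing, reporting])
```

Taking the maximum per dimension is deliberately conservative; an architect might instead weight applications by business criticality.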
We can assess the degree of synchronization associated with each of the master data management architectural styles. In the registry master data management architecture, a thin master index maintains identifying information along with pointers to the data sources in which records referring to that master data object reside. Newly introduced data may be registered within the index and is available at the operational level as soon as it is persisted into the owning application’s data resource; therefore, timeliness is high, as is data currency and, accordingly, latency is low. However, since each application may have distinct records for each master object, the degree of consistency is low, and the variance between local copies also means that coherence will be low. In addition, for accesses to a conformed master record that is materialized and consolidated on demand, each request may result in slightly different views, so we might rate its determinism as low.
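The registry behavior described above can be illustrated with a small sketch. Everything here is hypothetical: the `Registry` class, the source names, and the survivorship rule (later-registered sources override earlier ones) are assumptions chosen for brevity, not a description of any particular MDM product.

```python
class Registry:
    """Thin master index: master id -> pointers into application data sources."""

    def __init__(self, sources):
        self.sources = sources   # source name -> {local key: record dict}
        self.index = {}          # master id -> list of (source name, local key)

    def register(self, master_id, source_name, local_key):
        """Record that a source holds a record for this master object."""
        self.index.setdefault(master_id, []).append((source_name, local_key))

    def materialize(self, master_id):
        """Consolidate a master view on demand from the current source records.

        Assumed survivorship rule: later-registered sources win on conflicts.
        """
        view = {}
        for source_name, local_key in self.index.get(master_id, []):
            view.update(self.sources[source_name][local_key])
        return view


# Two application silos hold different phone numbers (low consistency).
crm = {"c1": {"name": "Ann Lee", "phone": "555-0100"}}
billing = {"b9": {"name": "Ann Lee", "phone": "555-0199"}}

registry = Registry({"crm": crm, "billing": billing})
registry.register("M1", "crm", "c1")
registry.register("M1", "billing", "b9")

first = registry.materialize("M1")
billing["b9"]["phone"] = "555-0142"    # a local update, visible immediately
second = registry.materialize("M1")    # same request, different answer
```

Because the view is rebuilt from live source records on each request, a local update is visible at once (high timeliness and currency), but two identical requests can return different answers (low determinism).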
For the full repository, there is only one master copy. Newly introduced records are committed directly to the master repository; therefore, timeliness and currency are high, as is consistency across applications. In the absence of local copies, there is little concern regarding coherence, and accesses to the same record should almost always return the same result, so determinism is high.
For the hybrid model, in which a registry is augmented to hold some collection of master attributes that may be copied back to application data environments, it is difficult to assess any of these dimensions without a greater understanding of the implementation. A federated model may essentially reflect a cache paradigm, in which applications make copies of what is in the master repository, and as modifications are made to the local copy, they are forwarded back to the master. Actually, the issues associated with the hybrid model’s synchronization (as well as federated implementations of the transaction hub) are interesting enough to warrant deeper exploration, which we will target in an upcoming article.
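To see why coherence depends on the implementation, consider a minimal sketch of the cache paradigm. The `MasterRepository` and `LocalCache` classes, the version counter, and the write-through forwarding are all assumptions made for illustration; a real deployment might forward changes asynchronously or push invalidations to the other copies.

```python
class MasterRepository:
    """Holds the master copy of each record along with a version counter."""

    def __init__(self):
        self.records = {}   # master id -> (version, attributes)

    def read(self, master_id):
        version, attrs = self.records[master_id]
        return version, dict(attrs)   # hand out a copy, not the master itself

    def write(self, master_id, attrs):
        version = self.records.get(master_id, (0, {}))[0] + 1
        self.records[master_id] = (version, dict(attrs))
        return version


class LocalCache:
    """An application's local copies; modifications are forwarded to the master."""

    def __init__(self, master):
        self.master = master
        self.copies = {}    # master id -> (version, attributes)

    def get(self, master_id):
        if master_id not in self.copies:
            self.copies[master_id] = self.master.read(master_id)
        return self.copies[master_id][1]

    def update(self, master_id, **changes):
        attrs = dict(self.get(master_id))
        attrs.update(changes)
        version = self.master.write(master_id, attrs)   # forward to master
        self.copies[master_id] = (version, attrs)

    def is_stale(self, master_id):
        return self.copies[master_id][0] < self.master.read(master_id)[0]

    def refresh(self, master_id):
        self.copies[master_id] = self.master.read(master_id)


master = MasterRepository()
master.write("M1", {"name": "Ann Lee", "phone": "555-0100"})
sales, support = LocalCache(master), LocalCache(master)
sales.get("M1")
support.get("M1")
sales.update("M1", phone="555-0142")     # support's copy is now out of date
stale_phone = support.get("M1")["phone"]
was_stale = support.is_stale("M1")
support.refresh("M1")
```

Until `support` refreshes, the two applications see different values for the same master object, which is exactly the coherence gap that the refresh policy has to bound.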
However, the real issue is not how the synchronization attributes vary across the MDM architecture styles, but determining the business applications’ horizontal requirements along the synchronization dimensions. This assessment looks not at the consistency of data within one processing stream, but at how applications interact with respect to the same set of master objects. Of course, this cannot be done in a vacuum and requires a generalized view of the entire business process architecture prior to making any kind of architecture implementation decision.
David is the President of Knowledge Integrity, Inc., a consulting and development company focusing on customized information management solutions including information quality solutions consulting, information quality training and business rules solutions. Loshin is the author of The Practitioner's Guide to Data Quality Improvement, Master Data Management, Enterprise Knowledge Management: The Data Quality Approach and Business Intelligence: The Savvy Manager's Guide. He is a frequent speaker on maximizing the value of information.