This article originally appeared on the BeyeNETWORK.
One of the promised benefits of a master data management (MDM) program is the ability to provide a “round-trip” integration process for all of your master data objects. Having a master repository as a “source of truth” seems very appealing. However, in conversations I have had with people implementing a master data management or customer data integration program, it appears that while reverse integration of master data is desired, its feasibility is perceived as challenging. The main reason? Standardization.
One of the integration constraints for assembling and consolidating master data from multiple sources is the existence of variant model representations for data elements and data values. For example, a telephone number field in one database may be specified as a 10-digit numeric value, in another database as a 16-character alphanumeric field, and in assorted other file-based data sets without any specific characterization of data type. In order to consolidate the values stored in these different data sets, we must be able to do a few tasks:
1. Assess the core data types and lengths of the various data fields. This can be done using a profiling tool and collecting the discovered metadata into a repository for evaluation.
2. Identify the common denominator canonical data type. To consolidate all the data values into one data attribute, there must be a common type representation with the capacity to store every data value. For example, to accommodate both numeric fields of size 10 and alphanumeric fields of size 16, the target data field has to be an alphanumeric of size 16. This canonical type becomes the model representation in the master repository for that data element.
3. Provide a transformation process for all source data values. The 10-digit numeric values must be converted into character strings before they can be stored in the target data attribute, and the same is true for every other source representation. In addition, any other representation constraints (e.g., blank-padding or leftmost zero extension) must be observed.
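Step 3 can be sketched in a few lines. This is a minimal illustration, assuming the 16-character alphanumeric canonical type chosen in step 2 and a blank-padding representation constraint; the function name and width are hypothetical:

```python
def to_canonical(value, width=16):
    """Transform a source telephone value into the canonical
    alphanumeric representation: stringify, then blank-pad on the
    right to the canonical width."""
    s = str(value)
    if len(s) > width:
        raise ValueError(f"value {s!r} exceeds canonical width {width}")
    return s.ljust(width)  # blank-padding, per the representation constraint

# A 10-digit numeric source value and a longer alphanumeric source
# value both land in the same canonical type:
print(repr(to_canonical(8882341030)))         # '8882341030      '
print(repr(to_canonical("888-234-1030x445"))) # '888-234-1030x445'
```

The transformation is easy in this forward direction precisely because the canonical type was chosen to hold every source value; the difficulty discussed next lies in going the other way.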
Items 1 and 2 are necessary to build the target master data model, while item 3 is required to move the data into the master repository. And with the master data in the repository, it is easy to build client applications that use the master data resource. However, reverse integrating the master data back into the original sources is a problem when the canonical representation is more comprehensive than the original source model, since a mapping that truncates values or discards precision cannot be reversed.
Put more simply, the master data element can hold values that will not fit into the source model. That 16-character telephone number will not fit into a 10-digit numeric column unless some truncation or compression is applied. To facilitate data sharing, each unique record must be representable within every client system, whether it is a newly developed application or a legacy one. Yet if the source model cannot hold all of the values managed within the master repository without a lossy transformation, there is no way to refer back to the master system. Consider these example data values:
“(888) 234-1030 x3445”
“(888) 234-1030 x3447”
Clearly, these refer to two different telephone numbers because they contain different extension values. Yet converting them into a 10-digit format reduces both of these identifiers to “8882341030.” The differentiating extension data is lost, which in turn prevents using this value as a unique key back into the master table. In the presence of an alternate key, this is not an issue, but the fact that a uniquely referable piece of identifying data cannot be used challenges the MDM concept. In this case, the master data does not provide distinct and unique master data that can be integrated back into the application infrastructure.
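The collapse is easy to see in code. This sketch assumes a reverse mapping into a 10-digit numeric column that simply strips formatting and keeps the first ten digits; the function name is illustrative:

```python
import re

def to_ten_digit(phone):
    """Lossy reverse mapping into a 10-digit numeric column:
    keep only the first ten digits, dropping punctuation, spacing,
    and, critically, the extension."""
    digits = re.sub(r"\D", "", phone)  # remove all non-digit characters
    return digits[:10]

a = to_ten_digit("(888) 234-1030 x3445")
b = to_ten_digit("(888) 234-1030 x3447")
print(a, b, a == b)  # two distinct master values collapse into one
```

Both calls return “8882341030”: the transformation is not invertible, so neither source record can serve as a reference back to its distinct master record.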
So how do we account for this conceptual discrepancy? The answer lies in the fact that we may have different expectations for MDM as a basis for reverse integration than what should be appropriately deployed. The “silver bullet” notion that we can immediately plug an MDM back end into our existing applications is misguided, but considering the alternative provides a deeper understanding of the value of the MDM program.
To truly integrate an MDM back end into an existing application, we must reconsider the application architecture and explore an alternate view of managing the application framework. The goal is not to try to directly plug in another database, but to understand how your applications deal with master data objects, to determine whether there are overlapping functional needs that can be coordinated through a master repository, and to replace the embedded functionality with a service-oriented approach. Not only can the common functionality (e.g., new record creation, updates, etc.) be provided through a service layer, but the conceptual activities for which master data records are used (e.g., performance tracking, profiling, embedded analytics, etc.) can also be provided through that layer, thereby maintaining consistency at both the data and the functional level.
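The service-layer idea can be sketched as follows. This is a hypothetical illustration, not a prescribed API: the class, method names, and in-memory repository are all stand-ins for whatever the actual MDM platform provides. The key design point is that applications hold the master key and call shared services, rather than copying (and possibly truncating) master values into local columns:

```python
class CustomerMasterService:
    """Common master-data functionality exposed as shared services."""

    def __init__(self):
        self._records = {}  # stand-in for the master repository
        self._next_id = 1

    def create_customer(self, name, phone):
        """New-record creation: one shared path for every application."""
        master_id = self._next_id
        self._next_id += 1
        self._records[master_id] = {"name": name, "phone": phone}
        return master_id  # applications keep the key, not a local copy

    def update_phone(self, master_id, phone):
        """Updates flow through the service, so all consumers stay consistent."""
        self._records[master_id]["phone"] = phone

    def get_customer(self, master_id):
        """Lookups return the full canonical values, extension included."""
        return dict(self._records[master_id])

# A client application, new or legacy, refers to the record by its
# master key, so the canonical phone value is never truncated:
svc = CustomerMasterService()
key = svc.create_customer("Acme Corp", "(888) 234-1030 x3445")
print(svc.get_customer(key)["phone"])
```

Because every application resolves the record through the service, the lossy reverse mapping described earlier never has to happen: no source system needs to squeeze the canonical value into its own narrower column.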
Of course, this process requires a lot more planning and governance than just consolidating your enterprise data sets. On the other hand, if all you are doing is aggregating data for new application purposes, your MDM application is at risk of becoming yet another unsynchronized data silo.
David Loshin is the President of Knowledge Integrity, Inc., a consulting and development company focusing on customized information management solutions, including information quality consulting, information quality training and business rules solutions. Loshin is the author of The Practitioner's Guide to Data Quality Improvement, Master Data Management, Enterprise Knowledge Management: The Data Quality Approach and Business Intelligence: The Savvy Manager's Guide. He is a frequent speaker on maximizing the value of information.