In this excerpt from Master Data Management and Data Governance, readers will learn about master data management (MDM) design and MDM deployment options. They’ll also learn about MDM hierarchy management, master data dimensions and different styles of MDM architecture.
Table of Contents
- The evolution of MDM architecture
- An introduction to enterprise architecture framework and MDM patterns
- MDM and SOA: An introduction to SOA and the benefits of SOA
- MDM design, MDM deployment options and MDM hierarchy
Architecture Viewpoints of Various MDM Classification Dimensions
As we defined in Chapter 1, MDM addresses complex business and technical problems and, as such, is a complex, multifaceted framework that can be described and viewed from various angles. The amount of information about MDM goals, benefits, design viewpoints, and challenges is quite large, and in order to make sense of various, sometimes contradictory assertions about MDM, we introduced several MDM classification dimensions that allow us to organize available information, and we discussed various aspects of MDM according to a well-defined structure. In this section, we consider the architectural implications of various classification dimensions introduced in Chapter 1, as follows:
- The Design and Deployment dimension (consumption and reconciliation architecture viewpoint)
- The Use Pattern dimension
- The Information Scope or Data Domain dimension
MDM practitioners and industry analysts see these dimensions as persistent characteristics of any MDM solution, regardless of the industry or master data domain.
MDM Design and Deployment Dimension
The Design and Deployment viewpoint addresses MDM consumption and reconciliation architecture concerns, and the resulting MDM architecture styles. Armed with the architecture framework approach, we can recognize that these “styles” represent architecture viewpoints that determine the way the MDM system is intended to be used and be kept reliably in sync with its data providers (data sources) and data consumers. These viewpoints represent an intersection of the functional and data dimensions of the enterprise architecture framework at the logical, conceptual, and contextual levels. The resulting design constructs are a direct consequence of the different breadth and depth of the MDM data model coverage. We will discuss master data modeling in more detail in Chapter 7.
The architecture styles vary in the context of other dimensions of the enterprise architecture framework, including the organizational need and readiness to create and fully deploy a new system of record for customer data. And, of course, these architecture styles manifest themselves in different service-oriented architecture viewpoints.
Let’s briefly describe the four predominant MDM architecture styles in the context of master data scope management, consumption, and reconciliation services. These styles have been introduced by several prominent industry analysts, including the Gartner Group. We discuss the implementation concerns of these architecture styles later, in Part IV of the book.
MDM Architecture Styles
The MDM architecture, design, and deployment styles include the following:
- External reference
- Registry
- Reconciliation engine
- Transaction hub
The underlying principle behind these styles is the fact that an MDM Data Hub data model may contain all data attributes about the data domain it manages, or just some attributes, while other attributes remain in their original data stores. It is logical to assume that the Data Hub can be the “master” of those master entities whose data attributes it manages or just arbitrates the entities and attributes across operational systems where the master data is created and maintained. This assumption is one of the drivers defining the MDM architecture styles. Let’s look at this issue in detail.
External Reference Style
In this case, an MDM Data Hub is a reference database pointing to all source data stores but does not usually contain actual data for a given domain – for example, customer data for a customer domain, product data for a product domain, and so on:
• This is the most extreme case, where a Data Hub contains only a reference to the source or system of record data that continues to reside in the legacy data stores. In this case, the Data Hub acts as a special “directory” and points to the master data that continues to be created and updated by the existing legacy applications. This design option, known as the “External Reference Data Hub,” is the least complex of the Data Hub styles.
• One of the main architecture concerns of this style is the ability of the MDM Data Hub to maintain accurate, timely, and valid references to the master data at all times, which may require design focus on a reliable, just-in-time interconnection between source systems and the Data Hub, perhaps by using an enterprise-class messaging mechanism.
• A significant limitation of this architectural style is that the Data Hub does not hold any attributes, even those needed for matching and entity resolution. The Data Hub service responsible for matching has to access matching attributes across multiple systems in a federated fashion.
Even though this design is theoretically possible and a few attempts have been made to implement it, federated matching has proven ineffective, and most MDM Data Hub vendors have discontinued support for it.
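The “directory” role described above can be illustrated with a minimal sketch. It assumes a hypothetical pointer shape (`SourceRef`) and hypothetical system names; the hub stores only references, and the `unregister` call stands in for the timely notifications that keep those references valid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRef:
    """Pointer to a record that still lives in a legacy system (hypothetical shape)."""
    system: str
    record_key: str


class ExternalReferenceHub:
    """Holds only pointers; no master data attributes are stored in the hub."""

    def __init__(self):
        self._directory = {}  # entity_id -> set of SourceRef

    def register(self, entity_id, ref):
        self._directory.setdefault(entity_id, set()).add(ref)

    def unregister(self, entity_id, ref):
        # Called when a source system reports that a record was removed,
        # so the hub does not hand out a stale pointer.
        self._directory.get(entity_id, set()).discard(ref)

    def locate(self, entity_id):
        refs = self._directory.get(entity_id, set())
        return sorted(refs, key=lambda r: (r.system, r.record_key))


hub = ExternalReferenceHub()
hub.register("cust-1", SourceRef("billing", "B-77"))
hub.register("cust-1", SourceRef("crm", "C-1001"))
hub.unregister("cust-1", SourceRef("billing", "B-77"))
print(hub.locate("cust-1"))
```

Because the hub holds no attributes, any matching logic would have to read from the source systems themselves, which is the limitation noted above.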
Registry Style
This style of the MDM Data Hub architecture represents a Registry of unique master entity identifiers (created using identity attributes). It maintains only the identifying attributes. These attributes are used by an entity resolution service to identify which master entity records should be linked because they represent the same entity (i.e., customer, product, location, and so on). The Data Hub matches and links the records that share the same identity. The Data Hub creates and maintains links with data sources that were used to obtain the identity attributes. The MDM Data Hub exposes a service that returns a fully assembled holistic entity view (for example, a customer) to the consuming application, either by retrieving it or by assembling it at run time. Using MDM for customer domain as an example, a Registry-style Data Hub should support the following features:
• Maintain at least the matching subset of customer profile attributes that it uses to generate a unique customer identifier. Such attributes may include customer name, address, date of birth, and externally assigned identifiers (social security number, an employer identification number, a business reference number such as a DUNS number, and so on).
• Automatically generate and maintain links with all upstream systems that maintain data about the customers. Consuming applications query the Registry for a given customer or a set of customers, and the Registry would use its customer identification number and legacy pointers or links and record merge rules to allow the application to retrieve and construct a view of the customer from the underlying data.
• Act as the “master” of the unique identifiers, and support arbitration of data conflicts by determining which attribute values in the source systems are better than others by applying attribute survivorship rules across multiple systems.
A limitation of this MDM architecture style is that it relies on the data available in the operational systems to assemble the best possible view of data. The Data Hub is not used for data entry and does not own the master data, but rather arbitrates the values that should be available in the operational source systems to be displayed by the Data Hub. If the data is not available in the source systems, the Registry-style Data Hub cannot create the right attribute values by itself. The records and correct attribute values have to be created and maintained in one of the feeding operational systems. Then the Data Hub will process the changes originating in the source system in real time and display an improved view of the benchmark record.
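To make the Registry mechanics concrete, here is a minimal sketch under stated assumptions: the source records, the exact-match key on name and date of birth, and the “most recently updated value wins” survivorship rule are all hypothetical simplifications, not the matching or survivorship logic of any particular product.

```python
from datetime import date

# Hypothetical source records keyed by (system, record key).
SOURCES = {
    ("crm", "C-1"): {"name": "Ann Lee", "dob": date(1980, 5, 1),
                     "email": "ann@old.example", "updated": date(2020, 1, 10)},
    ("billing", "B-9"): {"name": "Ann Lee", "dob": date(1980, 5, 1),
                         "email": "ann@new.example", "updated": date(2023, 3, 2)},
    ("crm", "C-2"): {"name": "Bob Roy", "dob": date(1975, 2, 2),
                     "email": "bob@example.com", "updated": date(2021, 6, 6)},
}


def match_key(record):
    # Identity attributes only; a real matcher would usually be fuzzy.
    return (record["name"].strip().lower(), record["dob"])


def build_registry(sources):
    """Link source records that share the same identity attributes."""
    registry = {}  # match key -> list of source pointers
    for pointer in sorted(sources):
        registry.setdefault(match_key(sources[pointer]), []).append(pointer)
    return registry


def assemble_view(pointers, sources):
    """Assumed survivorship rule: the most recently updated value wins."""
    view = {}
    for pointer in sorted(pointers, key=lambda p: sources[p]["updated"]):
        record = sources[pointer]
        view.update({k: v for k, v in record.items() if k != "updated"})
    return view


registry = build_registry(SOURCES)
ann_links = registry[("ann lee", date(1980, 5, 1))]
ann_view = assemble_view(ann_links, SOURCES)
print(ann_links, ann_view["email"])
```

Note that the assembled view is computed from the source records at request time; the registry itself keeps only the match keys and the links, so a value missing from every source cannot appear in the view.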
Reconciliation Engine Style
This MDM architecture style is a system of record for some entity attributes; it provides active synchronization between itself and the legacy systems.
• In this case, the Data Hub is the master for those data attributes that it actively maintains by supporting authoring of master data content. The Reconciliation Engine Data Hub style relies on the upstream source systems to maintain other data attributes. One implication of this approach is the fact that some applications that handle source or master data may have to be changed or redesigned based on the business processes, application interfaces, and the data they use. The same is true for the corresponding business processes. The other implication is that the Data Hub has to maintain, create, and change those data attributes for which it is the master. The Data Hub has to propagate changes for these attributes to the systems that use these attributes. The result is a data environment that continuously synchronizes the data content among its participants to avoid data inconsistencies.
• A shortcoming is that the complexity of synchronization increases as some of the data attributes maintained in the Data Hub are derived from the data attributes maintained in other systems. For example, a typical Reconciliation Engine–style Data Hub for customer domain has to create and maintain unique customer identifications as well as references to the legacy systems and data stores where the customer data is sourced from or continues to reside.
This architecture style is more sophisticated than the Registry-style Data Hub, and in many situations is a viable evolutionary step toward the full Transaction Hub.
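The split of responsibilities in a Reconciliation Engine can be sketched as follows. This is an illustration only: the class, the attribute names, and the callback-based propagation are assumptions standing in for whatever synchronization mechanism an actual deployment would use.

```python
class ReconciliationHub:
    """Masters a subset of attributes and pushes changes to subscribers."""

    def __init__(self, mastered_attributes):
        self.mastered = set(mastered_attributes)
        self.records = {}       # entity_id -> {attribute: value}
        self.subscribers = []   # callables standing in for downstream systems

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def update(self, entity_id, attribute, value):
        if attribute not in self.mastered:
            # Other attributes stay with the upstream source systems.
            raise ValueError(f"{attribute!r} is not mastered by the hub")
        self.records.setdefault(entity_id, {})[attribute] = value
        for notify in self.subscribers:
            notify(entity_id, attribute, value)


crm_copy = {}  # hypothetical legacy system that consumes hub-mastered data


def crm_listener(entity_id, attribute, value):
    crm_copy.setdefault(entity_id, {})[attribute] = value


hub = ReconciliationHub(mastered_attributes={"name", "address"})
hub.subscribe(crm_listener)
hub.update("cust-1", "address", "1 Main St")
print(crm_copy)
```

An attempt to update an attribute the hub does not master (for example, `hub.update("cust-1", "balance", 10)`) is rejected, which mirrors the rule that such attributes remain the responsibility of the source systems.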
Transaction Hub Style
This is the most sophisticated option, in which the Data Hub becomes the primary source of and the system of record for the entire master data domain, including appropriate reference pointers:
• This is the case where the Data Hub maintains practically all data attributes about the entity. For a given entity domain, such as a customer domain (individuals or businesses), the Data Hub becomes a “master” of the master entity information, and as such should be the source of all changes to any attribute about the master entity. In this case, the Data Hub has to be engineered as a complete transactional environment that maintains its data integrity and is the sole source of changes that it propagates to all downstream systems that use the customer data.
• The Transaction Hub has some profound implications for the overall environment, the existing applications, and business processes already in place. For example, an existing account maintenance application may have to undergo modifications to update the Data Hub instead of an existing legacy system, and appropriate synchronization mechanisms have to be in place to propagate and apply the changes from the Data Hub to some or all downstream systems. Moreover, most of the previously deployed transactions that change entity information should be redesigned to work directly with the Data Hub, which may also change existing business processes, workflows, and user navigation. This is the most complex case, which is known as a Full Transaction Hub.
• Practically speaking, the intrusiveness of the Transaction Hub style makes it a viable choice mostly in two scenarios:
• When dealing with a new enterprise that does not have a massive legacy infrastructure maintaining the master entity the Data Hub is supposed to resolve.
• When the current processes and applications already manage the master entity as a Transaction-style Data Hub. In this scenario, the new Data Hub is built to replace the existing master entity management system with a new system (for example, a customer-centric solution). For instance, it can be the case where the enterprise has already been using a home-grown Transaction-style MDM Data Hub and is looking to replace it with a more advanced vendor solution.
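As a rough illustration of the “sole source of changes” idea, the sketch below validates a change against a draft copy, commits it only if validation passes, and queues the committed change for downstream delivery. The validation rule, the version counter, and the in-memory outbox are assumptions made for the example, not features prescribed by the text.

```python
import copy


class TransactionHub:
    """Sole writer for the domain: validate, commit, then queue for publishing."""

    def __init__(self):
        self.records = {}  # entity_id -> committed record
        self.outbox = []   # committed changes awaiting delivery downstream

    def apply(self, entity_id, changes):
        draft = copy.deepcopy(self.records.get(entity_id, {"version": 0}))
        draft.update(changes)
        if not draft.get("name"):
            # Validation fails before anything is stored, so the committed
            # records and the outbox are left untouched.
            raise ValueError("name is required")
        draft["version"] += 1
        self.records[entity_id] = draft
        self.outbox.append((entity_id, draft["version"], dict(changes)))
        return draft["version"]


hub = TransactionHub()
v1 = hub.apply("cust-1", {"name": "Ann Lee", "address": "1 Main St"})
v2 = hub.apply("cust-1", {"address": "2 Oak Ave"})
try:
    hub.apply("cust-2", {"address": "3 Elm Rd"})  # rejected: no name
except ValueError as exc:
    print("rejected:", exc)
print(v1, v2, len(hub.outbox))
```

Every downstream system would consume the outbox rather than accept edits of its own, which is why adopting this style typically forces changes to existing applications and processes.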
With the exception of the first, the External Reference style, these architecture and design styles have one thing in common – they define, create, and manage a centralized platform where master data is integrated either virtually (Registry) or physically (Reconciliation Engine and Transaction Hub) to create a reliable and sustainable system of record for master data.
MDM and Use Pattern Dimension
The Use Pattern classification dimension differentiates MDM architectures based on how the master data is used. We see three primary use patterns for MDM data usage: Analytical MDM, Operational MDM, and Collaborative MDM.
• Analytical MDM supports business processes and applications that use master data primarily to analyze business performance and provide appropriate reporting and analytical capabilities, often by directly interfacing with business intelligence (BI) tools and packages. Analytical MDM tends to be read-mostly: it usually does not change or create source data in the operational systems, but it does cleanse and enrich data in the MDM Data Hub. From the overall system architecture view, Analytical MDM can be architected as a feed into the data warehouse and can create or enrich an accurate, integrated view of the master data inside the data warehouse. BI tools are typically deployed to access this cleansed, enriched, and integrated data for reporting, performing deep analytics, and providing drill-through capabilities for the required level of detail.
• Operational MDM allows master data to be collected, changed, and used to process business transactions; Operational MDM is designed to maintain the semantic consistency of the master data affected by the transactional activity. Operational MDM provides a mechanism to improve the quality of the data in the operational systems, where the data is usually created. By design, Operational MDM systems ensure that the accurate, single version of the truth is maintained in the MDM Data Hub and propagated to the core systems used by existing and new processes and applications.
• Collaborative MDM allows its users to author master data objects and collaborate in the process of creation and maintenance of master data and its associated metadata.
These Use Pattern–based architecture viewpoints have common concerns and often use common or similar technologies, especially the components of technology related to data extraction, transformation, and load, as well as data quality.
At the same time, we can clearly see how the architectural implications of these three use patterns affect the way the MDM Hub has to handle data synchronization concerns, implement cross-application interoperability, deliver data changes to upstream and/or downstream systems, detect and resolve data quality issues, and enable and support data governance processes.
Data Domain Dimension
The Information Scope or Data Domain dimension describes the primary data domain managed by the MDM solution. In the case of MDM for the customer data domain, the resulting solution is often called Customer Data Integration, or CDI. In the case of MDM for product data domain, the solution is known as Product Information Management, or PIM. Other data domains may not have formal acronym definitions yet, but could have an impact on how the MDM solution is designed and deployed. Primary architectural implications related to implementing customer, product, or other domains include:
• Design for entity resolution and identification. Techniques for these data domains can vary drastically based on the requirements for semantic consistency, speed, accuracy, and confidence.
• Ability to acquire and manage sources of external entity references, such as authoritative sources of individual names and addresses, business names, as well as identifiers and industry classifications (for example, D&B DUNS numbers).
• Information security and privacy concerns that apply differently to different data domains based on a particular risk profile of a given data domain within the context of business requirements as well as those governed by a variety of rules, policies, and governmental regulations.
Reference Data and Hierarchy Management
When we discuss the architectural implications of an MDM solution in the context of the data it manages, we need to recognize that the data scope alone does not address all variations of what data needs to be managed in what way. For example, most MDM implementations deal with creating a master environment of reference data, such as product reference, account reference, customer reference, and so on. However, it is not unusual for an organization to try to build an authoritative master data environment that supports enterprise-wide business attributes, such as customer revenues, expenses, risk exposure, and so on. Technically speaking, this is not traditional reference data, and the MDM Data Hub architecture should provide for features, functions, and services that can calculate, maintain, and ensure the quality of these key business metrics. Clearly, this adds an additional layer of complexity to an already complex system. This is where proven architecture patterns for creating such metrics can be inherited from existing business systems and “adopted” into the MDM Data Hub.
MDM and Hierarchy Management
Many business problems addressed by the MDM architecture include the management of data domain hierarchies. It is common for an organization to manage multiple views of the business based on a specific business focus, such as a marketing view of customers, financial views of a global organization, various views of products, and so on. In these cases, we see an organizational hierarchy that consists of a parent (for example, legal entity) and multiple dependents (for example, accounts or other legal entities). Similarly, businesses tend to structure their sales organizations based on either products or geographies or cost centers. The challenge here is that these hierarchies are not static over time, and can and do change with business restructuring, mergers and acquisitions, new product introductions, and other events. Several formal definitions of hierarchies are available, but the following working definition of hierarchies is most relevant to general data management, and Master Data Management in particular.
In the context of MDM, we define a hierarchy as an arrangement of entities (parties, accounts, products, cost centers, and so on) where entities are viewed in relationship to each other as “parents,” “children,” or “siblings/peers” of other entities, thus forming a conceptual tree structure where all leaf nodes in the hierarchy tree can be rolled into a single “root.”
Further, the entities of a given domain can often support several hierarchical arrangements based on a particular classification schema (legal entity level, geography/location, role/rank, scope of authority, and so on). A direct consequence of this fact is that changes in a classification schema or the introduction of another schema will result in the creation of a different hierarchy, sometimes referred to as an alternate hierarchy.
In order to create and maintain an authoritative, verifiable system of record, an MDM system has to be able to recognize and manage hierarchies based on the classification schemas; to compare, match, and link master entities that may exist at different levels of hierarchy; to manage the creation, maintenance, and versioning of different alternative hierarchies; and to provide relevant and timely changes in the hierarchies of reference data to the MDM users and consuming applications.
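The working definition above (parents, children, siblings, and a single root per tree) can be sketched with a simple child-to-parent map. The entity names and the two classification schemas below are hypothetical; the point is that the same entity can sit in more than one hierarchy at once.

```python
class Hierarchy:
    """One classification schema's tree, stored as child -> parent links."""

    def __init__(self, name, root):
        self.name = name
        self.root = root
        self.parent = {root: None}

    def add(self, child, parent):
        if parent not in self.parent:
            raise KeyError(f"unknown parent {parent!r}")
        self.parent[child] = parent

    def path_to_root(self, node):
        """Roll a node up through its ancestors to the single root."""
        path = [node]
        while self.parent[path[-1]] is not None:
            path.append(self.parent[path[-1]])
        return path

    def children(self, node):
        return sorted(c for c, p in self.parent.items() if p == node)

    def siblings(self, node):
        return [c for c in self.children(self.parent[node]) if c != node]


# Primary hierarchy: legal entity structure.
legal = Hierarchy("legal entity", "Acme Holdings")
legal.add("Acme US", "Acme Holdings")
legal.add("Acme EU", "Acme Holdings")
legal.add("Account 42", "Acme US")

# Alternate hierarchy over the same account: geography.
geo = Hierarchy("geography", "World")
geo.add("North America", "World")
geo.add("Account 42", "North America")

print(legal.path_to_root("Account 42"))
print(geo.path_to_root("Account 42"))
print(legal.siblings("Acme US"))
```

Versioning, which the text also calls for, is left out of this sketch; one simple approach would be to keep a dated copy of the `parent` map each time a hierarchy changes.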
MDM Hierarchy Management and Data Warehousing
The discussion on hierarchy management of reference data offered in the preceding section is particularly relevant to the relationship between MDM and data warehousing. Let’s compare the principles of hierarchical structures with the concepts of facts and dimensions in the data warehousing discipline [11]. Indeed, the notion of a hierarchy applies directly to the dimensions in a data warehouse’s data model, frequently referred to as a dimensional data model in the form of a “star” or “snowflake” schema, with the fact entities organized in a set of central tables that are “surrounded” by dimension tables, where the dimensional data contains attributes used as keys that point to the facts in the Fact Table [12]. For example, a customer data warehouse may contain information about customer account values (facts) and dimensions such as customer identifiers, customer locations, and time. As the dimensional attributes change, the facts may change or new facts may get created. And in cases where dimensional values change infrequently, the data warehousing discipline recognizes the concept of Slowly Changing Dimensions, or SCD, the constructs that allow a data warehouse to maintain the historical view of the values of the facts (sometimes referred to as “time travel”).
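The “time travel” idea can be illustrated with one commonly used SCD variant, often called Type 2, in which each change closes the current dimension row and appends a new one with its own validity period. The text does not prescribe a particular variant, so treat the row layout and function names below as assumptions made for the example.

```python
from datetime import date


def apply_scd2(history, key, new_value, effective):
    """Close the current row for `key` and append a new version (SCD Type 2)."""
    for row in history:
        if row["key"] == key and row["valid_to"] is None:
            if row["value"] == new_value:
                return  # nothing changed, keep the current row open
            row["valid_to"] = effective
    history.append({"key": key, "value": new_value,
                    "valid_from": effective, "valid_to": None})


def value_as_of(history, key, when):
    """Return the dimension value that was in effect on `when`."""
    for row in history:
        if (row["key"] == key and row["valid_from"] <= when
                and (row["valid_to"] is None or when < row["valid_to"])):
            return row["value"]
    return None


location = []  # customer location dimension, one row per version
apply_scd2(location, "cust-1", "Boston", date(2020, 1, 1))
apply_scd2(location, "cust-1", "Denver", date(2022, 6, 1))

print(value_as_of(location, "cust-1", date(2021, 3, 15)))
print(value_as_of(location, "cust-1", date(2023, 1, 1)))
```

Facts recorded in 2021 can then be reported against the location that applied at the time, rather than against the customer’s current location.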
Hierarchies and Data Dimensions
In the context of denormalized dimensional data models such as the star or snowflake schemas widely used in data warehousing, hierarchies are arrangements of records in the data model’s dimensions.
Data warehousing is a complex and mature technical discipline, and a detailed discussion of this topic is beyond the scope of this book. However, we briefly discuss the relationship between MDM hierarchy management and data warehousing concepts for the following reasons:
• As stated in preceding chapters, data warehousing is one of the predecessor technologies to MDM.
• In many instances, an MDM system is implemented “upstream” from data warehouses and data marts that are used to collect and aggregate master data and to provide reporting, analytical, and business intelligence capabilities to support an organization’s business and financial management needs.
Therefore, it is important to understand what MDM architecture features are required to support a large multidimensional data warehouse as a downstream system. Architecturally, these features are organized into a collection of hierarchy management services, and these services are used to maintain the integrity and accuracy of various hierarchies; to work in conjunction with entity resolution services to properly recognize, match, link, and aggregate entities in accordance with their hierarchical relationships; and to enable the efficient delivery of hierarchy changes to appropriate downstream consuming applications. Hierarchy management services and their uses are discussed in more detail in Chapters 5 and 6.
Note: The classification dimensions introduced in this chapter have clear implications on MDM architecture. Specifically, although MDM architecture styles defined by these various viewpoints are different, they have many things in common. In reality, it is not unusual to find an MDM implementation that exhibits properties of more than one architecture style at the same time – for example, acting as a Registry for some master data domain while being a coexistence-style MDM Data Hub for others. Likewise, aside from some very specific capabilities and implementation patterns, the architecture of an MDM Data Hub for a customer domain is significantly similar to that of the product domain, and so on. The latter is one of the enablers of evolving MDM from a single-domain master data management solution to a multidomain Data Hub operating on the same technology platform.
This note is relevant because it points to the significant flexibility and versatility of the MDM architecture. It also confirms our previous discussion on the value of the architecture frameworks and architecture viewpoints that provide different insights into the same large and complex system.
Reference Architecture Viewpoint
In the previous sections we looked at the key components and architecture viewpoints of the MDM architecture, and showed its complexity and the variety of approaches you could take to select, build, and implement an MDM solution.
However, this discussion would not be complete if we didn’t consider another key architectural artifact – a reference architecture viewpoint. Reference architecture is one of the best-known complexity-reducing architecture viewpoints. Let’s informally define reference architecture as follows:
Reference architecture is a high-level abstraction of a technical solution to a particular problem domain; it is a set of interlinked components, services, processes, and interfaces organized into functional layers, where each layer provides services to the layers above and consumes services from the layers below. As such, reference architecture does not define specific technologies or implementation details.
The key value proposition of reference architecture is in its ability to help architects and designers to define the functionality and placement of all architecture components in the context of the overall system and problem domain. In other words, reference architecture provides a blueprint and helps create a set of patterns for designing specific solution/system components and their interactions. That is why a reference architecture viewpoint is such a powerful tool for designing systems of MDM-level complexity and interdependencies.
Using this definition of the reference architecture, we can define an MDM reference architecture viewpoint as an industry- and data domain–agnostic architectural multilayered abstraction that consists of services, components, processes, and interfaces (see Figure 4-6).
As an instance of an SOA, this MDM reference architecture contains a significant number of key service components. Some of these services are discussed in further detail in Chapters 5 and 6 of the book, but we offer a brief list of higher-level service layers in this section for the purpose of completeness.
• The Data Management layer includes:
• Interface services, which expose a published and consistent entry point to request MDM services.
• Entity resolution and lifecycle management services, which enable entity recognition by resolving various levels of identities, and manage life stages of master data by supporting business interactions including traditional Create, Read, Update and Delete (CRUD) activities.
• Search services, for easy access to the information managed by the MDM Data Hub.
• Authoring services, which allow MDM users to create, author, manage, customize/change, and approve definitions of master data (metadata), including hierarchies and entity groups. In addition, Authoring services enable users to manage (create, read, update, and delete) specific instances of master data.
• The metadata management service, which provides support for data management aspects of metadata creation, manipulation, and maintenance. The metadata management service supports a metadata repository and relies on and supports internal Data Hub services such as attribute and record locator services and even key generation services.
• Hierarchy, relationships, and groupings management services, which deliver functions designed to manage master data hierarchies, groupings, and relationships. These can process requests from the authoring services.
• Enrichment and sustaining services, which are focused on acquiring and maintaining the correct content of master data, controlled by external data references and user-driven adjustments.
• The Data Rules layer includes key services that are driven by business-defined rules for entity resolution, aggregation, synchronization, visibility and privacy, and transformation.
• The Data Quality layer includes services that are designed to validate and enforce data quality rules, resolve entity identification and hierarchical attributes, and perform data standardization, reconciliation, and lineage. These services also generate and manage global unique identifiers as well as provide data quality profiling and reporting.
• The System Services layer includes a broad category of base services such as security, data visibility, event management (these are designed to react to predefined events detected within the master data by triggering appropriate actions), service management (orchestration, choreography), transaction and state management, system synchronization, and intersystem connectivity/data integration services, including Enterprise Information Integration services for federated data access (discussed in more detail in Chapters 5 and 6).
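The layering principle from the definition above (each layer provides services upward and consumes services from the layer below) can be sketched with four small classes. This assumes the layers are listed top-down in the order given, and every class, method, and rule here is a hypothetical stand-in for a much richer set of services.

```python
class SystemServices:
    """Bottom layer: base services such as event management."""

    def __init__(self):
        self.events = []

    def publish(self, event, payload):
        self.events.append((event, payload))


class DataQuality:
    """Standardizes values and reports the result through system services."""

    def __init__(self, system):
        self.system = system

    def standardize(self, record):
        clean = {k: " ".join(str(v).split()).title() for k, v in record.items()}
        self.system.publish("record.standardized", clean)
        return clean


class DataRules:
    """Applies business-defined rules on top of data quality services."""

    def __init__(self, quality, required):
        self.quality = quality
        self.required = set(required)

    def validate(self, record):
        clean = self.quality.standardize(record)
        missing = sorted(self.required - set(clean))
        if missing:
            raise ValueError(f"missing required attributes: {missing}")
        return clean


class DataManagement:
    """Top layer: the entry point that consuming applications call."""

    def __init__(self, rules):
        self.rules = rules
        self.store = {}

    def create(self, entity_id, record):
        self.store[entity_id] = self.rules.validate(record)
        return self.store[entity_id]


system = SystemServices()
hub = DataManagement(DataRules(DataQuality(system), required={"name"}))
created = hub.create("cust-1", {"name": "  ann   lee "})
print(created, system.events)
```

Each class depends only on the layer directly beneath it, so a layer can be replaced (for example, swapping in a different standardization service) without touching the layers above.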
Despite this long list of services defined in the MDM reference architecture viewpoint, at a high level this reference architecture appears to be deceptively simple. However, a closer look will reveal that most of the components and services of the architecture have to be present in order to accomplish the goal of creating an MDM system. Moreover, many of these components are complex objects that, in turn, contain many lower-level components and services. We will offer a more detailed discussion of some of the components in the subsequent chapters of the book. To set the stage for the detailed discussion, we will organize the components, services, and layers of this high-level conceptual reference architecture into two major groups: traditional architecture concerns of information management and new, advanced concerns driven by the goals of Master Data Management.
The traditional architecture concerns focus on the area of data and data management. These concerns include data architecture and data modeling; data extraction, transformation, and loading; metadata repository and metadata management; database management system performance and scalability; transaction management; backup and recovery; and others (see Figure 4-7).
Advanced MDM-specific concerns include areas such as identity recognition, matching and generation of global unique entity identifiers, persistence of entity identification, rules-based and data content–based synchronization to/from legacy, reconciliation and arbitration of data changes, data security and data visibility, service implementation and management, integration with legacy environments, and many others (see Figure 4-8).
We discuss these traditional and advanced concerns of the MDM architecture in more detail in the remaining chapters of this part of the book and also in Part III. The material in these chapters offers additional insights and architecture viewpoints that should help MDM managers, designers, and implementers to achieve measurable results using a structured and disciplined architecture approach.