In this excerpt from Master Data Management and Data Governance, readers will learn about the evolution of master data management (MDM) architecture and gain insight into how MDM architecture has changed over the years. Readers will also learn about MDM architectural considerations and find an architectural definition of MDM.
Table of Contents
The evolution of MDM architecture
An introduction to enterprise architecture framework and MDM patterns
MDM and SOA: An introduction to SOA and the benefits of SOA
MDM design, MDM deployment options and MDM hierarchy
In the introductory part of this book, we offered a broad-brush description of the purpose, drivers, and key benefits of Master Data Management and used some specific examples of its customer-focused variant, Customer Data Integration. This part of the book discusses the issues of MDM architecture as a key logical step to building enterprise-wide solutions.
An architecture discussion is important for several reasons:
- A comprehensive end-to-end MDM solution is much more than just a database of customer or product information organized by some kind of a unique key. Some MDM capabilities and components are “traditional” and are a part of a common best-practice design for integrated data solutions, whereas other, new features came to light primarily in the context of MDM problem domains. An architectural vision can help organize the “old” and the “new” features into an integrated, scalable, and manageable solution.
- MDM is not just a technology problem – a comprehensive MDM solution consists of technology components and services as well as new business processes and even organizational structures and dynamics. The many architecture viewpoints, the significant complexity, and the large number of interdependencies warrant a framework-based approach to the architecture. This multifaceted, multidimensional architecture framework looks at the overall problem domain from different but complementary angles.
- Any solution intended to create an authoritative, accurate, and timely system of record that should eventually replace existing legacy sources of the information must be integrated with the overall enterprise architecture and infrastructure. Given the heterogeneity and the “age” of legacy systems, this requirement is often difficult to satisfy without a comprehensive architecture blueprint.
Thus, we organized this part of the book in the following fashion: First, we discuss the architectural genesis of MDM. Then, we take a closer look at the enterprise architecture framework and explain how this framework helps us to see different aspects of the solution as interconnected and interdependent views. This discussion is followed by an overview of traditional data management and emerging concerns of MDM architecture, MDM data modeling, data management architecture, and the newer concept of MDM services.
MDM Architecture Classification, Concepts, Principles, and Components
In order to understand “how” to build a comprehensive Master Data Management solution, we need to define the “what” of Master Data Management.
We have already offered high-level definitions of MDM and its customer-focused variant, CDI, in Part I of this book. We also stated that CDI and other MDM variants share many architecture principles and approaches; therefore, in this part of the book we concentrate on common architecture aspects of Master Data Management. Where appropriate, we’ll mention specific architecture features of key MDM variants – in particular, Customer Data Integration and Product Information Master.
Architectural Definition of Master Data Management
As shown in previous chapters, the scope of Master Data Management by its very nature is extremely broad and applies equally well to customer-centric, product-centric, and reference data–centric business problems, to name just a few. A common thread among the solutions to these problems is the ability to create and maintain an accurate, timely, and authoritative “system of record” for a given subject domain. Clearly, such a definition can be refined further for each situation and problem domain addressed by Master Data Management.
Let’s start with a fresh look at the definitions of master data and Master Data Management offered in Chapter 1:
- Master data is composed of those entities, relationships, and attributes that are critical for an enterprise and foundational to key business processes and application systems.
- Master Data Management (MDM) is the framework of processes and technologies aimed at creating and maintaining an authoritative, reliable, sustainable, accurate, and secure data environment that represents a “single and holistic version of the truth” for master data and its relationships, as well as an accepted benchmark used within an enterprise as well as across enterprises and spanning a diverse set of application systems, lines of business, channels, and user communities.
To state it slightly differently, an MDM solution takes the master data of a given domain from a variety of data sources; discards redundant data; and then cleanses, rationalizes, enriches, and aggregates it to the extent possible. We can illustrate such an MDM environment as a “hub and spokes,” where the spokes are information sources connected to the central hub as a new “home” for the accurate, aggregated, and timely master data (see Figure 4-1). This description helps explain why we often use the term “Data Hub” when discussing an MDM solution space.
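The hub-and-spokes idea can be made concrete with a minimal Python sketch. It is an illustration under stated assumptions, not a reference design: the source names, the sample records, the `MasterRecord` class, and the rule that a normalized e-mail address identifies an entity are all hypothetical, and real Data Hubs use far more sophisticated matching and survivorship rules.

```python
from dataclasses import dataclass, field


@dataclass
class MasterRecord:
    """One consolidated entity in the hub (hypothetical structure)."""
    key: str
    attributes: dict = field(default_factory=dict)
    sources: list = field(default_factory=list)


def cleanse(record: dict) -> dict:
    """Collapse extra whitespace and drop empty values (stand-in for real cleansing)."""
    cleaned = {}
    for name, value in record.items():
        if isinstance(value, str):
            value = " ".join(value.split())
        if value not in ("", None):
            cleaned[name] = value
    return cleaned


def match_key(record: dict) -> str:
    """Assumed matching rule: a lower-cased e-mail address identifies the entity."""
    return record["email"].lower()


def build_hub(spokes: dict) -> dict:
    """Consolidate records from every spoke into one master record per entity."""
    hub = {}
    for source_name, records in spokes.items():
        for raw in records:
            record = cleanse(raw)
            key = match_key(record)
            master = hub.setdefault(key, MasterRecord(key=key))
            # Enrich: keep the first non-empty value seen for each attribute,
            # so redundant copies of the same value are discarded.
            for name, value in record.items():
                master.attributes.setdefault(name, value)
            if source_name not in master.sources:
                master.sources.append(source_name)
    return hub


# Two hypothetical spokes that describe overlapping customers.
spokes = {
    "crm": [{"email": "Ann@Example.com", "name": "  Ann   Lee ", "phone": ""}],
    "billing": [
        {"email": "ann@example.com", "name": "Ann Lee", "phone": "555-0100"},
        {"email": "bob@example.com", "name": "Bob Ray"},
    ],
}
hub = build_hub(spokes)
```

In this sketch the two records for Ann collapse into one master record whose phone number comes from the billing spoke, while the hub remembers which sources contributed to it.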
Interestingly, using this definition of “what” MDM is does not make our goal of creating architecture much easier to achieve. Indeed, this definition points to the fact that, for example, a CDI solution is much more than just a database of customer information, a solution known by many as a Customer Information File (CIF), a data warehouse of customer information, or an operational data store (ODS). In fact, this definition describes an enterprise-scale system that consists of software components, services, processes, data models and data stores, metadata repositories, applications, networks, and other infrastructure components.
Thus, in order to develop a clear understanding of the “how” of the MDM solution, we will review the historical roots of Master Data Management and its evolution from early attempts to deliver on the MDM promise to what it has become today.
Evolution of Master Data Management Architecture
As we discussed in Chapter 1, the need to create and maintain an accurate and timely “information system of record” is not new, and it applies equally well to businesses and government entities. Lately, a number of regulatory requirements, including the Sarbanes-Oxley Act, the Basel II Capital Accord, and the emerging Basel III Accord (see the discussion on these regulations in Part III of the book), have emphasized this need even further.
In the case of Customer Data Integration, organizations have been engaged in creating customer-centric business models and applications and enabling infrastructure for a long time. However, as the business complexity, number and type of customers (retail customers, individuals, institutional customers, and so on), number of lines of business, and number of sales and service channels continued to grow, this growth often proceeded in a tactical, nonintegrated fashion. As a result, many organizations ended up with a wide variety of customer information stores and applications that manage customer data. As an example, one medium-sized service/distribution company maintained no fewer than eight customer databases that had to be rationalized and cleansed in order to achieve targeted goals for efficiency and quality of the customer service.
The customer data in that “legacy” environment was often incomplete and inconsistent across various data stores, applications, and lines of business. In many other cases, individual applications and lines of business were reasonably satisfied with the quality and scope of customer data they managed. However, the lack of completeness and accuracy and the lack of consistency across lines of business continued to prevent organizations from creating a complete and accurate view of customers and their relationships with the servicing organization and its partners.
Similarly, product information is often scattered across multiple systems. Products and services are modeled in product design and analysis systems where product functionality, bills of materials, packaging, pricing, and other characteristics are developed. Once the product modeling is complete, product information, along with product-specific characteristics, is released for cross-functional enterprise use.
Note: In the scope of MDM for customer domain, we often discuss business transformation to achieve customer centricity as a major goal and benefit of MDM. However, given the domain-agnostic nature of MDM, it is more accurate to talk about transforming the enterprise from an account-centric to an entity-centric model, and, where possible, we’ll be using the term “entity centricity” when discussing this transformational feature of MDM.
Recognizing the entity-centricity (e.g., customer, product) challenge and the resulting inability to transform the business from an account-centric to an entity-centric model, organizations first developed a variety of solutions that attempted to help move the organizations into the new entity-centric world. Although in general these solutions added some incremental value, many of them were deployed within the constraints of the existing lines of business, and very few were built with a true enterprise-wide focus in mind. Nevertheless, these solutions and attempts to achieve entity centricity have helped shape MDM in general, and CDI and PIM in particular, into a real enabler of such business model transformations. Therefore, we need to understand what has been done prior to the emergence of MDM, and what, if any, portions of the existing solutions can and should be leveraged in implementing MDM. The good news is that many of these solutions are not data-domain specific and can be viewed as foundational technologies for MDM in general.
These solutions include but are not limited to Customer Information File (CIF); Extract, Transform, and Load technologies (ETL); Enterprise Data Warehouse (EDW); an operational data store (ODS); data quality (DQ) technologies; Enterprise Information Integration (EII); Customer Relationship Management (CRM) systems; and Product Master environments, to name just a few. Although some of these solutions and technologies were discussed briefly in Chapter 1, we want to offer a slightly deeper and more architecture-focused review of them, with a view toward their suitability to act as components of a Master Data Management platform.
- Customer Information File (CIF). Many companies have established LOB-specific or company-wide customer information file environments. Historically, CIF solutions used older file management or database management systems (DBMS) technology and represented some very basic point-in-time (static) information about the customers. In other words, CIFs offer limited flexibility and extensibility and are not well suited to capturing and maintaining real-time customer data, customer privacy preferences, customer behavior traits, and customer relationships. Moreover, traditional CIF does not support new complex business processes, event management, and data element–level security constraints known as “data visibility” (see Part III for a detailed discussion on this topic). Shortcomings like these prevent traditional CIF environments from becoming a cross-LOB integration vehicle of customer data. Although CIF systems do not deliver a “single version of the truth” about the customer, in most cases existing CIF systems are used to feed the company’s Customer Relationship Management systems. Moving forward, a CIF can and should be treated as a key source data file that feeds a new Master Data Management Customer Data Hub system.
- Extract, Transform, and Load (ETL). These tools are typically classified as data integration tools and are used to extract data from multiple data sources, transform the data to a required target structure, and load the data into the target data store. A key functionality required from the ETL tool is its ability to perform complex transformations from source formats to the target; these transformations may include Boolean expressions, calculations, substitutions, reference table lookup, support for business rules for aggregation and consolidation, and many other features. Contemporary ETL tools include components that perform data consistency and data quality analysis as well as the ability to generate and use metadata definitions for data attributes and entities. Many tools can create output data in XML format according to the predefined schema. Finally, the enterprise-class ETL tools are designed for high scalability and performance and can parallelize most of their operations to achieve acceptable throughput and processing times when dealing with very large data sets or complex transformations.
Although many ETL processes run in batch mode, best-in-class ETL tools can support near-real-time transformations and load functionality. Given that description, it is quite clear that an ETL component can and should be used to transform and load data into an MDM platform – Data Hub – both for the initial load and possibly for the incremental data updates that keep the Data Hub in sync with existing sources. We discuss MDM data synchronization approaches using ETL in Chapter 16 of the book.
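To make the extract, transform, and load steps concrete, here is a minimal Python sketch that uses only the standard library. The CSV layout, the `STATE_NAMES` reference table, and the `customer` target table are hypothetical; a production ETL tool adds metadata management, parallelism, and error handling that this sketch omits.

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would come from a file or a query.
SOURCE_CSV = """cust_id,name,state,balance
1,ann lee,ny,100.50
2,BOB RAY,ca,20
3,cy dow,zz,7.25
"""

# Hypothetical reference table used for lookups during transformation.
STATE_NAMES = {"NY": "New York", "CA": "California"}


def extract(text):
    """Read source rows as dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Convert source rows to the target structure."""
    out = []
    for row in rows:
        code = row["state"].upper()
        out.append((
            int(row["cust_id"]),
            row["name"].title(),                # substitution: normalize casing
            STATE_NAMES.get(code, "UNKNOWN"),   # reference table lookup
            round(float(row["balance"]) * 100), # calculation: dollars to cents
        ))
    return out


def load(conn, rows):
    """Load rows into the target store; rerunning replaces rows with the same key."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer ("
        "cust_id INTEGER PRIMARY KEY, name TEXT, state TEXT, balance_cents INTEGER)"
    )
    conn.executemany("INSERT OR REPLACE INTO customer VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
rows = conn.execute(
    "SELECT name, state, balance_cents FROM customer ORDER BY cust_id"
).fetchall()
```

Because `load` replaces rows that share a primary key, the same pipeline can serve both the initial load and a simple form of incremental update, which mirrors the two uses described above.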
- Enterprise Data Warehouse (EDW). Strictly speaking, a data warehouse is an information system that provides its users with current and historical decision-support information that is hard to access or present using traditional operational data stores. An enterprise-wide data warehouse of customer information can become an integration vehicle where most of the customer data can be stored. Likewise, an enterprise data warehouse of product information can act as an integration point for many product-related transactions. Typically, EDW solutions support business intelligence (BI) applications and, in the case of customer domain, Customer Relationship Management (CRM) systems. EDW’s design, technology platform, and data schema are optimized to support the efficient storage of large amounts of data and the processing of complex queries against a large number of interconnected data tables that include current and historical information. Traditionally, companies use EDW systems as informational environments rather than operational systems that process real-time, transactional data.
Because an EDW cleanses and rationalizes the data it manages in order to satisfy the needs of the consuming BI and CRM systems, it becomes a good platform from which data should be loaded into the Data Hub.
- Operational data store (ODS). This technology allows transaction-level detail data records to be stored in a nonsummarized, query accessible, and long-lasting form. An ODS supports transaction-level analysis and other applications that deal with the low level of details. An ODS differs from a data warehouse in that it does not maintain summarized data, nor does it manage historical information. An ODS allows users to aggregate transaction-level data into higher-level attributes but does not support a drill-down into the underlying detail records. An ODS is frequently used in conjunction with the Enterprise Data Warehouse to provide the company with both historical and transactional real-time data.
Similar to the EDW, an ODS that contains customer or product data can and should be considered a valuable source of information for constructing an MDM solution.
- Data quality (DQ) technologies. From the point of view of a business value proposition, the focus of data quality technologies and tools is to help all applications to produce meaningful and reliable results. These tools are especially important for delivering accurate business intelligence and decision support as well as improving customer retention, sales and customer service, customer experience, risk management, compliance, and fraud detection. Companies use data quality technologies to profile data, to report anomalies, and to standardize and “fix” data in order to correct data inconsistencies and known data quality issues, such as missing or invalid data.
Although data quality tools are especially effective when dealing with the name and address attributes of customer data records, they are also very useful for managing data quality in other data domains. Thus, data quality tools and technologies are key components of most Master Data Management solutions.
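The profile, report, and standardize activities described above can be sketched in a few lines of Python. The field names, the sample records, the street-suffix table, and the assumption that postal codes follow the U.S. ZIP format are all illustrative; commercial data quality tools rely on much richer rule sets and reference data.

```python
import re
from collections import Counter

# Hypothetical standardization table for street suffixes.
SUFFIXES = {"st": "Street", "ave": "Avenue", "rd": "Road"}

# Assumed format: 5-digit U.S. ZIP code with an optional 4-digit extension.
ZIP_PATTERN = re.compile(r"\d{5}(-\d{4})?")


def profile(records, required):
    """Profile the data: count missing values for each required field."""
    missing = Counter()
    for rec in records:
        for name in required:
            if not str(rec.get(name) or "").strip():
                missing[name] += 1
    return dict(missing)


def find_zip_anomalies(records):
    """Report the positions of records whose ZIP code is missing or invalid."""
    return [
        index
        for index, rec in enumerate(records)
        if not ZIP_PATTERN.fullmatch(rec.get("zip") or "")
    ]


def standardize_address(address):
    """Expand known suffix abbreviations and normalize word casing."""
    words = []
    for word in address.split():
        lookup = word.lower().rstrip(".")
        words.append(SUFFIXES.get(lookup, word.capitalize()))
    return " ".join(words)


records = [
    {"name": "Ann Lee", "address": "12 main st.", "zip": "10001"},
    {"name": "", "address": "5 OAK AVE", "zip": "1234"},
    {"name": "Cy Dow", "address": "9 elm rd", "zip": None},
]

missing_counts = profile(records, ["name", "zip"])
bad_zip_rows = find_zip_anomalies(records)
clean_addresses = [standardize_address(rec["address"]) for rec in records]
```

Note that the sketch only reports the invalid ZIP codes rather than attempting to repair them; deciding which anomalies can be fixed automatically is a policy choice that the tools alone cannot make.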
- Enterprise Information Integration (EII). Enterprise Information Integration tools are frequently used to aggregate subsets of distributed data in memory or non-persistent storage, usually in real time. Companies use EII solutions to perform search queries across distributed databases and aggregate the results of the queries at the application or presentation layer. Contrast that with the data integration solutions that aggregate and persist the information at the back end (that is, in a data warehouse or an MDM Data Hub). An EII engine queries a distributed database environment and delivers a virtualized aggregated data view that appears as if it came from a single source. EII engines are also used often in a service-oriented architecture (SOA) implementation as the data access and abstraction components (we discuss SOA later in this chapter).
Some MDM implementations use EII technologies to provide users with a virtualized total view of master data without creating a persistent physical image of the aggregation, thus providing additional data model flexibility for the target Data Hub.
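The following Python sketch illustrates the EII style of integration in miniature. Two in-memory SQLite databases stand in for distributed sources, and the table names, columns, and the `customer_view` function are hypothetical. The point is only that the aggregated view is assembled at query time and is never written back to any store.

```python
import sqlite3


def make_source(ddl, insert, rows):
    """Create an independent in-memory database that stands in for a remote source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert, rows)
    return conn


# Hypothetical billing system.
billing = make_source(
    "CREATE TABLE invoice (customer_id TEXT, amount REAL)",
    "INSERT INTO invoice VALUES (?, ?)",
    [("C1", 40.0), ("C1", 60.0), ("C2", 15.0)],
)

# Hypothetical support system.
support = make_source(
    "CREATE TABLE ticket (customer_id TEXT, status TEXT)",
    "INSERT INTO ticket VALUES (?, ?)",
    [("C1", "open"), ("C1", "closed")],
)


def customer_view(customer_id):
    """Query each source and combine the results in memory; nothing is persisted."""
    total = billing.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM invoice WHERE customer_id = ?",
        (customer_id,),
    ).fetchone()[0]
    open_tickets = support.execute(
        "SELECT COUNT(*) FROM ticket WHERE customer_id = ? AND status = 'open'",
        (customer_id,),
    ).fetchone()[0]
    return {
        "customer_id": customer_id,
        "total_invoiced": total,
        "open_tickets": open_tickets,
    }


view_c1 = customer_view("C1")
view_c2 = customer_view("C2")
```

The caller sees a single record per customer, as if it came from one source, while the underlying data stays where it is. The trade-off is that every request pays the cost of querying all the sources.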
- Customer Relationship Management (CRM). Customer Relationship Management uses a set of technologies and business processes designed to help the company understand the customer, improve customer experience, and optimize customer-facing business processes across marketing, sales, and servicing channels. From the architecture perspective, CRM systems often act as consumers of customer data and are some of the primary beneficiaries of the MDM Data Hubs.
- Product Master. Manufacturing companies manage a variety of complex products and product hierarchies. Complex products consist of multiple parts, and those parts contain lower-level components, materials, or parts. This hierarchy represents what is often called a “Bill of Materials” (BOM). BOM management software helps centralize and control complex BOM processes, reduce error rates, and improve control over operational processes and costs.
An MDM system that is integrated with BOM management software can significantly enhance an integrated multidomain view of the master data. For example, a product characterized by BOM components can be integrated with suppliers’ component data.
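A Bill of Materials is naturally a recursive structure, and a short Python sketch shows how a product can be exploded into its lowest-level parts and then joined with supplier data. The bicycle BOM, the quantities, and the supplier names are invented for illustration and do not describe any particular BOM management product.

```python
from collections import defaultdict

# Hypothetical BOM: each parent maps to (component, quantity per parent) pairs.
BOM = {
    "bicycle": [("frame", 1), ("wheel", 2)],
    "wheel": [("rim", 1), ("spoke", 32), ("tire", 1)],
}

# Hypothetical supplier master data, keyed by component.
SUPPLIERS = {"frame": "FrameCo", "rim": "RimWorks", "spoke": "Acme Wire"}


def explode(item, quantity=1, totals=None):
    """Walk the hierarchy and total the quantity of each lowest-level part."""
    if totals is None:
        totals = defaultdict(int)
    components = BOM.get(item)
    if not components:
        # No children: this is a leaf part, so accumulate its quantity.
        totals[item] += quantity
        return totals
    for child, per_parent in components:
        explode(child, quantity * per_parent, totals)
    return totals


def sourcing_view(product):
    """Join exploded part quantities with supplier data (None if no supplier is known)."""
    return {
        part: (qty, SUPPLIERS.get(part))
        for part, qty in explode(product).items()
    }


parts = dict(explode("bicycle"))
sourcing = sourcing_view("bicycle")
```

Here the tire has no known supplier, which is exactly the kind of gap that an integrated, multidomain view of product and supplier master data is meant to expose.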
More about this book and others like it...
- To purchase the book or similar titles, visit the McGraw-Hill website.