What is a master data management system?

28 Jul 2008 | Allen Dreibelbis, Eberhard Hechler, Ivan Milman, Martin Oberhofer, Paul van Run and Dan Wolfson

Master data management and service-oriented architecture

The following is an excerpt from Enterprise Master Data Management: An SOA Approach to Managing Core Information, by Allen Dreibelbis, Eberhard Hechler, Ivan Milman, Martin Oberhofer, Paul van Run and Dan Wolfson. It is reprinted here with permission from IBM Press; Copyright 2008. Read the chapter below to learn about master data management (MDM) systems, or download a free .pdf of "What is a master data management system?" to read later.

Chapter 1 is split into four parts. This is Part 3.

What is a master data management system?

Master Data Management Systems provide authoritative data to an organization. But what kind of data? How do we work with the MDM System? How do we integrate the MDM System with the existing systems? These questions describe a solution space within which there are a wide variety of ways in which MDM Systems can be deployed and used according to the needs and goals of the enterprise.

In this section, we describe the three primary dimensions of this MDM solution space. As shown in Figure 1.6, the three dimensions are the domains of master data that are managed, the methods by which the system is to be used, and the styles of implementation that are needed for a particular deployment. It is important to note that MDM implementations are typically not deployed in a "big bang" approach where all domains are managed across all methods of use. Organizations generally start with a limited scope that provides the highest return on investment in a relatively short time frame. As MDM implementations are rolled out over several phases, the scope of the implementation may grow. Additional domains are added, the method of use may expand, or the implementation style may change to deliver additional business value. The term Multiform MDM is sometimes used to describe MDM Systems that support all three of these dimensions. The following sections describe these dimensions in greater detail.


Figure 1.6 Dimensions of Master Data Management.

1.3.1 Master Data Domains

Master Data Management has emerged over the last few years from the recognition that the existing markets of Customer Data Integration (CDI) and Product Information Management (PIM) had key similarities as well as differences. CDI focuses on managing people and organizations, which we will collectively call parties. A CDI system can aggregate party information from many preexisting systems, manage the use of the party data, and distribute the information out to downstream systems such as billing systems, campaign management systems, or CRM systems.

PIM systems manage the definition and lifecycle of a finished good or service—collecting product information from multiple sources, getting agreement on the definition of products, and then publishing this information to Web sites, marketing systems, merchandizing systems, and so on. PIM systems are distinct from Product Lifecycle Management (PLM) systems, which focus on the design and development of products rather than the preparation of product information to support sales and distribution. There is a natural flow of information from a PLM system to a PIM system as a product transitions from engineering into marketing and sales.

CDI and PIM both represent a common pattern—that of aggregating data from existing systems, cleaning and augmenting that data, and then distributing that data to downstream systems. PIM and CDI systems differ in the most common ways in which the data is used after it has been loaded into the MDM System—we discuss the different methods of use in the following section. It is important to note that MDM Systems do more than just store and retrieve data—they incorporate business logic to reflect the proper management and handling of the master data. The rules for handling a product lifecycle are different than those for managing the lifecycle for a customer. The MDM System may also be configured to issue alerts when interesting things happen. For example, billing systems may need to get notified immediately when a customer address changes. This business logic can be customized for a particular deployment to reflect the needs of a particular industry as well as the unique characteristics of the implementing organization.
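The alerting behavior described above can be illustrated with a small publish/subscribe sketch. This is only an illustration under stated assumptions: the class `PartyAddressService` and its methods are hypothetical names invented here, not part of any MDM product, and a real system would deliver notifications through messaging infrastructure rather than in-process callbacks.

```python
class PartyAddressService:
    """Hypothetical service that stores party addresses and alerts subscribers."""

    def __init__(self):
        self._addresses = {}    # party_id -> current address
        self._subscribers = []  # callbacks taking (party_id, old, new)

    def subscribe(self, callback):
        """Register a downstream system (e.g. billing) for address changes."""
        self._subscribers.append(callback)

    def set_address(self, party_id, address):
        """Store the address; notify subscribers only when an existing one changes."""
        old = self._addresses.get(party_id)
        self._addresses[party_id] = address
        if old is not None and old != address:
            for notify in self._subscribers:
                notify(party_id, old, address)
```

A billing system would call `subscribe` once and then receive a callback each time an existing address is replaced; the first address recorded for a party does not count as a change in this sketch.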

As CDI and PIM products have matured, it was also observed that while CDI systems focused on the customer, it was often convenient for such systems to include references to the products or accounts that a customer has. Similarly, PIM systems often need to store or reference the suppliers of the products or services. Supporting and using these cross-domain relationships has become a significant aspect of MDM Systems.

The kinds of information treated as master data vary from industry to industry and from one organization to another. An insurance company may wish to treat information about customers, policies, and accounts as master data, while a telecommunications company may be concerned with customers, accounts, location (of cell phone towers), and services. A manufacturer may be focused on managing suppliers, customers, distributors, and products. A government agency may want to focus only on citizens and non-citizens. In these examples, we see a lot of commonality as well as differences. In general, master data can be categorized according to the kinds of questions they address; three of the most common questions ("Who?", "What?", and "How?") are addressed by the party, product, and account domains of master data. Each of these domains represents a class of things. For example, the party domain can represent any kind of person or organization, including customers, suppliers, employees, citizens, distributors, and organizations. Each of these kinds of party shares a common set of attributes, such as the name of the party, where it is located (a party may have multiple locations such as home, work, vacation home, etc.), how to contact it, what kind of relationship the organization has with the party, and so forth. Similarly, the product domain can represent all kinds of things that you sell or use, from tangible consumer goods to service products such as mortgages, telephone services, or insurance policies. The account domain describes how a party is related to a product or service that the organization offers. What are the relations of the parties to this account, and who owns the account? Which accounts are used for which products? What are the terms and conditions associated with the products and the accounts? And how are products bundled?

Location information is often associated with one of the other domains. When we talk about where a product is sold, where a customer lives, and the address at which an insurance policy is in effect, we are referring to location information. Location information is tied to a product, a party, or an account—it does not have an independent existence. There are, of course, cases where location does exist independently, but those situations seem to be less common. Another interesting facet of location is that it can be described in many different ways (by postal address, by latitude and longitude, by geopolitical boundaries)—we need a particular context in order to define what we mean. A location can be a sales territory, a city, a campus with many buildings, a store, or even a spot on a shelf in an aisle within a store. For these reasons, we will treat location as a subordinate domain of master data.
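One way to picture the party, product, and account domains, with location as a subordinate domain, is as a handful of record types. The sketch below is a deliberately minimal illustration; all class and field names are assumptions made for this example and do not reflect the data model of any particular MDM System.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Location:
    """Subordinate domain: attached to a party here, not managed on its own."""
    kind: str            # e.g. "home", "work", "vacation home"
    postal_address: str


@dataclass
class Party:
    """'Who?': any person or organization (customer, supplier, employee, ...)."""
    party_id: str
    name: str
    locations: List[Location] = field(default_factory=list)


@dataclass
class Product:
    """'What?': a tangible good or a service product such as a mortgage."""
    product_id: str
    name: str


@dataclass
class Account:
    """'How?': relates parties to a product that the organization offers."""
    account_id: str
    product_id: str
    # Maps party_id to that party's role on the account, e.g. "owner".
    party_roles: Dict[str, str] = field(default_factory=dict)

    def owners(self) -> List[str]:
        """Answer 'who owns the account?' from the recorded roles."""
        return sorted(p for p, role in self.party_roles.items() if role == "owner")
```

Keeping the party roles on the account is one possible design choice; it makes the question of who owns an account a simple lookup, at the cost of storing the relationship in only one direction.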

Figure 1.7 shows how the three primary domains of party, product, and account overlap. These areas of overlap are particularly interesting, because they indicate fundamental relationships between the domains. For example, when we define a product, we often need to specify the party that supplies that product and the location(s) in which the product may be sold. Explicitly capturing these relationships within the same environment allows us to address business questions that may be otherwise difficult to resolve. Building on the previous example, if we record the party that supplies a product as well as the parties that we sell products to, then we can determine which of our suppliers are also our customers. Understanding the full set of linkages that an organization has with a partner can be valuable in all aspects of working with that partner—from establishing mutually beneficial agreements to ensuring an appropriate level of support. Indeed, perhaps the key benefit of supporting multiple domains of master data within the same system is that it clarifies these cross-domain relationships.
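The supplier/customer question above reduces to a set intersection once both relationships are recorded in the same place. The following sketch assumes a hypothetical representation in which each relationship is a `(party_id, product_id)` pair; the sample data is invented for illustration.

```python
# Hypothetical relationship records: (party_id, product_id).
supplies = {("acme", "bolt"), ("globex", "nut"), ("initech", "washer")}
purchases = {("acme", "washer"), ("hooli", "bolt"), ("initech", "bolt")}


def suppliers_who_are_customers(supplies, purchases):
    """Return the parties that appear on both sides of the relationship."""
    suppliers = {party for party, _ in supplies}
    customers = {party for party, _ in purchases}
    return sorted(suppliers & customers)


result = suppliers_who_are_customers(supplies, purchases)  # ['acme', 'initech']
```

If the supplier and customer records lived in separate systems with separate identifiers, the same question would first require matching the party records across systems.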


Figure 1.7 Domains of Master Data.

Master data domains can be made specific to a particular industry through the application of industry standards or widely accepted industry models. Typically, standards and models can be used to drive not just the definition of the data model within an MDM Solution but the services that work with the master data as well. In particular, use of standards and models aligns the services exposed by an MDM Solution with accepted industry-specific definitions, which reduces the cost of integration.

Gaining agreement on the definition of an MDM domain can be challenging when different stakeholders within an organization have different requirements or look at the same requirements from different points of view. If well-accepted industry models or standards exist, they can serve as a foundation for further customization, eliminating the need to laboriously gain agreement on every term or service definition. Table 1.1 provides a list of some of the standards and models that are available within a range of industries. Some of these standards and models could be used to guide the definition of data structures and access services for MDM domains.

In summary, an MDM System supports one or more domains of master data. The domains provided are often industry-neutral but can be subsequently tailored (and/or mapped) to different industry standards or models. The domain definitions can be further customized during the design and implementation of an MDM Solution for a specific environment.

1.3.2 Methods of Use

As we look at the roles that master data plays within an organization, we find three key methods or patterns of use: Collaborative Authoring, Operational, and Analytical, shown in Figure 1.8. The simplest way to think about these methods of use is to consider who will be the primary consumers of the master data. Under the Collaborative Authoring pattern, the MDM System coordinates a group of users and systems in order to reach agreement on a set of master data. Under the Operational pattern, the MDM System participates in the operational transactions and business processes of the enterprise, interacting with other application systems and people. Finally, under the Analytical pattern, the MDM System is a source of authoritative information for downstream analytical systems, and sometimes is a source of insight itself.

Table 1.1 Some Industry Standards and Models

Industry | Standard or Industry Model | Web resource
---------|----------------------------|-------------
Banking | IBM Information FrameWork (IFW) | www-306.ibm.com/software/data/ips/products/industrymodels/
Banking | Interactive Financial eXchange (IFX) | www.ifxforum.org
Insurance | IBM Insurance Application Architecture (IAA) | www-306.ibm.com/software/data/ips/products/industrymodels/
Insurance | Association for Cooperative Operations Research and Development (ACORD) | www.acord.org
Telecoms | Shared Information/Data Model (SID) | www.tmforum.org
Telecoms | IBM Telecommunications Data Warehouse | http://www-306.ibm.com/software/data/ips/products/industrymodels/telecomm.html
Retail | Association for Retail Technology Standards (ARTS) | www.nrf-arts.org
Retail | IBM Retail Data Warehouse | http://www-306.ibm.com/software/data/ips/products/industrymodels/retail.html
Healthcare | Health Level 7 (HL7) | www.hl7.org

A particular element of master data such as a product or an account may be initially authored using a collaborative style, managed operationally through the operational style, and then published to other operational and analytical systems. Because MDM Systems may be optimized to one or more of the methods of use, more than one MDM System may be needed to support the full breadth of usage. Where multiple MDM Systems are used to support multiple usage patterns, careful attention to the integration, management, and governance of the combined system is required to ensure that the master data of the combined system is consistent and authoritative.


Figure 1.8 Multiple MDM domains and multiple methods of use.

It is important to note that the style of usage is completely independent from the domain of information managed. Although Product Information Management systems are often associated with a Collaborative Authoring style of use, and Customer Data Integration systems are often associated with an Operational usage style, this alignment is not necessary or exclusive. There are an increasing number of cases where organizations seek an operational usage of product information as well as a range of use cases for collaborative authoring of customer information.

1.3.2.1 Collaborative MDM

Collaborative MDM deals with the processes supporting collaborative authoring of master data, including the creation, definition, augmentation, and approval of master data. Collaborative MDM is about achieving agreement on a complex topic among a group of people. The process of getting to agreement is often encapsulated in a workflow that may incorporate both automated and manual tasks, both of which are supported by collaborative capabilities. Information about the master data being processed is passed from task to task within the workflow and is governed throughout its lifecycle.

As a consequence of the complexity of product development and management, PIM systems commonly support a collaborative style of usage. Perhaps the most common process implemented by PIM systems is the process for introducing a new product to the market, known as New Product Introduction (NPI). An in-depth discussion of NPI can be found in Chapter 6. A typical NPI process is shown in Figure 1.9.


Figure 1.9 Simplified New Product Introduction process.

Here we can see that information about new products (or items) is received from one or more external sources and then incrementally extended, augmented, validated, and approved by a number of different end users with different user roles and responsibilities.

The collaborative steps within a New Product Introduction process are used to define the kinds of properties that describe the product. A given product will be described by dozens, and often hundreds, of properties depending on how the product is classified and where it is sold. In the New Product Introduction process, product specialists, buyers, and other stakeholders describe all of the characteristics of the product that are necessary to bring it to market. These characteristics may include product specifications, marketing information, ingredients, safety information, recycling information, cost, and so on. Large retailers may have more than a million products that they sell, spanning categories from food to clothing to furniture to appliances. The kinds of properties that are relevant to a product depend on the kind of product it is. For clothing, examples include color, size, and material; for electronic appliances, examples might be specifications, color, warranty, and so on. The Collaborative MDM System helps users to capture all of the different relevant properties of the product, validate the properties, categorize the product, and coordinate the approval of the product. As buyers and product specialists come up with new ways to describe products, new properties are created to hold these new descriptions. In retail environments, the structure of the product information is constantly evolving.
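Because the relevant properties depend on how a product is categorized, validation is naturally driven by the category. The sketch below illustrates the idea using the clothing and appliance examples from the text; the `REQUIRED_PROPERTIES` table and the function name are hypothetical, and a real system would let users extend such definitions at run time rather than hard-coding them.

```python
# Hypothetical required properties per category, taken from the examples above.
REQUIRED_PROPERTIES = {
    "clothing": {"color", "size", "material"},
    "electronic_appliance": {"specifications", "color", "warranty"},
}


def missing_properties(category, properties):
    """Return the required properties that have not been captured yet."""
    return sorted(REQUIRED_PROPERTIES[category] - set(properties))


shirt = {"color": "blue", "size": "M"}
todo = missing_properties("clothing", shirt)  # ['material']
```

A workflow task could use a check like this to decide whether an item is ready to move on to approval.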

Collaboration is a common pattern and can be found beyond the PIM domain. Indeed, we find that many of the tasks performed by a product specialist in the PIM environment are also performed in the management of Customer and Account information. A key role that spans all domains of master data is that of data steward. A data steward looks after the quality and management of the data. For example, when we believe that two or more party records in a data store may really refer to the same individual, data stewards may need to manually combine information from the party records together and then validate the proposed changes with supervisors. Similarly, where questions requiring human intervention arise about the accuracy of information, a request for attention may be made visible to all data stewards who are capable of handling the issue, which can result in a collaborative pattern to resolve data quality issues.

The Collaborative style of usage requires a core set of capabilities within the MDM environment. A combination of workflow, task management, and state management is needed to guide and coordinate the collaborative tasks and the master data being collaborated on. Workflow controls the execution of a sequence of tasks by people and automated processes. Task management prioritizes and displays pending work for individuals to perform, while state management helps us to model and then enforce the lifecycle of the master data.
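State management can be pictured as a table of allowed lifecycle transitions that every change must pass through. The states and transitions below are assumptions chosen for illustration; an actual lifecycle would be defined per domain and per deployment.

```python
# Hypothetical lifecycle for a master data record.
ALLOWED_TRANSITIONS = {
    "draft": {"in_review"},
    "in_review": {"draft", "approved"},
    "approved": {"published"},
    "published": set(),
}


class LifecycleError(Exception):
    """Raised when a requested state change violates the lifecycle."""


def advance(current_state, new_state):
    """Return new_state if the transition is allowed, otherwise raise."""
    if new_state not in ALLOWED_TRANSITIONS.get(current_state, set()):
        raise LifecycleError(f"{current_state} -> {new_state} is not allowed")
    return new_state
```

For example, `advance("draft", "in_review")` succeeds, while `advance("draft", "published")` raises `LifecycleError` because the record has not been reviewed and approved.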

Because many concurrent users and workflows may be executing in parallel, the integrity of the master data needs to be protected with a check-in/check-out or similar locking technique. To improve efficiency, master data records are often processed in batches within the same workflow, which results in the concept of a "workbasket" of master data records that is passed from task to task within the workflow. Tasks within a workflow may be automated actions (such as import, export, or data validation) or manual tasks that allow users to work directly with the master data. Typically, this workflow will involve business users and data stewards, a process that, in turn, has implications for the design of the UIs (user interfaces) for collaborative authoring of master data. User interfaces must be both efficient and comfortable to use, and must rely on a set of underlying services that create, query, update, and delete the master data itself, the relationships between the master data, and other related information, such as lookup tables. Tooling to support the flexible creation and customization of collaborative workflows and even user screens may also be provided.
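The check-in/check-out technique and the workbasket concept can be sketched together: a user checks out a batch of records, edits private copies, and checks the batch back in. This is a single-process illustration only; the class and method names are hypothetical, and a real implementation would need persistent, concurrency-safe locks.

```python
class CheckoutError(Exception):
    """Raised when a record is already checked out or not held by the caller."""


class MasterDataStore:
    """Hypothetical in-memory store with check-out/check-in locking."""

    def __init__(self):
        self._records = {}  # record_id -> attribute dict
        self._holders = {}  # record_id -> user currently holding the record

    def add(self, record_id, attributes):
        self._records[record_id] = dict(attributes)

    def get(self, record_id):
        return dict(self._records[record_id])

    def check_out(self, record_ids, user):
        """Lock a whole workbasket of records for one user, all or nothing."""
        busy = [r for r in record_ids if r in self._holders]
        if busy:
            raise CheckoutError(f"already checked out: {busy}")
        for r in record_ids:
            self._holders[r] = user
        # Hand out copies so edits stay private until check-in.
        return {r: dict(self._records[r]) for r in record_ids}

    def check_in(self, workbasket, user):
        """Write the edited copies back and release the locks."""
        for r in workbasket:
            if self._holders.get(r) != user:
                raise CheckoutError(f"{r} is not checked out by {user}")
        for r, attributes in workbasket.items():
            self._records[r] = dict(attributes)
            del self._holders[r]
```

Locking the whole workbasket at once, rather than record by record, keeps a batch from being half-owned by two users, which matches the idea of passing the batch from task to task as a unit.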

Finally, a common set of services is typically also provided to enforce security and privacy, and to support administration, validation, and import/export of master data. These services are needed across all kinds of MDM Systems.

1.3.2.2 Operational MDM

In the Operational style of MDM, the MDM server acts as an Online Transaction Processing (OLTP) system that responds to requests from multiple applications and users. Operational MDM focuses on providing stateless services in a high-performance environment. These stateless services can be invoked from an enterprise business process or directly from a business application or user interface. Operational MDM services are often designed to fit within a Service-Oriented Architecture as well as in traditional environments. Integration of an Operational MDM System with existing systems calls for the support of a wide variety of communications styles and protocols, including synchronous and asynchronous styles, global transactions, and one-way communications.


Figure 1.10 Example New Account Opening process.

A good example of Operational MDM usage is a New Account Opening business process. In this process, a person or organization wants to open a new account—perhaps a bank account, a cable TV account, or any other kind of account. As shown in Figure 1.10, MDM services are invoked to check what information about the customer is already known and to determine if product policy is being complied with before an offer of a new account is made. If the customer isn't already known, then the new customer is added to the MDM System and a new account is created (presuming that the new customer meets the appropriate requirements). Each of the tasks within this workflow is implemented by a service, and many of these services are implemented by an Operational MDM System.
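The flow just described can be approximated as a few stateless functions composed into one process. Everything here is a simplification made for illustration: the function names, the exact-match lookup on name and date of birth, and the minimum-age policy are all assumptions, not the actual services of any MDM System.

```python
def find_party(parties, name, date_of_birth):
    """Return the id of an already-known party, or None (exact match only)."""
    for party_id, party in parties.items():
        if party["name"] == name and party["date_of_birth"] == date_of_birth:
            return party_id
    return None


def complies_with_product_policy(party, product):
    """Hypothetical product policy: a minimum-age rule."""
    return party["age"] >= product["min_age"]


def open_account(parties, accounts, applicant, product):
    """Return the new account id, or None if product policy is not met."""
    party_id = find_party(parties, applicant["name"], applicant["date_of_birth"])
    known = parties[party_id] if party_id is not None else applicant
    if not complies_with_product_policy(known, product):
        return None
    if party_id is None:
        # New customer: add the party before creating the account.
        party_id = f"P{len(parties) + 1}"
        parties[party_id] = dict(applicant)
    account_id = f"A{len(accounts) + 1}"
    accounts[account_id] = {"party_id": party_id, "product_id": product["product_id"]}
    return account_id
```

Each function takes all the data it needs as arguments and keeps nothing between calls, which mirrors the stateless style of service described for Operational MDM.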

Operational MDM is also commonly used in the PIM domain. For retailers, after products have been defined, the approved product information may be published to an operational MDM System that then serves as a hub of MDM information that interacts with merchandising, distribution, or e-commerce applications. As such applications become more open and able to interact within an SOA environment, the need for such an operational MDM hub increases.

A wide range of capabilities is required for the Operational usage style. There can be hundreds of services that provide access and management of MDM data. Specific sets of services for each kind of MDM object managed provide for creation, reading, updating, and deletion of the MDM objects. Services are also provided to relate, group, and organize MDM objects. As with the Collaborative style of MDM, services are also needed for cleansing and validation of the data, for detection and processing of duplicates, and for managing the security and privacy of the information.

1.3.2.3 Analytical MDM

Analytical MDM is about the intersection between Business Intelligence (BI) and MDM. BI is a broad field that includes business reporting, data warehouses, data marts, data mining, scoring, and many other fields. To be useful, all forms of BI require meaningful, trusted data. Increasingly, analytical systems are also transitioning from purely decision support to more operational involvement. As BI systems have begun to take on this broader role, the relationship between MDM Systems and Analytical systems has also begun to change.

There are three primary intersections between MDM and BI.

  • MDM as a trusted data source: A key role of an MDM System is to be a provider of clean and consistent data to BI systems.
  • Analytics on MDM data: MDM Systems themselves may integrate reporting and analytics in support of providing insight over the data managed within the MDM System.
  • Analytics as a key function of an MDM System: Specialized kinds of analytics, such as identity resolution, may be a key feature of some MDM Systems.

One of the common drivers for clean and consistent master data is the need to improve the quality of decision making. Using an MDM System to feed downstream BI systems is an important and common pattern. The data that drives a BI system must be of a high quality if the results of the analytical processing are to be trusted. For this reason, MDM Systems are often a key source of information to data warehouses, data marts, Online Analytical Processing (OLAP) cubes, and other BI structures. The common data models for data warehouses use what are called star schemas or snowflake schemas to represent the relationship between the facts to be analyzed and the dimensions by which the analysis is done. For example, a business analyst in a retail environment would be interested in understanding the number or value of sales by product or perhaps by manufacturer. Here, the sales transaction data is stored in fact tables. Product and manufacturer represent dimensions of the analysis. We can observe that often master data domains align with dimensions within an analytical environment, which makes the MDM System a natural source of data for BI systems.
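The fact/dimension split in the retail example can be shown with a tiny in-memory star schema. The table contents and the helper `sales_by` are invented for illustration; in practice the dimension rows would be loaded from the MDM System and the facts from transactional systems.

```python
from collections import defaultdict

# Dimension table keyed by surrogate key; rows would be sourced from master data.
product_dim = {
    1: {"name": "Kettle", "manufacturer": "Acme"},
    2: {"name": "Toaster", "manufacturer": "Acme"},
    3: {"name": "Blender", "manufacturer": "Globex"},
}

# Fact table: one row per sales transaction, referencing the dimension.
sales_facts = [
    {"product_key": 1, "amount": 30.0},
    {"product_key": 2, "amount": 25.0},
    {"product_key": 3, "amount": 60.0},
    {"product_key": 1, "amount": 30.0},
]


def sales_by(attribute):
    """Total the sales amounts grouped by one attribute of the product dimension."""
    totals = defaultdict(float)
    for fact in sales_facts:
        totals[product_dim[fact["product_key"]][attribute]] += fact["amount"]
    return dict(totals)
```

Calling `sales_by("manufacturer")` groups the same facts by manufacturer and `sales_by("name")` groups them by product, which is why clean, deduplicated dimension data matters: a duplicated product row would split its sales across two groups.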

The insight gained from a data warehouse or OLAP cube may also be fed back into the MDM System. For example, in the travel and entertainment industry, some companies build analytical models that can project the likely net lifetime revenue potential of a customer. To build these projections, they will source the master data from an MDM System and transactional details from other systems. After the revenue potential is computed, the MDM System is updated to reflect this information, which may now be used as part of each customer's profile. Reservation systems can then use this profile to tailor offers specifically to each customer.

Insight may also be derived from data maintained by the MDM System itself. An MDM System contains all of the information needed to report on key performance indicators such as the number of new customers per week, the number of new accounts per day, or the average time to introduce a new product. Reporting and dashboarding tools can operate directly over the master data to provide these kinds of domain-specific insights. Some MDM Systems also incorporate a combination of rules and event subsystems that allow interesting events to be detected and actions to be taken based on these events as they happen. For example, if a customer changes addresses five times in three months, that may trigger an alert that notifies event subscribers to contact the customer to validate his or her address on a periodic basis. Analytics may also be executed as an MDM transaction is taking place, using architected integration points that allow external functions to be invoked as part of an MDM service. A good example is the use of scoring functions to predict the likelihood of a customer canceling accounts at an institution. Such scoring functions can be developed by gaining a deep understanding of an issue, such as customer retention, through data mining and building a model of recurring customer retention patterns based on the combination of customer and transaction data maintained within a data warehouse. While it is time-consuming to develop and validate such a model, the scoring model that results can be efficiently executed as part of an MDM service. This kind of analytics is called in-line analytics or operational analytics and is an important new way in which MDM Systems can work together with BI systems to provide additional value to an enterprise.
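The address-change rule mentioned above can be expressed as a simple window check. The sketch treats "three months" as 90 days, which is an assumption made here, and the function name and parameters are hypothetical.

```python
from datetime import date, timedelta


def needs_address_validation(change_dates, today, max_changes=5, window_days=90):
    """True if at least max_changes address changes fall within the window.

    window_days=90 approximates "three months"; adjust as the rule requires.
    """
    window_start = today - timedelta(days=window_days)
    recent = [d for d in change_dates if window_start <= d <= today]
    return len(recent) >= max_changes


changes = [
    date(2008, 5, 10),
    date(2008, 5, 30),
    date(2008, 6, 15),
    date(2008, 7, 1),
    date(2008, 7, 20),
]
alert = needs_address_validation(changes, today=date(2008, 7, 28))  # True
```

An event subsystem would evaluate a rule like this whenever an address change is recorded and notify subscribers when it returns true.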

The final kind of MDM analytics is where the MDM System provides some key analytic capabilities. One particular kind of insight that can be derived from the information within an MDM System is the discovery of both obvious and non-obvious relationships between the master entities managed. An obvious kind of relationship would be one that discovered households based on a set of rules around names, addresses, and other common information. A non-obvious kind of relationship might find relationships between people or organizations by looking for shared fragments of information, such as a common phone number, in an effort to determine that people may be roommates. Searching for non-obvious relationships may also require rules that look for combinations of potentially obfuscated information (for example, transposed Social Security numbers and phone numbers) to identify potential relationships where people may be trying to hide their identities. Identity resolution and relationship discovery are important both for looking for questionable dealings and for understanding a social network that a person is part of, and therefore are important for predicting the overall value of a person's influence.
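A toy version of relationship discovery might link parties that share a phone number, either exactly or after swapping two adjacent digits. This is a sketch of the idea only: the function names and record layout are assumptions, the pairwise comparison does not scale to large party populations, and production identity resolution uses far richer rules and probabilistic matching.

```python
from itertools import combinations


def is_adjacent_transposition(a, b):
    """True if b equals a with exactly one pair of adjacent characters swapped."""
    if len(a) != len(b) or a == b:
        return False
    diffs = [i for i in range(len(a)) if a[i] != b[i]]
    return (
        len(diffs) == 2
        and diffs[1] == diffs[0] + 1
        and a[diffs[0]] == b[diffs[1]]
        and a[diffs[1]] == b[diffs[0]]
    )


def related_by_phone(parties):
    """Return sorted pairs of party ids whose phones match or are transposed."""
    pairs = set()
    for (id_a, a), (id_b, b) in combinations(sorted(parties.items()), 2):
        if a["phone"] == b["phone"] or is_adjacent_transposition(a["phone"], b["phone"]):
            pairs.add((id_a, id_b))
    return sorted(pairs)
```

Pairs found this way are only candidates; a data steward or a further set of rules would normally confirm whether a discovered relationship is real.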

The analytical style of usage encompasses a variety of capabilities. Populating external analytical environments such as data warehouses with data from an MDM System requires information integration tools to efficiently transfer and transform information from the MDM System into the star or snowflake schemas needed by the data warehouse. Integration with reporting tools is required in order to display key performance indicators and how they change over time. Rules, scoring, and event management are important capabilities for in-line analytics within the MDM environment.

In practice, MDM usage will often cross the boundaries between collaborative, operational, and analytical usage. For example, collaborative MDM processes can be very useful in managing the augmentation of complex operational structures such as organizationa