Data governance and data stewardship strategies and best practices

This chapter examines key steps of a generic data governance strategy program as it may apply to the CDI Data Hub and discusses the concept of data stewards and their role in assessing, improving and managing data quality.

The following excerpt from Master Data Management and Customer Data Integration for the Global Enterprise, by Alex Berson and Larry Dubov, is printed with permission from McGraw-Hill. Copyright 2007. It is taken from Chapter 6: Data governance and data stewardship strategies and best practices.

Let's consider the following working definition of data governance.

Data governance
Data governance is a process focused on managing the quality, consistency, usability, security, and availability of information. This process is closely linked to the notions of data ownership and stewardship.

Clearly, according to this definition, data governance becomes a critical component of any Data Hub initiative. Indeed, an integrated CDI data architecture contains not only the Data Hub but also many applications and databases that more often than not were developed independently, in a typical stovepipe fashion, and the information they use is often inconsistent, incomplete, and of different quality.

Data governance strategy helps deliver appropriate data to properly authorized users when they need it. Moreover, data governance and its data quality component are responsible for creating data quality standards, data quality metrics, and data quality measurement processes that together help deliver acceptable quality data to the consumers—applications and end users.

Data quality improvement and assurance are no longer optional activities. For example, the 2002 Sarbanes-Oxley Act requires, among other things, that a business entity be able to attest to the quality and accuracy of the data contained in its financial statements. The classic "garbage in, garbage out" expression still holds: no organization can report high-quality financial data if the source data used to produce the financial numbers is of poor quality.

To achieve compliance and to successfully implement an enterprise data governance and data quality strategy, the strategy itself should be treated as a value-added business proposition and sold to the organization's stakeholders to obtain management buy-in and commitment, like any other business case. The value of improved data quality is almost self-evident. It includes factors such as the enterprise's ability to make better and more accurate decisions, to gain deeper insights into the customer's behavior, and to understand the customer's propensity to buy products and services, the probability of the customer's engaging in high-risk transactions, and the probability of attrition.

The data governance strategy is not limited to data quality and data management standards and policies. It also includes the critically important concerns of defining the organizational structures and job roles responsible for monitoring and enforcing compliance with these policies and standards throughout the organization.

Committing an organization to implement a robust data governance strategy requires an implementation plan that follows a well-defined and proven methodology. Although there are several effective data governance methodologies available, a detailed discussion of them is beyond the scope of this book. However, for the sake of completeness, this section reviews key steps of a generic data governance strategy program as it may apply to the CDI Data Hub:

  • Define a data governance process. This is the key to enabling monitoring and reconciliation of data between the Data Hub and its sources and consumers. The data governance process should cover not only the initial data load but also data refinement, standardization, and aggregation activities along the path of the end-to-end information flow. The data governance process includes such data management and data quality concerns as the elimination of duplicate entries and the creation of linking and matching keys. We showed in Chapter 5 that these unique identifiers help aggregate or merge individual records into groups or clusters based on certain criteria, for example, a household affiliation or a business entity. As the Data Hub is integrated into the overall enterprise data management environment, the data governance process should define the mechanisms that create and maintain valid cross-reference information in the form of Record Locator metadata that enables linkages between the Data Hub and other systems. In addition, a data governance process should contain a component that supports manual corrections of false positive and false negative matches as well as the exception processing of errors that cannot be handled automatically.


  • Design, select, and implement a data management and data delivery technology suite. In the case of a CDI Data Hub, both data management and data delivery technologies play a key role in enabling a fully integrated CDI solution regardless of the architecture style of the Data Hub, be it a Registry, a Reconciliation Engine, or a Transaction Hub. Later in this chapter we will use the principles and advantages of service-oriented architecture (SOA) to discuss the data management and data delivery aspects of the Data Hub architecture and the related data governance strategy.


  • Enable auditability and accountability for all data under management that is in scope for the data governance strategy. Auditability is extremely important because it not only provides verifiable records of data access activities, but also serves as an invaluable tool to help achieve compliance with current and emerging regulations, including the Gramm-Leach-Bliley Act and its data protection clause, the Sarbanes-Oxley Act, and the Basel II Capital Accord. Auditability works hand in hand with accountability for data management and data delivery actions. Accountability requires the creation and empowerment of several data governance roles within the organization, including data owners and data stewards. These roles should be created at appropriate levels of the organization and assigned to dedicated organizational units or individuals.
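
To make the first step above more concrete, here is a minimal sketch of how matching keys might group source records into clusters, how a cross-reference in the spirit of Record Locator metadata could be kept, and how a manual correction of a false positive match could be applied. The matching rule (normalized name plus postal code), the system names, and every identifier in the code are assumptions made for illustration; they are not taken from the book or from any CDI product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceRecord:
    system: str       # hypothetical source system name, e.g. "crm"
    record_id: str    # the record's key inside that source system
    name: str
    postal_code: str


def matching_key(rec: SourceRecord) -> str:
    """Illustrative rule only: lowercased, whitespace-normalized name plus postal code."""
    name = " ".join(rec.name.lower().split())
    return f"{name}|{rec.postal_code.strip()}"


def build_record_locator(records, overrides=None):
    """Map each matching key to the (system, record_id) pairs it links.

    `overrides` maps (system, record_id) to a manually assigned key, so a
    data steward can split or merge clusters the automatic rule got wrong.
    """
    overrides = overrides or {}
    locator = defaultdict(list)
    for rec in records:
        key = overrides.get((rec.system, rec.record_id), matching_key(rec))
        locator[key].append((rec.system, rec.record_id))
    return dict(locator)


records = [
    SourceRecord("crm", "C-1", "Ann  Lee", "10001"),
    SourceRecord("billing", "B-7", "ann lee", "10001 "),
    SourceRecord("support", "S-3", "Ann Lee", "10001"),
]

# Automatic matching links all three source records into one cluster.
auto = build_record_locator(records)

# A steward decides the support record is a different person (a false
# positive match) and assigns it a separate key by hand.
corrected = build_record_locator(
    records, overrides={("support", "S-3"): "ann lee|10001#2"}
)
```

In this sketch the override table plays the role of the manual correction component: the automatic rule stays simple, and exceptions are recorded explicitly rather than hidden in the matching logic.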

To complete this discussion, let's briefly look at the concept of data stewards and their role in assessing, improving, and managing data quality.


Data Stewardship and Ownership

As the name implies, data owners are those individuals or groups within the organization that are in a position to obtain and create the data and that have significant control over its content (and sometimes over access to it and its distribution). Data owners often belong to a business rather than a technology organization. For example, an insurance agent may be the owner of the list of contacts of his or her clients and prospects.

The concept of data stewardship is different from data ownership. Data stewards do not own the data and do not have complete control over its use. Their role is to ensure that adequate, agreed-upon quality metrics are maintained on a continuous basis. In order to be effective, data stewards should work with data architects, database administrators, ETL (Extract-Transform-Load) designers, business intelligence and reporting application architects, and business data owners to define and apply data quality metrics. These cross-functional teams are responsible for identifying deficiencies in systems, applications, data stores, and processes that create and change data and thus may introduce or create data quality problems. One consequence of having a robust data stewardship program is its ability to help the members of the IT organization to enhance appropriate architecture components to improve data quality.

Data stewards must help create and actively participate in processes that allow the establishment of business-context-defined, measurable data quality goals. Only after an organization has defined and agreed on the data quality goals can the data stewards devise appropriate data quality improvement programs.
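
As one illustration of what a measurable data quality goal could look like, the sketch below computes a simple completeness metric per field and compares it with an agreed target. The choice of completeness as the metric, the field names, and the target values are all assumptions for the example; a real program would define its metrics and thresholds with the business data owners.

```python
def completeness(records, field):
    """Share of records with a non-empty value for `field` (0.0 to 1.0)."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if str(r.get(field) or "").strip())
    return filled / len(records)


def check_goals(records, goals):
    """Return {field: (measured, target, met)} for each agreed goal."""
    report = {}
    for field, target in goals.items():
        measured = completeness(records, field)
        report[field] = (measured, target, measured >= target)
    return report


# Hypothetical customer records with some missing contact data.
customers = [
    {"name": "Ann Lee", "email": "ann@example.com", "phone": "555-0100"},
    {"name": "Bo Chan", "email": "", "phone": "555-0101"},
    {"name": "Cy Diaz", "email": "cy@example.com", "phone": None},
    {"name": "Di Evans", "email": "di@example.com", "phone": "555-0103"},
]

# Hypothetical targets agreed with the business: 90% of records should
# carry an email address and 75% a phone number.
goals = {"email": 0.90, "phone": 0.75}
report = check_goals(customers, goals)
```

Here the email goal is missed (0.75 measured against a 0.90 target) while the phone goal is met, which gives the steward a concrete starting point for an improvement program.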

These data quality goals and the improvement programs should be driven primarily by business units, so it stands to reason that in order to gain full knowledge of the data quality issues, their roots, and the business impact of these issues, a data steward should be a member of a business team. Regardless of whether a data steward works for a business team or acts as a "virtual" member of the team, a data steward has to be very closely aligned with the information technology group in order to discover and mitigate the risks introduced by inadequate data quality.

Extending this logic even further, we can say that a data steward would be most effective if he or she can operate as close to the point of data acquisition as technically possible. For example, a steward for customer contact and service complaint data that is created in a company's service center may be most effective when operating inside that service center.

Finally, and in accordance with data governance principles, data stewards have to be accountable for improving the data quality of the information domain they oversee. This means not only appropriate levels of empowerment but also the organization's willingness and commitment to make the data steward's data quality responsibility his or her primary job function, so that data quality improvement is recognized as an important business function required to treat data as a valuable corporate asset.
