This article originally appeared on the BeyeNETWORK
Effective data governance entails the ability to accurately plan, measure and monitor data quality at predetermined levels of acceptability. A given data governance model may focus on one or more objectives, including business intelligence, firm-wide data quality, security, data warehousing and IT portfolio management. It is equally important to distinguish the data governance objectives of master data management (MDM) from those of centralized data or robust list management. In MDM, like copies of master data are dispersed across the enterprise and must be reconciled. That calls for a significantly different data governance practice than a master hub that provides data to subscribers on request. In each case, there is a strong need to agree on what data is deemed acceptable or unacceptable, both in the current state and in the future-state model.
This article discusses the challenges, methods and pitfalls firms encounter in controlling and measuring those components that comprise meaningful data governance policies. We will use a reference data example to illustrate, describe and provide consistency to our narrative.
Defining Measurable Quality
How is data confidence attained? It requires a clear definition of what “quality” is, acceptable variance tolerances and the ongoing ability to measure data accuracy. To accomplish this, we must first recognize the distinction between “quality” (something that does what you need it to do) and “grade” (the degree to which something is valued). A simple example illustrates the distinction: in a restaurant, you may be served a lobster (high grade) instead of a hot dog (low grade) for dinner, but if you were craving a salad and had ordered one, both the lobster and the hot dog would be considered low quality, because neither “did what you needed it to do,” regardless of its grade.
Now, with a better understanding of “quality,” we can return to the question of how to identify the level of quality at which confidence is attained. Borrowing a page from PMI’s project management standard (PMBOK), we define quality as “That degree to which a set of inherent characteristics fulfill a requirement.”
If this sounds highly subjective, it is. Two firms performing identical business functions may have very different levels of confidence in their reference data. That acknowledgement, and the methods each firm uses to compensate for its data quality (acceptable variance levels, automated transaction repair, acceptable transaction failure rates, etc.), establish a unique “footprint” of the firm’s value structure (i.e., its corporate culture). As a result, terms like “customer satisfaction,” “fitness for use” and “conformance to requirements” will mean different things to these two companies, because each term comprises:
- An understanding of the requirement.
- A willingness to invest in solutions deemed appropriate to satisfy the requirement.
The Value of Accuracy
Expanding on the definition of quality, Figure 1 depicts the classic PMBOK Quality model comprised of scope, time and cost. The theory maintains that once a desired level of quality is identified, a shift in any of its three elements will alter quality and at least one of the other two elements will have to be adjusted to re-establish acceptable quality.
These three elements are known as the “triple constraint” of quality. People have little difficulty describing each element, yet they still struggle to define quality precisely.
Figure 1: PMI’s Triple Constraints of Quality
One persistent truth in data governance is that quality rarely equates to zero data defects. As firms make value judgments about acceptability (How accurate is good enough?), they weigh the cost of data accuracy against the rate of transaction failures they consider acceptable. How much investment can be justified to keep reference data truly correct, versus relying on the firm’s ability to capture and repair transactions through supporting processes? Where the objective is to minimize (not eliminate) failed transactions, multiple data accuracy/transaction repair combinations are plausible. Whatever the ultimate mix, both ROI and limited organizational resources weigh heavily in the firm’s resulting definition of quality.
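To make the accuracy/repair trade-off concrete, here is a minimal sketch that compares a few data accuracy/transaction repair combinations by total cost. Every number in it (annual accuracy spend, resulting failure rate, repair cost per failed transaction, annual volume) is a hypothetical assumption for illustration only, and `QualityMix` and `total_cost` are invented names, not part of any method described in the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityMix:
    """One hypothetical combination of accuracy investment and repair effort."""
    name: str
    accuracy_spend: float  # assumed annual spend on keeping reference data correct
    failure_rate: float    # assumed fraction of transactions that still fail
    repair_cost: float     # assumed cost to capture/repair one failed transaction


def total_cost(mix: QualityMix, annual_volume: int) -> float:
    """Accuracy spend plus the expected cost of repairing failed transactions."""
    expected_failures = annual_volume * mix.failure_rate
    return mix.accuracy_spend + expected_failures * mix.repair_cost


# Hypothetical mixes: none eliminates failures; each minimizes them differently.
MIXES = [
    QualityMix("heavy accuracy investment", 900_000.0, 0.00002, 150.0),
    QualityMix("balanced", 400_000.0, 0.0001, 150.0),
    QualityMix("lean data, strong repair", 150_000.0, 0.0005, 100.0),
]

if __name__ == "__main__":
    volume = 5_000_000  # hypothetical annual transaction volume
    for mix in sorted(MIXES, key=lambda m: total_cost(m, volume)):
        print(f"{mix.name:28s} {total_cost(mix, volume):>12,.0f}")
```

With these invented inputs the "lean" mix is cheapest at 5 million transactions, but at 10 million the "balanced" mix wins, because repair costs grow with volume while accuracy spend does not. The ranking depends on the firm's own numbers, which is why its definition of quality remains a value judgment.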
Once we’ve defined quality, the immediate question becomes: Are we as a company willing to commit to achieving the quality levels we profess? Truthful answers to that question form the foundation of realistic expectations in data governance.
Building toward Quality
Whether we are designing an equity securities pricing master for a brokerage house or a Mars-bound space probe, quality is relative: it depends on the performance requirement, the product’s ability to meet it and the risk of failure we are prepared to assume. That determination may rest solely with the organization (e.g., NASA) or on industry best practices that evolve into standards (e.g., IEEE). In all instances, ensuring that the quality we produce meets the quality we’ve specified should involve these three mechanisms:
- Quality planning – Identifying which quality standards are relevant, determining how to best satisfy them and establishing a quality management plan to attain quality. The focus here is on preventing defects before they happen.
- Quality control – Monitoring the work results (not the process) to ensure they satisfy the requirement.
- Quality assurance – Continually examining the outputs of quality planning and quality control, and finding ways to improve the process.
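Quality control, as defined above, inspects the work results rather than the process. A minimal sketch of that idea for reference data follows: validate a batch of records against a few rules and compare the observed defect rate with an agreed tolerance. The record fields, the three rules and the 0.5% default tolerance are hypothetical placeholders; a firm would substitute the rules and variance tolerances it has actually agreed on.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class SecurityRecord:
    """Hypothetical reference data record for a security."""
    security_id: str
    currency: str
    price: Optional[float]


# Each rule returns True when the record passes. These rules are illustrative only.
RULES: dict[str, Callable[[SecurityRecord], bool]] = {
    "has_id": lambda r: bool(r.security_id.strip()),
    # Format check only (three uppercase letters), not a full ISO 4217 lookup.
    "currency_format": lambda r: len(r.currency) == 3 and r.currency.isalpha() and r.currency.isupper(),
    "positive_price": lambda r: r.price is not None and r.price > 0,
}


def defect_rate(records: list[SecurityRecord]) -> float:
    """Fraction of records that fail at least one rule."""
    if not records:
        return 0.0
    defective = sum(1 for r in records if not all(rule(r) for rule in RULES.values()))
    return defective / len(records)


def within_tolerance(records: list[SecurityRecord], tolerance: float = 0.005) -> bool:
    """Quality control check: does the batch meet the agreed defect tolerance?"""
    return defect_rate(records) <= tolerance


if __name__ == "__main__":
    batch = [
        SecurityRecord("XS0001", "USD", 101.25),
        SecurityRecord("XS0002", "usd", 99.80),  # fails the currency format rule
        SecurityRecord("XS0003", "EUR", None),   # fails the price rule
        SecurityRecord("XS0004", "GBP", 87.10),
    ]
    print(defect_rate(batch))       # 0.5
    print(within_tolerance(batch))  # False
```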
In creating a product of quality (in this case, reference data), we balance the quality mix by:
- Investing in the development process;
- Identifying/capturing defects, identifying the cause, repairing errors;
- Refining/improving the process.
A mature data governance strategy operates within this quality management model, using a “Plan – Do – Check – Act” cycle to monitor and continually improve the quality we have designated. As a company, to what degree are we willing to commit to achieving those levels?
Components in Data Confidence
Measuring the direct effectiveness and quality of enterprise-wide data governance is challenging because the work is often funded by sponsors of programs/projects that address specific business initiatives. In these bottom-up instances, organizational gains in reference data quality are largely determined by the objectives of each program/project. As work is funded at the program/project level, reference data quality makes its way up through the enterprise to inconsistent degrees, depending on each line of business’s interpretation of the triple constraint.
Because programs/projects are sponsored this way across the enterprise, determining top-down data confidence is an obvious challenge. As we base day-to-day business decisions on our reference data, we incur risk that depends on the accuracy of that data and our confidence in it (i.e., its quality). While the methods for addressing risk are well defined (transfer, accept, prevent, mitigate), quantifying data confidence is far more subjective.
To measure confidence in reference data, we begin by assigning values to key questions. Here are a few standard examples:
- Is this the best possible source of reference data for us? This goes back to our definition of quality. Do we have more than one potential supplier of this reference data? If so, what trade-offs are we making if we choose a supplier other than the best possible source?
- Are we confident that we clearly understand what the data represents? In the case of “price,” for example, do we clearly understand which price is meant (“purchase,” “sale,” “discounted,” “net,” etc.) and to what time frame it refers? If the meaning of the price is determined by other data elements, do we understand these, and are they always consistent? If we receive this pricing data from a source outside our control, how confident are we that we understand how the data is processed (gathered, scrubbed, altered) before we receive it?
- Is the reference data made available to us in an optimal manner? Correct data does not necessarily serve our needs best. Information provided in a form that requires significant processing on our side (incompatible data formats, extensive data mappings/conversions, etc.) can be as costly as inconsistent data. In an era of large organizational mergers, on-site/offshore component development and ongoing IT processing enhancement initiatives (MIPS optimization, DBMS data model consistency, etc.), compatibility between the data made available to you and your application architecture standards is vital.
- Have we clearly articulated the business case for our reference data supplier? We’ve examined the pros and cons of the candidates for our reference data supplier. Very often there was a candidate equally or better able to provide the data, but, given our business needs and data governance requirements, we selected someone else. In other words, our selected supplier adequately meets our reference data needs (i.e., it is “good enough for our purposes”). An example might be choosing an end-of-day supplier over a real-time one.
By applying weights and confidence ratings to such questions, we can begin to quantify data confidence. These ratings, expressed as percentages, represent the confidence levels reported by the firm’s reference data consumers, IT and other stakeholders, based on their knowledge and expertise. The product of these confidence factors yields an overall confidence rating; the factors can, of course, be weighted as desired.
Figure 2: Factoring Confidence in our Reference Data
Multiplying these values (accumulated at the program/project level) yields an unweighted “total confidence” value. While subjective, the amount by which total confidence falls below 100% represents the uncertainty permeating the processes and calculations that drive the business. Changing any or all of these factors alters our confidence in reference data.
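The calculation above can be sketched in a few lines. The four ratings below are hypothetical values for the four questions listed earlier, not the values in Figure 2. The article does not say how weights should be applied, so the weighting shown (raising each factor to a weight exponent) is only one possible assumption: weight 0 ignores a factor, weight 1 leaves it unchanged and weights above 1 penalize doubt more heavily.

```python
from math import prod

# Hypothetical confidence ratings (0.0-1.0) for the four questions; not Figure 2's values.
factors = {
    "best_source": 0.95,
    "understood_meaning": 0.90,
    "optimal_delivery": 0.85,
    "business_case": 0.98,
}


def total_confidence(ratings: dict[str, float]) -> float:
    """Unweighted total confidence: the product of all confidence factors."""
    return prod(ratings.values())


def weighted_confidence(ratings: dict[str, float], weights: dict[str, float]) -> float:
    """Assumed weighting scheme: raise each factor to its weight (default 1.0)."""
    return prod(ratings[k] ** weights.get(k, 1.0) for k in ratings)


if __name__ == "__main__":
    base = total_confidence(factors)
    print(f"total confidence: {base:.1%}")      # 71.2%
    print(f"uncertainty:      {1 - base:.1%}")  # 28.8%
```

Even with every individual rating at 85% or above, the product falls to roughly 71%, which illustrates how modest doubts on several questions compound into substantial overall uncertainty.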
As we move up the enterprise, these program/project total confidence levels, viewed at the portfolio and enterprise levels, will likely decrease.
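One way to see why confidence tends to fall at higher levels is sketched below. It rests on assumptions the article does not state: that a portfolio depends on every underlying project's reference data, and that the projects' uncertainties are independent, so their confidences multiply. Under those assumptions, and because no project confidence exceeds 100%, the rolled-up value can never exceed the weakest project's. The project values are hypothetical.

```python
from math import prod


def rolled_up_confidence(confidences: list[float]) -> float:
    """Confidence at the next level up, assuming it depends on every input
    and that the inputs' uncertainties are independent (an assumption)."""
    return prod(confidences)


# Hypothetical project-level total confidence values within one portfolio.
portfolio = [0.92, 0.88, 0.95]
# Hypothetical enterprise view: this portfolio plus one other at 90%.
enterprise = [rolled_up_confidence(portfolio), 0.90]

if __name__ == "__main__":
    print(f"portfolio:  {rolled_up_confidence(portfolio):.1%}")   # 76.9%
    print(f"enterprise: {rolled_up_confidence(enterprise):.1%}")  # 69.2%
```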
Recognizing a Firm’s Motivation, Objectives and Starting Point
Data governance, the pursuit of improved data health, can be likened to an individual joining a gym: how healthy is healthy? Both pursuits typically begin with strong optimism, intent and commitment, which often turn to disenchantment upon hitting a plateau or a level below the expected norm. For both data governance and gym membership, success is never found merely by contributing more money to the cause. Rather, it comes from self-examination: establishing realistic expectations and embedding them in our culture.
For data governance, a meaningful self-examination involves the elements shown in Figure 3.
Figure 3: Elements in Determining the Definition of Healthy Data Governance Practices
By adjusting these elements, we determine our level(s) of acceptable reference data confidence. Businesses, like people, have well-established cultures that evolve over years. Firms faced with mergers, new business opportunities and exponentially growing transaction rates are forced to evolve their culture to support the firm’s new realities. In our earlier example of two firms conducting identical business, however, we did not account for corporate culture. Here we find distinct differences in how each firm conducts its business activities. For both firms, it may be challenging to quantify the precise day-to-day impact of their current-state data governance policies. But only by knowing where we are today can we present a compelling picture of the future under an “all things being equal” scenario.
Figure 4 shows a generic securities processing example.
Given a constant trade transaction failure rate (0.0076920%), we can readily calculate the cost of trade transaction failures to the firm in scenarios where daily trade volumes increase, as shown in Figure 5.
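The calculation behind such a scenario table can be sketched as follows. Only the failure rate (0.0076920%) comes from the text; the cost per failed trade, the number of trading days and the daily volumes are hypothetical assumptions, since Figure 5's actual figures are not reproduced here.

```python
# Failure rate from the text: 0.0076920% of trades, i.e. 0.000076920 as a fraction.
FAILURE_RATE = 0.0076920 / 100

# Hypothetical inputs for illustration only.
COST_PER_FAILED_TRADE = 500.0  # assumed average cost to repair one failed trade
TRADING_DAYS_PER_YEAR = 252    # assumed


def daily_failure_cost(daily_volume: int,
                       rate: float = FAILURE_RATE,
                       cost_per_failure: float = COST_PER_FAILED_TRADE) -> float:
    """Expected daily cost of failed trades at a constant failure rate."""
    return daily_volume * rate * cost_per_failure


if __name__ == "__main__":
    for volume in (100_000, 250_000, 500_000, 1_000_000):  # hypothetical daily volumes
        daily = daily_failure_cost(volume)
        print(f"{volume:>9,} trades/day -> {daily:>10,.2f}/day, "
              f"{daily * TRADING_DAYS_PER_YEAR:>14,.2f}/year")
```

Because the rate is held constant, cost grows linearly with volume: doubling daily trades doubles the cost of failures. That is the "all things being equal" projection; reducing the rate itself requires changing the underlying data quality.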
Eliminating uncertainty always comes at a price. For reference data, significant changes in quality and confidence require a shift in the organizational culture. Drawing on our gym analogy, success requires setting realistic goals that we’ll embrace over the long term. By doing so, we adjust our culture, our values and our results.
The cost of addressing risk may be high, but by knowing the factors that influence the relevant aspect of risk, we can determine our level of reference data confidence and, thus, the true costs, risks and factors underlying that confidence.
Examining the Existing Corporate Culture and Potential for Change
Each firm’s data governance practices reflect both its business requirements and its values. Making a compelling case to alter these traits at any scope (enterprise-wide, trading desk-wide, etc.) requires a clear understanding of the firm’s values and its ability to change.
Several techniques are useful for modeling a firm’s data governance practices, giving an understanding of both the existing culture (values) and the potential for change. Figure 6 depicts one of them, the McKinsey 7S Model.
Figure 6: The 7 Elements of the McKinsey 7S Model
Conducting a Reference Data Capability Assessment
Using information gathered from the data governance modeling exercise, we can identify the firm’s current data governance capabilities. Referring to the “Reference Data (RD) Governance Maturity Levels” (Figure 7, which lists only the high-level attributes of each level), we can now:
- Identify our current level of reference data governance maturity.
- Measure gaps/inconsistencies between our current state and where we envision our future state.
- Develop a roles-and-responsibilities model (e.g., a RASIC matrix) in support of our future-state reference data maturity level.
- Conduct a future state reference data governance modeling exercise.
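The first two steps above (identifying the current maturity level and measuring gaps to the future state) can be sketched as a simple scorecard. Figure 7's actual levels and attributes are not reproduced here, so the 1-5 scale, the dimension names and the scores are hypothetical. Treating the overall level as the weakest dimension's score is likewise an assumption, chosen as a conservative convention.

```python
from dataclasses import dataclass


# Hypothetical assessment dimensions scored on an assumed 1-5 maturity scale.
@dataclass(frozen=True)
class Assessment:
    dimension: str
    current: int
    target: int

    @property
    def gap(self) -> int:
        """Levels still to climb (0 if the current level already meets the target)."""
        return max(0, self.target - self.current)


def overall_level(assessments: list[Assessment]) -> int:
    """Assumed convention: the firm is only as mature as its weakest dimension."""
    return min(a.current for a in assessments)


def prioritized_gaps(assessments: list[Assessment]) -> list[Assessment]:
    """Dimensions with a remaining gap, largest gap first."""
    return sorted((a for a in assessments if a.gap > 0), key=lambda a: a.gap, reverse=True)


if __name__ == "__main__":
    scores = [
        Assessment("data ownership", current=2, target=4),
        Assessment("quality measurement", current=3, target=4),
        Assessment("supplier management", current=3, target=3),
    ]
    print(overall_level(scores))                            # 2
    print([a.dimension for a in prioritized_gaps(scores)])  # ['data ownership', 'quality measurement']
```

Re-running the same scorecard after each iteration gives a consistent way to track whether the gaps are actually closing.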
Figure 7: Maturity Levels in Reference Data
While a single iteration of these methods is unlikely to yield a data governance model at the desired maturity level, it will show where you stand in terms of data governance maturity and what is realistically feasible in the near and long term. Each assessment and iteration brings further refinement.
The key concept to take away from this article is that good data governance is a transformation, not a project. The greatest influences on reference data quality consistently occur as by-products of business-line-funded projects and programs aimed at performing specific business tasks. Over time, as these projects and programs come to an end, the data quality they establish rolls up through the organization.
The evolution of enterprise-wide top-down reference data governance is most significantly impacted by IT’s application-focused and processing-silo history through which data quality evolves at the program/project level and makes its way up the organization. This method, applied in many variations across the firm, ensures that inconsistent standards and tolerances will evolve and be enforced to varying degrees. Using consistent methods to measure data quality and data confidence, we can more accurately gauge, establish and influence portfolio/enterprise-level tolerances and enforce them at the program/project level.
This article has attempted to describe one approach toward evolving a top-down model to measure, influence and maintain enterprise-consistent levels of acceptability in reference data quality.
- Supporting information has been obtained from The Project Management Body of Knowledge (PMBOK), published by the Project Management Institute (www.pmi.org).