In recent interchanges with different clients, we are starting to see a real conceptual differentiation between assessing the quality of a data object and assessing its quality within the contexts in which that data object is used. If we are to rely on the good ol’ reliable (albeit wishy-washy) definition of data quality as “fitness for use,” we might find ourselves confronted by a significant distinction when we pluralize it to “fitness for uses.”
So what is the difference? To answer this, it is worthwhile reviewing some of the approaches discussed in previous articles regarding data quality assessment. Our firm takes a stringent business-directed approach: evaluating how flaws in the quality of critical data elements ultimately impact the achievement of business objectives, and characterizing corresponding quantifiable measurement processes. The goal of this approach is to create opportunities for identifying high-impact issues early in the processing streams in a preventive manner. Flagging issues early in a process reduces the ultimate manifestation of business impacts at the end of that process.
In this manner, we are able to come up with a formula for “fitness for use,” as the measures assigned are based on the specific business requirements within specific business process workflows. It also allows for different measures of fitness. For example, we can look at completeness characteristics of customer data, such as telephone number and address. However, the availability of a telephone number is more critical to the telesales agent and less so to the shipping agent, while the shipping agent cares about a complete delivery address (which is largely irrelevant to the cold-caller!). In other words, completeness in relation to specific critical data elements is relevant within a specific business process.
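The telesales/shipping example can be sketched in code. This is a minimal illustration (the field names, process names, and rules are hypothetical, not from any particular implementation): the same customer record is scored against different completeness rules depending on which business process consumes it.

```python
# Completeness rules per business process: each maps a process to the
# critical data elements it requires. Illustrative names only.
PROCESS_RULES = {
    "telesales": ["telephone"],              # the cold-caller needs a phone number
    "shipping":  ["street", "city", "zip"],  # the shipper needs a delivery address
}

def is_fit_for_use(record: dict, process: str) -> bool:
    """A record is fit for a process when all of that process's
    critical data elements are populated (non-empty)."""
    return all(record.get(field) for field in PROCESS_RULES[process])

customer = {"name": "Pat Smith", "telephone": "301-555-0100",
            "street": "", "city": "Bethesda", "zip": "20814"}

print(is_fit_for_use(customer, "telesales"))  # True: phone is present
print(is_fit_for_use(customer, "shipping"))   # False: street is missing
```

The same record thus earns a different "fitness" verdict in each context, which is exactly the pluralized "fitness for uses."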
So this provides us with many specifications of fitness for use, essentially defining data quality within a specific business context, and it allows us to specify some expectations for conformance of data instances (e.g., percentage acceptability levels, or absolute number of tolerated failures). This will tell us when issues arise that impact specific business objectives, but what it does not provide us with is a characterization of the quality of the data instance itself. We can quantify the score of conformance to specific rules in relation to each evaluated business process, but how does this relate to the data object itself?
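The two styles of expectation mentioned above, a percentage acceptability level and an absolute number of tolerated failures, can be sketched as a simple conformance check. The records, rule, and threshold values here are illustrative assumptions:

```python
def conformance(records, rule):
    """Return (fraction of records passing the rule, failure count)."""
    failures = sum(0 if rule(r) else 1 for r in records)
    return 1 - failures / len(records), failures

records = [
    {"telephone": "301-555-0100"},
    {"telephone": ""},
    {"telephone": "301-555-0199"},
    {"telephone": "301-555-0142"},
]

has_phone = lambda r: bool(r.get("telephone"))
score, failures = conformance(records, has_phone)

# The expectation can be stated either way:
meets_pct_threshold = score >= 0.90  # e.g., 90% of records must conform
meets_abs_threshold = failures <= 1  # e.g., at most one failure tolerated

print(score, failures)                           # 0.75 1
print(meets_pct_threshold, meets_abs_threshold)  # False True
```

Note that the two framings can disagree on the same data, which is one reason the expectation itself must be negotiated with the business process owner rather than assumed.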
This is a bit of a conundrum – our approach provides auditability with respect to data expectations, but how does one qualify the data itself? Is there an objective assessment that can be applied to present the level of quality of a customer name, address or a product description? From our point of view, the objective view in the absence of any context is probably an interesting intellectual exercise, but how it gets applied in practice becomes the major concern. One path for providing an objective assessment might combine all the fitness criteria defined across the business contexts and then use the most stringent combination to characterize the data's quality.
In our example, a customer record is fit for shipping when it has a deliverable address, and fit for telesales with a proper telephone number. The characterization of the absolute quality of the customer record must therefore be defined as requiring a deliverable address and a reachable phone number – overkill for either application, albeit appropriate for both applications. The continued unioning of requirements from across multiple business processes will only increase the complexity of defining an objective score.
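That "union of requirements" can be made concrete with a small sketch (again with hypothetical rule and field names): the objective characterization demands every critical data element from every process at once.

```python
PROCESS_RULES = {
    "telesales": ["telephone"],
    "shipping":  ["street", "city", "zip"],
}

# Union the critical data elements across all business processes.
ALL_REQUIRED = set().union(*PROCESS_RULES.values())

def objective_fitness(record: dict) -> bool:
    """'Absolute' quality: the record must satisfy every process's
    requirements simultaneously -- overkill for any single use."""
    return all(record.get(field) for field in ALL_REQUIRED)

customer = {"name": "Pat Smith", "telephone": "301-555-0100",
            "street": "10 Main St", "city": "Bethesda", "zip": "20814"}

print(objective_fitness(customer))                      # True
print(objective_fitness({**customer, "telephone": ""})) # False
```

Every new business process folded into `PROCESS_RULES` grows `ALL_REQUIRED`, which is the complexity problem the paragraph above describes.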
Or does it? There are, nonetheless, data elements for which objective criteria can be assigned, as long as we consider the quality in relation to the real-world object being modeled. For example, the accuracy of a telephone number's assignment to an individual can be measured and documented (if not via conformance with a recognized source of truth, it can be verified manually by calling the number and asking for the individual). An address can be subjected to verification based on postal standards. A person's name can be compared to public records, and so on.
These measurements provide an objective score of a data value with respect to “reality,” but do not necessarily measure fitness for use. For example, a U.S. address can be deliverable even if its ZIP code has only five digits and not nine. So care must be taken when an objective qualification suggests greater stringency than is needed. Fitness for use does not necessarily mean absence of defects. In turn, assessing quality against the objective score may force one to attempt corrective actions that are above and beyond what is necessary to get the job done.
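The ZIP code example shows the gap between the two views directly. This sketch uses simple regular expressions as stand-ins; the patterns are illustrative, not actual USPS validation logic:

```python
import re

def conforms_to_zip_plus_4(zip_code: str) -> bool:
    """Objective standard: all nine digits, formatted as ZIP+4."""
    return re.fullmatch(r"\d{5}-\d{4}", zip_code) is not None

def deliverable_zip(zip_code: str) -> bool:
    """Fitness for use: five digits are enough to deliver the package."""
    return re.fullmatch(r"\d{5}(-\d{4})?", zip_code) is not None

print(conforms_to_zip_plus_4("20814"))  # False: fails the stricter standard
print(deliverable_zip("20814"))         # True: still fit for shipping
```

A remediation effort driven by the objective score alone would "fix" every five-digit ZIP code, even though none of those records is blocking a delivery.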
Bottom line: I came into this thinking that all data quality scoring needs to be in context, but came out seeing that there is value in an objective score for data, especially for direct comparison across different activities or data suppliers. However, the quality of the data still needs to be considered within the business context; the value of that objective score may not necessarily imply the most appropriate path for data quality remediation.
David is the President of Knowledge Integrity, Inc., a consulting and development company focusing on customized information management solutions including information quality solutions consulting, information quality training and business rules solutions. Loshin is the author of The Practitioner's Guide to Data Quality Improvement, Master Data Management, Enterprise Knowledge Management: The Data Quality Approach and Business Intelligence: The Savvy Manager's Guide. He is a frequent speaker on maximizing the value of information.