Guide to managing a data quality assurance program
Once you have finessed the perception that buying and installing data quality tools will lead to immediate improvements in the quality of your organization’s data, the next challenge emerges: managing internal expectations of how quickly things will get better once a data quality management program is approved, funded and initiated.
While there may be a desire to address as many data quality issues as quickly as possible, that isn’t always realistic. A data quality team may also lack the authority to impose the changes required to improve data quality, which adds further complexity. IT and data quality managers must therefore balance two things: encouraging an organizational commitment to improving data quality, and giving corporate executives and other key business stakeholders practical guidance about the results a quality improvement program can realistically deliver.
By proposing a data quality plan and vision that addresses the organization’s business and data quality objectives while remaining pragmatically achievable, a data quality team should be able to prevent the expectations of business users from getting out of hand.
That practical vision can be based on the results of the initial data quality assessment, essentially incorporating the remediation activities that would most effectively address the identified data quality issues. And because the assessment produces concrete metrics on the occurrence of those issues, it provides a means for setting reasonable performance objectives for improving data quality.
Defining rules and metrics for improving data quality
During the data quality assessment, data flaws are exposed in relation to defined data quality dimensions, such as completeness, accuracy, currency, timeliness, consistency, uniqueness and reasonableness. That provides a quantifiable way of measuring quality levels. For example, if there is an expectation that a data attribute must have a non-null value, we can measure the percentage of records in the data set in which that attribute’s value is missing. That becomes one data completeness rule, and the level of tolerance for noncompliance can be decided based on input from business users. A 100% compliance level likely will be required in many cases; in others, a lower percentage may be acceptable.
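As a rough illustration of the completeness rule described above, the following Python sketch measures the share of records in which an attribute is populated and compares it with a tolerance agreed with business users. The function names, the sample records and the choice to treat empty strings as missing are assumptions made for this example, not part of any particular data quality tool.

```python
from typing import Any, Iterable, Mapping


def completeness_rate(records: Iterable[Mapping[str, Any]], attribute: str) -> float:
    """Return the fraction of records in which `attribute` is populated."""
    rows = list(records)
    if not rows:
        # Assumption: an empty data set is treated as fully compliant.
        return 1.0
    # Assumption: both None and the empty string count as "missing".
    populated = sum(1 for row in rows if row.get(attribute) not in (None, ""))
    return populated / len(rows)


def meets_tolerance(rate: float, required: float) -> bool:
    """True when the measured rate reaches the level agreed with business users."""
    return rate >= required


# Hypothetical sample data: one of four customer records lacks an email address.
customers = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "c@example.com"},
    {"id": 4, "email": "d@example.com"},
]

rate = completeness_rate(customers, "email")
print(f"email completeness: {rate:.0%}")
print("meets 100% requirement:", meets_tolerance(rate, 1.0))
```

Keeping the measurement separate from the tolerance check means the same measured rate can be judged against different thresholds, which fits cases where 100% compliance is required for one attribute and a lower level is acceptable for another.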
Other data quality rules can also be formalized and quantified. Uniqueness rules can assert that there is one and only one record in the data set corresponding to each real-world entity. An accuracy rule can measure the degree to which a data instance’s values match those in a defined system of record. Consistency rules can validate the conformance of data attributes to defined value domains.
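The three rule types just mentioned can be quantified in much the same way as completeness. The sketch below is one possible formulation, under several assumptions: a single key column (here `customer_id`) is taken to identify the real-world entity, the system of record is represented as a plain dictionary, and the value domain is a fixed set of allowed codes. All names and sample values are hypothetical.

```python
from collections import Counter
from typing import Any, Mapping, Sequence, Set

Record = Mapping[str, Any]


def _rate(hits: int, total: int) -> float:
    # Assumption: an empty data set is treated as fully compliant.
    return hits / total if total else 1.0


def uniqueness_rate(records: Sequence[Record], key: str) -> float:
    """Fraction of records whose key value appears exactly once in the data set."""
    counts = Counter(row[key] for row in records)
    return _rate(sum(1 for row in records if counts[row[key]] == 1), len(records))


def accuracy_rate(records: Sequence[Record], reference: Mapping[Any, Record],
                  key: str, attribute: str) -> float:
    """Fraction of records whose attribute matches the system of record."""
    def matches(row: Record) -> bool:
        ref = reference.get(row[key])
        return ref is not None and ref.get(attribute) == row.get(attribute)
    return _rate(sum(1 for row in records if matches(row)), len(records))


def consistency_rate(records: Sequence[Record], attribute: str, domain: Set[Any]) -> float:
    """Fraction of records whose attribute value belongs to the defined value domain."""
    return _rate(sum(1 for row in records if row.get(attribute) in domain), len(records))


# Hypothetical sample data.
records = [
    {"customer_id": "C1", "state": "MD", "zip": "20901"},
    {"customer_id": "C2", "state": "md", "zip": "21201"},   # state outside the domain
    {"customer_id": "C2", "state": "VA", "zip": "22030"},   # duplicate key, wrong zip
    {"customer_id": "C3", "state": "NY", "zip": "10001"},   # zip differs from reference
]
system_of_record = {
    "C1": {"zip": "20901"},
    "C2": {"zip": "21201"},
    "C3": {"zip": "10002"},
}
valid_states = {"MD", "VA", "NY"}

print("uniqueness: ", uniqueness_rate(records, "customer_id"))
print("accuracy:   ", accuracy_rate(records, system_of_record, "customer_id", "zip"))
print("consistency:", consistency_rate(records, "state", valid_states))
```

In practice, deciding which records refer to the same real-world entity usually requires matching logic beyond an exact key comparison; the exact-match key here is only a simplification.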
Compliance with the defined data rules then needs to be measured to track the progress of data quality improvements. To continue our completeness example, a data set that is 95% compliant with the non-null rule is of higher quality than one that is 90% compliant. A collection of data quality performance metrics can be continuously monitored using data profiling tools and techniques, and the results can be assembled into a data quality scorecard. As the metrics show greater conformance with data quality rules, there should be a corresponding improvement in the affected business processes.
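To make the scorecard idea concrete, here is a minimal sketch that compares the compliance measured in two successive monitoring periods against agreed targets. The rule names, compliance figures and targets are invented for illustration, and the dictionary-based layout is an assumption rather than the format of any specific profiling tool.

```python
from typing import Dict, List


def build_scorecard(previous: Dict[str, float],
                    current: Dict[str, float],
                    targets: Dict[str, float]) -> List[dict]:
    """Return one scorecard row per rule: current level, change and target status."""
    rows = []
    for rule, level in current.items():
        rows.append({
            "rule": rule,
            "compliance": level,
            # Change since the previous period, rounded to avoid float noise.
            "change": round(level - previous.get(rule, level), 4),
            "meets_target": level >= targets[rule],
        })
    return rows


# Hypothetical measurements for two monitoring periods.
previous = {"email_not_null": 0.90, "customer_id_unique": 0.97}
current = {"email_not_null": 0.95, "customer_id_unique": 0.96}
targets = {"email_not_null": 1.00, "customer_id_unique": 0.95}

scorecard = build_scorecard(previous, current, targets)
for row in scorecard:
    status = "on target" if row["meets_target"] else "below target"
    print(f'{row["rule"]:<20} {row["compliance"]:>6.1%} {row["change"]:>+7.1%}  {status}')
```

Reporting the change alongside the current level shows both sides of the picture: a rule can be improving while still below its target, or above its target while slipping.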
Achievable milestones on the data quality improvement roadmap
With a set of agreed-upon remediation tasks and quality-level acceptability targets in hand, the next step is to lay out a roadmap for attaining those goals, broken up into separate phases with achievable milestones and deliverables. For example, a data quality improvement roadmap might contain the following phases:
- Establishing data quality fundamentals within the organization, including training and knowledge sharing, identification of relevant best practices, development and adoption of data standards, and the definition of data quality metrics and rules.
- Formalizing data quality activities – for example, forming a data quality team, defining different roles and responsibilities, instituting an error reporting and tracking process, and deploying data quality tools.
- Implementing the operational aspects of the data quality management program, through steps such as standardizing how data quality tools are used and creating processes for doing root-cause analysis of data errors, fixing the identified problems and validating data after remediation.
- Assessing and fine-tuning the data quality program through continuous monitoring and measurement of the defined performance metrics.
As part of the monitoring process, the data quality team can meet with business executives and users to update them on the progress of remediation efforts, determine the degree to which the stated objectives are being met and decide whether it’s reasonable to target a higher level of performance for the data quality improvements that are being made.
As a practical matter, exploring the opportunities for adding business value by improving data quality should lead to the development of a successful business case, which in turn should guide the creation of a reasonable and achievable data quality roadmap.
The overall process can be “piloted” by following these steps: First, identify two business processes that are being negatively impacted by poor data quality. Then conduct a data quality assessment for each of the processes to identify specific data flaws and quantify their impact on the business and what it would cost to fix them. After that, move on to prioritizing the data quality problems that need to be resolved, identifying achievable goals, organizing a project plan and defining the performance metrics that will be used to measure the progress of your remediation efforts.
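One simple way to approach the prioritization step is to rank the flaws found in the assessment by estimated business impact relative to the cost of fixing them. The sketch below assumes that both figures have already been estimated in the same currency and that every fix cost is greater than zero; the issue names and amounts are purely illustrative, and a ratio is only one of several reasonable ranking criteria.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class DataIssue:
    name: str
    annual_impact: float  # estimated yearly cost of the flaw to the business
    fix_cost: float       # estimated cost of remediation; assumed to be > 0

    @property
    def benefit_cost_ratio(self) -> float:
        return self.annual_impact / self.fix_cost


def prioritize(issues: Iterable[DataIssue]) -> List[DataIssue]:
    """Order issues so those with the highest impact per unit of cost come first."""
    return sorted(issues, key=lambda issue: issue.benefit_cost_ratio, reverse=True)


# Hypothetical assessment results.
issues = [
    DataIssue("missing customer email", annual_impact=120_000, fix_cost=30_000),
    DataIssue("duplicate customer records", annual_impact=200_000, fix_cost=100_000),
    DataIssue("stale shipping addresses", annual_impact=45_000, fix_cost=5_000),
]

ranked = prioritize(issues)
for issue in ranked:
    print(f"{issue.name:<28} ratio {issue.benefit_cost_ratio:.1f}")
```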
The end result should be an approved and funded data quality management program as well as a set of baseline quality measurements and business-directed metrics that can be used first to set a detailed plan for the data quality improvement work and then to monitor the program’s performance on an ongoing basis.
About the author: David Loshin is the president of Knowledge Integrity, Inc., a consulting company focusing on customized information management solutions in technology areas including data quality, business intelligence, metadata and data standards management. Loshin writes for numerous publications, including both SearchDataManagement.com and SearchBusinessAnalytics.com. He also develops and teaches courses for The Data Warehousing Institute and other organizations, and he is a regular speaker at industry events. In addition, he is the author of Enterprise Knowledge Management – The Data Quality Approach and Business Intelligence: The Savvy Manager's Guide. Loshin can be reached via his website: knowledge-integrity.com.