Once you’ve demonstrated the relationship between data errors and business impacts as part of building a business case for a data quality improvement program, a formal data quality assessment process can refine those correlations into more discrete metrics. Doing so helps to identify the required remediation tasks and prioritize them based on their expected value.
The objective of a data quality assessment is to fully document existing data quality problems, develop recommendations for fixing them and gain an understanding of a data quality management initiative’s required scope.
And as opportunities for data quality improvements are identified, metrics can be defined and used for continuous monitoring of the data quality program’s progress once it’s approved and funded. These metrics will enable objective measurements of the ongoing data quality fixes and help IT managers and data analysts determine whether the levels of improvement are sufficient to meet business expectations.
In concert with the recommended approach of mapping data quality issues to business objectives, a repeatable process for an incremental data quality assessment that focuses on individual business processes and their data usage scenarios will limit the effort involved while enabling you to detect and document the most egregious issues. Focusing on the data supporting one business process at a time achieves several important goals:
- From a training perspective, a limited scope can help bring data quality analysts up to speed on understanding the techniques that need to be used as part of the data quality assessment.
- Building a successful working relationship between the data analysts and business users establishes trust between the two communities.
- Working closely with a small number of users makes it easier to identify the issues whose remediation best meets their needs and helps set expectations for what the technical teams can achieve in partnership with business users.
To begin a data quality assessment, the data quality analysts map the information flow supporting the chosen business process, identifying where data is created, exchanged and used and then selecting key locations in the process flow where the data should be probed for possible quality issues. From here, the assessment process involves three subsequent steps: preparation for and execution of the data analysis, synthesis of the results and a review of the findings with the business users.
Data quality assessment options: directed vs. undirected
In our experience, an assessment of the data associated with a single business process can be performed in as few as 10 staff days over a three-week period, although that may be affected by factors such as scheduling constraints, access to the data and the availability of data quality tools. For example, the analysis stage can often be simplified by using data profiling and statistical analysis software. Analysts frequently explore data in an undirected way, letting a profiling tool direct the process while they drill through various quality-related measurements with little forethought, looking for any anomalies that may indicate a data flaw.
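To make the undirected style concrete, here is a minimal Python sketch of the kind of column-by-column summary a profiling tool produces. It is an illustration only, not the behavior of any particular product: the function names, the `customers` records and the choice of measurements (null counts, distinct counts, most frequent values) are all assumptions made for the example.

```python
from collections import Counter


def profile_column(rows, column):
    """Summarize one column: row count, nulls, distinct values, frequent values."""
    values = [row.get(column) for row in rows]
    # Treat None and the empty string as missing (an assumption for this sketch).
    non_null = [v for v in values if v not in (None, "")]
    counts = Counter(non_null)
    return {
        "column": column,
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(counts),
        "top_values": counts.most_common(3),
    }


def profile_table(rows):
    """Profile every column that appears in any record."""
    columns = sorted({key for row in rows for key in row})
    return [profile_column(rows, column) for column in columns]


# Hypothetical sample records, invented for illustration.
customers = [
    {"customer_id": "C001", "state": "MD", "zip": "20901"},
    {"customer_id": "C002", "state": "md", "zip": ""},
    {"customer_id": "C003", "state": "MD", "zip": "2090"},
    {"customer_id": "C003", "state": None, "zip": "20910"},
]

for summary in profile_table(customers):
    print(summary)
```

A summary like this surfaces oddities such as a repeated identifier or mixed-case state codes, but nothing in it says which of them matter to the business.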
However, a directed assessment will consider the potential data quality issues in the context of the selected business process, reducing the focus to the quality of those data instances that already have been related to identified business impacts. A directed approach should include the following tasks:
- Identification of critical data elements, in which the analysts reduce the sets of data elements to be examined to those that are involved in producing the impacted results.
- Definition of data quality metrics that can be used to articulate specific quality improvement expectations.
- Preparation of the data analysis environment, including isolation of data sets for review and configuration of data profiling, statistical analysis and querying tools.
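The tasks above can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions rather than a prescribed method: the critical data elements (`zip`, `state`), the validity patterns in `RULES`, the two metrics (completeness and validity) and the sample `orders` records are hypothetical, chosen only to show how a metric turns a quality expectation into a number.

```python
import re

# Hypothetical critical data elements with an assumed validity rule for each.
RULES = {
    "zip": re.compile(r"\d{5}"),
    "state": re.compile(r"[A-Z]{2}"),
}


def is_populated(value):
    """A value counts as populated if it is not None and not blank."""
    return value is not None and str(value).strip() != ""


def completeness(rows, column):
    """Share of records in which the column is populated."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if is_populated(row.get(column))) / len(rows)


def validity(rows, column, pattern):
    """Share of populated values that fully match the expected pattern."""
    populated = [str(row[column]) for row in rows if is_populated(row.get(column))]
    if not populated:
        return 0.0
    return sum(1 for value in populated if pattern.fullmatch(value)) / len(populated)


def measure(rows, rules):
    """Compute both metrics for each critical data element."""
    return {
        column: {
            "completeness": completeness(rows, column),
            "validity": validity(rows, column, pattern),
        }
        for column, pattern in rules.items()
    }


# Hypothetical isolated data set for the business process under review.
orders = [
    {"order_id": 1, "state": "MD", "zip": "20901"},
    {"order_id": 2, "state": "md", "zip": ""},
    {"order_id": 3, "state": "MD", "zip": "2090"},
    {"order_id": 4, "state": None, "zip": "20910"},
]

results = measure(orders, RULES)
for column, scores in results.items():
    print(column, scores)
```

Because only the elements named in `RULES` are measured, the analysis stays confined to data already tied to an identified business impact, and each score can be compared with a target agreed on with the business users.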
Despite limiting the analysis to those data elements that are critical to the business process, it’s still premature to draw any conclusions regarding the quality of data until the results are synthesized and reviewed within a business context. The synthesis work should result in a report with a list of potential data anomalies, each annotated with a description that links it to one or more identified business impacts. The report should also detail the analysis process and include remediation recommendations based on the results of the data profiling that has been done. It provides tangible evidence of critical data flaws that can help to justify the need for a data quality improvement program.
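One possible way to structure such a report is sketched below in Python. The `Anomaly` fields, the helper names and the sample findings are assumptions made for illustration; the only idea taken from the text is that each anomaly carries a description, a link to one or more business impacts and a remediation recommendation.

```python
from dataclasses import dataclass, field


@dataclass
class Anomaly:
    """One potential data flaw found during analysis (hypothetical structure)."""
    data_element: str
    description: str
    affected_records: int
    business_impacts: list = field(default_factory=list)
    recommendation: str = ""


def unlinked(anomalies):
    """Anomalies not yet tied to a business impact; these need review first."""
    return [a for a in anomalies if not a.business_impacts]


def render_report(anomalies):
    """Render the findings as plain text for the review session."""
    lines = []
    for a in anomalies:
        impacts = "; ".join(a.business_impacts) or "NOT YET LINKED"
        lines.append(f"{a.data_element}: {a.description} ({a.affected_records} records)")
        lines.append(f"  impacts: {impacts}")
        lines.append(f"  recommendation: {a.recommendation or 'none yet'}")
    return "\n".join(lines)


# Hypothetical findings, invented for illustration.
findings = [
    Anomaly(
        data_element="zip",
        description="value is not five digits",
        affected_records=1,
        business_impacts=["returned shipments"],
        recommendation="validate postal codes at order entry",
    ),
    Anomaly(
        data_element="state",
        description="lowercase state code",
        affected_records=1,
    ),
]

print(render_report(findings))
```

Flagging anomalies that have no linked impact keeps the report honest: an oddity in the data is not treated as a flaw until someone can say what it costs the business.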
Putting a face to the data quality assessment process
The next step is to review the data quality assessment report with business executives and users to determine the severity of the data quality issues and prioritize them based on factors such as business significance and relevance. The review process can also help the data quality team to “socialize” the concrete steps that can be taken to remediate the data quality problems.
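As a rough illustration of how the review could turn into a ranked list, the Python sketch below assumes that business users rate each issue for severity and relevance on a 1-to-5 scale and that the product of the two is a usable priority score. Both the scale and the scoring rule are assumptions for this example, not a method prescribed here, and the issues listed are hypothetical.

```python
def priority_score(issue):
    """Assumed scoring rule: severity multiplied by business relevance."""
    return issue["severity"] * issue["relevance"]


def prioritize(issues):
    """Return issues ordered from highest to lowest priority score.

    sorted() is stable, so issues with equal scores keep their input order.
    """
    return sorted(issues, key=priority_score, reverse=True)


# Hypothetical ratings gathered in the review session (1 = low, 5 = high).
reviewed_issues = [
    {"issue": "lowercase state codes", "severity": 3, "relevance": 2},
    {"issue": "invalid postal codes", "severity": 5, "relevance": 4},
    {"issue": "duplicate customer IDs", "severity": 2, "relevance": 5},
]

for issue in prioritize(reviewed_issues):
    print(priority_score(issue), issue["issue"])
```

Any scoring rule of this kind is only a starting point for discussion; the ratings themselves should come from the business users in the review.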
Done properly, a data quality assessment solidifies the business case for investing in quality improvement initiatives, including an evaluation of whether the associated costs would yield a reasonable return on the investment. And by concentrating on one business process at a time, IT can implement fixes that provide immediate value to the organization while the data quality analysts move on to assess other data sets, so that remediation and assessment proceed in parallel.
This repeatable assessment process should either validate or negate the “fuzzy” organizational perception of poor data quality – and help to drive a commitment to measuring, monitoring and improving data quality.
About the author: David Loshin is the president of Knowledge Integrity, Inc., a consulting company focusing on customized information management solutions in technology areas including data quality, business intelligence, metadata and data standards management. Loshin writes for numerous publications, including both SearchDataManagement.com and SearchBusinessAnalytics.com. He also develops and teaches courses for The Data Warehousing Institute and other organizations, and he is a regular speaker at industry events. In addition, he is the author of Enterprise Knowledge Management – The Data Quality Approach and Business Intelligence: The Savvy Manager's Guide. Loshin can be reached via his website: knowledge-integrity.com.