Forward-thinking business executives recognize the value of establishing and institutionalizing best practices for enhancing data usability and information quality. But problems can arise if companies make piecemeal investments in aspects of data cleansing and correction. The absence of comprehensive data quality assurance and management processes leads to replicated efforts and increased costs; worse, it impedes the delivery of consistent information to the community of business users in an organization.
What's needed is a practical approach for aligning disparate data quality activities with one another to create an organized program that addresses the challenges of ensuring and maintaining high quality levels. Aside from engaging business sponsors and developing a business case for data quality assurance investments -- both requirements in their own right -- here is a list of five tasks and procedures that are fundamental to effective data quality management and improvement efforts.
Document data quality requirements and define rules for measuring quality. In most cases, data quality levels are related to the fitness of information for business uses. Begin by collecting requirements: Engage business users, gain an understanding of their business objectives and solicit their expectations for data usability. That information, combined with shared experiences about the business impact of data quality issues, can be translated into rules for measuring key dimensions of quality, such as data completeness, currency and freshness, as well as the consistency of data value formats across different systems and with defined sources of record. As part of the process, create a central system for documenting the requirements and associated rules to support the development of data validation mechanisms.
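To make the idea of documented, measurable rules concrete, here is a minimal Python sketch of a central rule registry. Everything in it is an illustrative assumption rather than a prescribed design: the `QualityRule` class, the helper functions `is_present` and `updated_within`, the field names `email` and `last_updated`, and the 365-day threshold are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]


@dataclass(frozen=True)
class QualityRule:
    """One documented requirement plus the check that measures it (hypothetical structure)."""
    name: str
    dimension: str          # e.g. "completeness" or "currency"
    description: str        # the business expectation, in plain language
    check: Callable[[Record], bool]


def is_present(field_name: str) -> Callable[[Record], bool]:
    """Completeness check: the field exists and is not blank."""
    def check(record: Record) -> bool:
        value = record.get(field_name)
        return value is not None and str(value).strip() != ""
    return check


def updated_within(field_name: str, max_age_days: int, today: date) -> Callable[[Record], bool]:
    """Currency check: the date in the field is no older than max_age_days."""
    def check(record: Record) -> bool:
        value = record.get(field_name)
        if isinstance(value, datetime):
            value = value.date()
        if not isinstance(value, date):
            return False
        return (today - value) <= timedelta(days=max_age_days)
    return check


# A fixed reference date keeps the example deterministic; a real service
# would more likely use date.today().
TODAY = date(2024, 1, 31)

# The "central system" here is just a list; field names and thresholds are assumptions.
RULES: List[QualityRule] = [
    QualityRule("email_present", "completeness",
                "Every customer record has an email address",
                is_present("email")),
    QualityRule("recently_updated", "currency",
                "Record was updated within the last 365 days",
                updated_within("last_updated", 365, TODAY)),
]


def failed_rules(record: Record, rules: List[QualityRule]) -> List[str]:
    """Return the names of the rules that the record does not satisfy."""
    return [rule.name for rule in rules if not rule.check(record)]
```

Keeping the plain-language description next to the executable check is one way to let business users review the same rule that the validation code enforces.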
Assess new data to create a quality baseline. A repeatable process for statistical data quality assessment helps to augment the set of quality-measurement rules by checking source systems for potential anomalies in newly created data. Statistical and data profiling tools can scan the values, columns and relationships in and across data sets, using frequency and association analyses to evaluate data values, formats and completeness and to identify outlier values that might indicate errors.
In addition, profiling tools can feed information back to data quality and data governance managers about things such as data types, the structure of relational databases, and the relationships between primary and foreign keys in databases. The findings can be shared with business users to help in developing the rules for validating data quality downstream.
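As a rough illustration of what a profiling pass computes, the sketch below uses only the Python standard library to measure completeness, count value-format patterns (a simple frequency analysis) and flag numeric outliers. The function names are hypothetical, and the outlier test is one common convention (a modified z-score based on the median absolute deviation, with the usual 0.6745 scaling constant and 3.5 cutoff), not the method any particular profiling tool uses.

```python
import re
from collections import Counter
from statistics import median
from typing import Any, Dict, List, Sequence


def value_pattern(value: str) -> str:
    """Reduce a value to its format: digits become 9, letters become A."""
    pattern = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", pattern)


def profile_column(values: Sequence[Any]) -> Dict[str, Any]:
    """Summarize completeness, distinct values and format frequencies for one column."""
    non_null = [v for v in values if v is not None and str(v).strip() != ""]
    patterns = Counter(value_pattern(str(v)) for v in non_null)
    return {
        "count": len(values),
        "completeness": len(non_null) / len(values) if values else 0.0,
        "distinct": len(set(non_null)),
        "patterns": patterns.most_common(),  # most frequent format first
    }


def mad_outliers(numbers: Sequence[float], threshold: float = 3.5) -> List[float]:
    """Flag values whose modified z-score exceeds the threshold."""
    if not numbers:
        return []
    med = median(numbers)
    mad = median(abs(x - med) for x in numbers)
    if mad == 0:
        return []  # no spread to measure against; nothing can be flagged this way
    return [x for x in numbers if 0.6745 * abs(x - med) / mad > threshold]
```

For example, profiling a postal code column would surface rare patterns such as `9999` or `AA999` alongside the dominant `99999`, which gives data stewards specific candidates to review with business users.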
Implement semantic metadata management processes. As the number and variety of data sources grow, there is a corresponding need to limit the risk that end users in different parts of an organization will misinterpret the meanings of common business terms and data concepts. Centralize the management of business-relevant metadata and enlist business users and data management practitioners to collaborate on establishing corporate standards, reducing the number of situations in which inconsistent interpretations lead to data usage problems. The metadata and an associated data dictionary can then be made accessible as part of a data catalog that helps users find and understand available data.
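A centralized data dictionary can be pictured as a small lookup structure that refuses conflicting definitions. The following Python sketch is only an assumption about how such a structure might look; the `TermDefinition` and `DataDictionary` names, their fields and the sample term are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TermDefinition:
    """One agreed business term (hypothetical structure)."""
    term: str
    definition: str
    steward: str                            # who owns the definition
    synonyms: Tuple[str, ...] = ()          # other names users search for
    source_columns: Tuple[str, ...] = ()    # where the concept lives physically


class DataDictionary:
    """Central store that allows exactly one definition per term or synonym."""

    def __init__(self) -> None:
        self._terms: Dict[str, TermDefinition] = {}
        self._aliases: Dict[str, str] = {}

    def register(self, entry: TermDefinition) -> None:
        names = [entry.term.lower()] + [s.lower() for s in entry.synonyms]
        for name in names:
            if name in self._terms or name in self._aliases:
                # Conflicts are surfaced so people can reconcile them,
                # instead of two meanings silently coexisting.
                raise ValueError(f"'{name}' is already defined")
        self._terms[names[0]] = entry
        for alias in names[1:]:
            self._aliases[alias] = names[0]

    def lookup(self, name: str) -> Optional[TermDefinition]:
        key = name.lower()
        return self._terms.get(self._aliases.get(key, key))

    def search(self, keyword: str) -> List[TermDefinition]:
        """Find entries whose term or definition mentions the keyword."""
        needle = keyword.lower()
        return [e for e in self._terms.values()
                if needle in e.term.lower() or needle in e.definition.lower()]
```

Rejecting a duplicate registration is a deliberate choice in this sketch: it turns an inconsistent interpretation into an explicit conversation between business users and data management practitioners.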
Check data validity on an ongoing basis. Develop automated services for validating data records against the quality rules you've defined. A strategic implementation enables the rules and validation mechanisms to be shared across applications and deployed at various locations in an organization's information flow for continuous data inspection and quality measurement. The results can be fed into a variety of reporting schemes -- for example, direct notifications and alerts sent to data stewards to address acute anomalies and high-priority data flaws, and data quality dashboards and scorecards with aggregated metrics for a wider audience.
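The validation step described above can be sketched as a single reusable function that applies shared rules to a batch of records and produces both outputs mentioned: alerts for high-priority failures and aggregated pass rates for a scorecard. This is a minimal sketch under stated assumptions; the rule tuple layout, the severity labels and the `validate` function name are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Record = Dict[str, Any]
# Each rule is (name, severity, check); this layout is an assumption for the example.
Rule = Tuple[str, str, Callable[[Record], bool]]


def validate(records: Iterable[Record], rules: List[Rule]) -> Dict[str, Any]:
    """Apply every rule to every record and summarize the results."""
    records = list(records)
    passed = {name: 0 for name, _, _ in rules}
    failures: List[Dict[str, Any]] = []

    for index, record in enumerate(records):
        for name, severity, check in rules:
            if check(record):
                passed[name] += 1
            else:
                failures.append(
                    {"record_index": index, "rule": name, "severity": severity}
                )

    total = len(records)
    # Pass rate per rule, suitable for a dashboard or scorecard.
    scorecard = {name: (count / total if total else 1.0)
                 for name, count in passed.items()}
    # High-severity failures are the ones routed straight to data stewards.
    alerts = [f for f in failures if f["severity"] == "high"]
    return {"scorecard": scorecard, "alerts": alerts, "failures": failures}
```

Because the function takes the rules as an argument, the same rule list can be reused at several points in an information flow, for example at ingestion and again before reporting.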
Keep on top of data quality problems. Develop a platform for logging, tracking and managing data quality incidents. Measuring compliance with your data quality rules won't lead to improvements unless there are standard processes for evaluating and eliminating the root causes of data errors. An incident management system can automate processes such as reporting and prioritizing data quality issues, alerting interested parties, assigning data quality improvement tasks and tracking the progress of remediation efforts.
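The core of an incident management process (report, prioritize, assign, record the root cause, track progress) fits in a small in-memory model. The Python sketch below is illustrative only: the `Incident` and `IncidentLog` classes, the three priority levels and the status names are hypothetical, and a real platform would add persistence and notifications.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from itertools import count
from typing import Dict, List, Optional

PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}  # assumed priority levels


@dataclass
class Incident:
    incident_id: int
    rule: str                      # the quality rule that was violated
    description: str
    priority: str
    status: str = "open"
    assignee: Optional[str] = None
    root_cause: Optional[str] = None
    history: List[str] = field(default_factory=list)


class IncidentLog:
    """Logs incidents and tracks them from report to resolution."""

    def __init__(self) -> None:
        self._ids = count(1)
        self._incidents: Dict[int, Incident] = {}

    def report(self, rule: str, description: str, priority: str = "medium") -> Incident:
        if priority not in PRIORITY_ORDER:
            raise ValueError(f"unknown priority: {priority}")
        incident = Incident(next(self._ids), rule, description, priority)
        self._incidents[incident.incident_id] = incident
        self._note(incident, "reported")
        return incident

    def assign(self, incident_id: int, assignee: str) -> None:
        incident = self._incidents[incident_id]
        incident.assignee = assignee
        incident.status = "in_progress"
        self._note(incident, f"assigned to {assignee}")

    def resolve(self, incident_id: int, root_cause: str) -> None:
        # Requiring a root cause keeps the focus on eliminating causes,
        # not only on patching individual values.
        incident = self._incidents[incident_id]
        incident.root_cause = root_cause
        incident.status = "resolved"
        self._note(incident, f"resolved: {root_cause}")

    def open_queue(self) -> List[Incident]:
        """Unresolved incidents, highest priority first, then oldest first."""
        pending = [i for i in self._incidents.values() if i.status != "resolved"]
        return sorted(pending, key=lambda i: (PRIORITY_ORDER[i.priority], i.incident_id))

    @staticmethod
    def _note(incident: Incident, event: str) -> None:
        timestamp = datetime.now(timezone.utc).isoformat()
        incident.history.append(f"{timestamp} {event}")
```

The per-incident `history` list is what makes remediation progress traceable after the fact.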
Done properly, these activities form the backbone of a proactive data quality assurance and management framework, with controls, rules and processes that can enable an organization to identify and address data flaws before they cause negative business consequences. In the end, fixing data errors and inconsistencies and making sure their root causes are dealt with will enable broader and more effective utilization of data, to the benefit of your business.