Guide to managing a data quality assurance program
Gartner Inc. estimates that the worldwide market for data quality tools is about $1 billion a year -- a relatively modest figure, indicating that "a good chunk" of the corporate world still hasn't automated the data quality process, according to Gartner analyst Ted Friedman. And, he said, there are some good reasons why many organizations have been slow to adopt data quality software.
For one thing, the tools typically are expensive and require considerable expertise, Friedman said. "There's a contradiction: Data quality is a business issue, but the technology doesn't lend itself to use by the business people who should be the ones making the decisions [about data problems]." Another barrier to broader adoption, he added, is that IT and data management teams "have generally done a lousy job of justifying investments in data quality."
In their defense, building a business case and winning approval for a data quality technology purchase isn't easy. Data quality can be something of an abstract concept for corporate executives; accuracy is good, of course, but the business impact of a data quality initiative is often harder to grasp than, say, the benefits of a business intelligence (BI) program. But with data volumes growing inexorably, and BI and analytics applications that depend on clean data becoming increasingly critical to business success, Friedman thinks automating data quality management efforts will eventually be a necessity instead of an option.
Michael Chalhoub, director of client services at Eagle Creek Software Services, an IT consulting and outsourcing company in Eden Prairie, Minn., said a good place to start in building a business case is to do a comprehensive cost-benefit analysis that forecasts expected savings and other financial gains from improved data quality over the course of a five-year period, if not longer.
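As a rough illustration of the kind of multi-year cost-benefit forecast Chalhoub describes, the sketch below tallies cumulative net benefit year by year and reports the payback year. Every figure and function name in it is a hypothetical placeholder, not a number from Gartner, Eagle Creek or any vendor; a real analysis would use an organization's own cost and savings estimates and would probably discount future cash flows.

```python
def cumulative_net_benefit(upfront_cost, annual_cost, annual_benefits):
    """Return the running net benefit at the end of each forecast year."""
    running = -upfront_cost
    totals = []
    for benefit in annual_benefits:
        # Each year adds that year's forecast benefit minus recurring costs.
        running += benefit - annual_cost
        totals.append(running)
    return totals


def payback_year(totals):
    """Return the first year (1-based) with a non-negative running total, or None."""
    for year, total in enumerate(totals, start=1):
        if total >= 0:
            return year
    return None


# Hypothetical five-year forecast: all amounts are made-up placeholders.
totals = cumulative_net_benefit(
    upfront_cost=500_000,      # assumed license and implementation cost
    annual_cost=100_000,       # assumed maintenance and staffing cost
    annual_benefits=[150_000, 250_000, 300_000, 350_000, 400_000],
)

for year, total in enumerate(totals, start=1):
    print(f"Year {year}: cumulative net benefit {total:,}")
print("Payback year:", payback_year(totals))
```

With these placeholder inputs the running total turns positive in the fourth year, which is one reason a forecast horizon of five years or more can matter to the business case.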
Opportunity for improvement in the data quality process
In most organizations, Chalhoub said, there's ample room for improvement, particularly if data quality analysis and remediation are currently fully manual processes. To help document existing problems, he recommends seeking feedback on data quality issues from business users. Quantifiable data quality metrics should be developed to track progress in resolving the identified issues -- and then, he said, IT managers can go ahead and show business and corporate execs how the use of data quality tools can help improve data accuracy and integrity.
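The article doesn't specify which metrics to track, so the following is only a minimal sketch of two commonly used ones: completeness (the share of records with a value in a field) and validity (the share of populated values that pass a rule). The sample records, the field name and the simplified email pattern are all assumptions made for illustration.

```python
import re

# Deliberately simple email check, assumed for illustration only.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_populated(value):
    """Treat None and the empty string as missing."""
    return value not in (None, "")


def completeness(records, field):
    """Fraction of records that have a populated value for the field."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if is_populated(r.get(field)))
    return filled / len(records)


def validity(records, field, is_valid):
    """Fraction of populated values that pass the supplied rule."""
    values = [r.get(field) for r in records if is_populated(r.get(field))]
    if not values:
        return 0.0
    return sum(1 for v in values if is_valid(v)) / len(values)


# Hypothetical customer records, not real data.
customers = [
    {"id": 1, "email": "ann@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": "not-an-email"},
    {"id": 4, "email": "bob@example.com"},
]

email_completeness = completeness(customers, "email")
email_validity = validity(
    customers, "email", lambda v: bool(EMAIL_PATTERN.match(v))
)

print(f"Email completeness: {email_completeness:.0%}")
print(f"Email validity: {email_validity:.0%}")
```

Recomputing figures like these on a regular schedule gives a baseline against which progress on the issues raised by business users can be shown.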
But to ensure that business buy-in holds up, it's imperative to paint a realistic picture of the likely pace and duration of a data cleansing initiative, Chalhoub added. He said proponents should clearly define the objectives, desired outcomes and expected timeline of a data quality program, using a phased approach with built-in checks and balances for evaluating the progress of the effort and modifying the project plan as needed.
David Loshin, president of consultancy Knowledge Integrity Inc. in Silver Spring, Md., seconded the idea that companies should avoid purchasing data quality software until they have a clear understanding of their data shortcomings and a plan for addressing the problems, including changes in business processes to minimize data errors.
"Data quality is largely a process-improvement activity that can be enhanced by using tools, not the other way around," Loshin said. "Don't get too involved in the marketing messages that suggest it's easy to implement [automated tools] and get benefits. Instead, look at the benefits you want and then look at the processes you'll need to go through to get them." That might include using commercial software, but IT managers shouldn't assume so up front, he advised.
Impact assessments aid data quality cause
One big business-case stumbling block is that most organizations don't do a good job of quantifying the negative effects of poor data quality, Friedman said. "Everyone whines about bad data, but very few companies measure the impact." Doing so can create "a clear connection" between the data quality process and business performance, he added.
Another way to help make the case for buying data quality tools, according to Friedman, is to "bullet point" every example of ad hoc, manual fixes for data issues across an organization. Such Band-Aid approaches "have gaps and can be impossible to maintain," he said, and they incur internal costs that often are enough to justify an investment in packaged software.
The increasing focus on collecting and analyzing social media data, sensor readings and other forms of big data can also be a selling point for an automated data quality strategy. "That makes manual processes and custom coding even less tenable as time goes on," Friedman said.
In the meantime, though, he cited steps that companies can take to improve data quality without leaning on new technology -- for example, training business users to better understand how they can affect data quality and revising business processes to reduce the opportunities for inaccurate data to creep into systems. "Lots of companies," he said, "have these horribly complex processes, where data is touched far too many times by too many people."
About the author
Alan R. Earls is a Boston-based freelance writer focused on business and technology.