
Applying data quality rules is a job for users -- and data virtualization

Consultant David Loshin outlines a strategy for ensuring data is suitable to use: Make business units responsible for applying data quality policies and use data virtualization tools to help manage the process.

Whose responsibility is it to ensure data quality? Despite the desire for a simple solution, the answer to that question remains complex.

Even in a constrained environment with only one method of inputting data and one channel for viewing that data, you could propose four alternatives: Make the data creator or supplier responsible for ensuring data inputs adhere to an organization's data quality rules before committing them to the target system; make the system owner responsible for ensuring data meets the quality rules once it has been loaded into the system; make the system owner responsible for ensuring the rules are observed before information based on the data is presented to end users; or make the users responsible for ensuring data complies with the rules before using the presented information.
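To make those alternatives concrete, here is a minimal Python sketch of that same single-input, single-output flow with a validation hook at each of the four points. The function names, the in-memory "target system" and the sample rule are all hypothetical, invented only to illustrate where the same rule could be enforced.

```python
# Hypothetical sketch: one data quality rule enforced at four different points
# in a single-input, single-output flow. The rule, the functions and the
# in-memory "target system" are invented for illustration.

def customer_name_present(record):
    """Sample rule: customer_name must be present and non-blank."""
    return bool(record.get("customer_name", "").strip())

target_system = []  # stands in for the system of record

# Option 1: the data creator/supplier validates before committing to the system.
def supplier_submit(record):
    if not customer_name_present(record):
        raise ValueError("rejected at source: customer_name missing")
    target_system.append(record)

# Option 2: the system owner validates after the data has been loaded.
def owner_audit_loaded_data():
    return [r for r in target_system if not customer_name_present(r)]

# Option 3: the system owner validates before presenting information to users.
def owner_present():
    return [r for r in target_system if customer_name_present(r)]

# Option 4: the users validate the presented information before relying on it.
def users_consume(presented_rows):
    return [r for r in presented_rows if customer_name_present(r)]
```

The sketch doesn't favor any of the options; it only shows that the same rule can be applied at any of the four points.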

Attempting to distinguish the qualitative differences between those choices forces you to consider a slightly different question: Whose perception of data quality requirements is being adhered to? Ultimately, you'd like to believe the primary intent of the data quality process is to benefit the users of the data -- so the simplistic response would be that the data consumers should "own" the data quality rules. Even so, it would be beneficial to apply those rules as far back in the information flow as possible -- ideally, all the way back to the data creator.

Data quality picture not always clear

But while that might be optimal, the reality is somewhat murky for three reasons. First, the data originators may have created the data for one purpose, but the rules governing data quality for that original purpose may not necessarily be aligned with the downstream ones at the end-user level. Second, from an administrative standpoint, data producers may not have the resources -- due to both budget and time limits -- to address the data quality issues of users. And third, in many cases, transforming the data collected in a corporate system of record isn't feasible.

Also, recall that we're looking at a constrained model: one source of input and one channel for output. Imagine that constraint being lifted and the number of data providers increasing by a factor of 10 or more, and the number of data users ballooning as well.

In this more realistic scenario, different groups of users often are going to have different sets of data quality rules. If we review the first two of our four alternatives for assigning data quality responsibilities, we face two potential problems: scalability and conflicts. Validation of data on entry into systems now becomes a matter of validating it against all the downstream quality rules, raising the question of whether that can realistically be done while still meeting service-level agreements on processing performance. More seriously, the possibility of internal conflicts arises when different rules are inconsistent with one another.
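A minimal sketch of those two problems, assuming invented rule sets for a "finance" and a "marketing" group: validating each incoming record against every group's rules multiplies the work per record, and nothing guarantees that the groups' rules are mutually satisfiable.

```python
# Hypothetical sketch: entry-time validation against every downstream group's
# rules. The groups, fields and rules below are invented for illustration.

# Each consuming group owns its own rule set: field -> predicate.
finance_rules = {"order_amount": lambda v: v is not None and v > 0,
                 "currency": lambda v: v in {"USD", "EUR"}}
marketing_rules = {"email": lambda v: v is not None and "@" in v,
                   "currency": lambda v: v == "GBP"}  # inconsistent with finance

all_rule_sets = {"finance": finance_rules, "marketing": marketing_rules}

def validate_on_entry(record):
    """Check one record against every group's rules; the work grows with the
    number of groups and rules (the scalability concern)."""
    failures = []
    for group, rules in all_rule_sets.items():
        for field, check in rules.items():
            if not check(record.get(field)):
                failures.append((group, field))
    return failures

def field_has_conflict(field, candidate_values):
    """True if no candidate value satisfies every group that governs the field
    (the internal-conflict concern)."""
    return not any(all(rules[field](v)
                       for rules in all_rule_sets.values() if field in rules)
                   for v in candidate_values)

# "EUR" satisfies finance but not marketing; no currency satisfies both groups.
print(validate_on_entry({"order_amount": 10.0, "currency": "EUR", "email": "a@b.com"}))
print(field_has_conflict("currency", ["USD", "EUR", "GBP"]))  # True
```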

Unless rigid enterprise data governance policies are in place, implementing data quality assurance processes at the point of data entry may be both a technical and an operational challenge. That leaves us with the third and fourth alternatives, both of which involve applying rules on data quality when data is used, with either the system owner or the users taking responsibility for applying them.

Put the data quality controls in users' hands

If a department or business unit controls the rules, it seems unwieldy to "outsource" enforcement to another party. The only logical conclusion is that users should enforce their rules at the point of consumption. In other words, because data quality is relevant based on the context of how the data is used, this is a case of beauty (or quality) being in the eye of the beholder (or user).

That doesn't free IT data practitioners from having a role in data quality management and assurance. But the particulars of their role must be adjusted to facilitate the development of a framework for defining and implementing a data quality policy that's specific to each group of users. And data virtualization provides one possible means of balancing the need for unified access to data with customized application of quality rules.

Using data virtualization software, data management professionals can create semantic layers customized to the needs of different groups of users on top of a foundational layer supporting federated access to the underlying systems. Data quality validation, and data transformation and standardization, would then be sandwiched between the two layers. This approach allows the same data to be prepared in different ways that are suited to individual usage models, while retaining a historical record of the application of data quality policies -- and, therefore, traceability and auditability.
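As a rough illustration of that layering (a sketch, not how any particular data virtualization product is configured), the following Python snippet federates two invented sources in a foundational layer, applies a different list of quality and standardization policies for each group of users, and logs each policy application so the resulting semantic views stay traceable. All source names, fields and policies are assumptions made for the example.

```python
# Hypothetical sketch of the layering: a foundational layer federating access to
# underlying systems, quality/standardization policies sandwiched in the middle,
# and per-group semantic views on top. Sources, fields and policies are invented.
from datetime import datetime, timezone

def foundational_layer():
    """Federated access: pull rows from the underlying systems as-is."""
    crm_rows = [{"cust": " Acme Corp ", "country": "us"}]  # stand-in for a CRM
    erp_rows = [{"cust": "Beta GmbH", "country": "de"}]    # stand-in for an ERP
    return crm_rows + erp_rows

def apply_policies(rows, policies, audit_log):
    """Apply one group's quality policies, logging each application so the
    resulting semantic view remains traceable and auditable."""
    out = []
    for row in rows:
        cleaned = dict(row)
        for name, transform in policies:
            cleaned = transform(cleaned)
            audit_log.append({"policy": name, "input": row,
                              "at": datetime.now(timezone.utc).isoformat()})
        out.append(cleaned)
    return out

# Different semantic layers prepare the same data for different usage models.
sales_policies = [
    ("trim_customer_name", lambda r: {**r, "cust": r["cust"].strip()}),
    ("upper_country_code", lambda r: {**r, "country": r["country"].upper()}),
]
compliance_policies = [
    ("upper_country_code", lambda r: {**r, "country": r["country"].upper()}),
    ("mask_customer_name", lambda r: {**r, "cust": r["cust"].strip()[0] + "***"}),
]

audit_log = []
base = foundational_layer()
sales_view = apply_policies(base, sales_policies, audit_log)
compliance_view = apply_policies(base, compliance_policies, audit_log)
print(sales_view)        # cleaned names, upper-case country codes
print(compliance_view)   # masked names for the compliance group
print(len(audit_log))    # every policy application is recorded
```

In a real data virtualization tool, the semantic views and the quality policies between the layers would typically be defined declaratively rather than in application code, but the principle of preparing the same federated data differently for each group, with an audit trail, is the same.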

As the numerous data sources in the typical corporate enterprise become increasingly critical to business success, the complexity of data quality assurance also has the potential to increase significantly. But by pushing stewardship of data quality rules to the user level and deploying data virtualization tools to help manage the process, you'll have a good chance of ensuring suitable data is available for use and unnecessary conflicts over quality issues don't bog your organization down in acrimonious infighting.

About the author:

David Loshin is president of Knowledge Integrity Inc., a consulting and development services company that works with clients on big data, business intelligence and data management projects. He also is the author or co-author of various books, including Using Information to Develop a Culture of Customer Centricity. Email him at loshin@knowledge-integrity.com.

Email us at editor@searchdatamanagement.com and follow us on Twitter: @sDataManagement.

Next Steps

More advice from David Loshin: Five steps for improving your data quality strategy

See why consultant Andy Hayler says businesses need to take data quality to a higher level

Get tips from consultant Lyndsay Wise on creating high-quality business intelligence data

This was last published in November 2014



Join the conversation

6 comments


Who's responsible for making sure data quality rules are adhered to in your organization -- IT, business units, another group?
Our primary IT people are responsible for the guidelines, while managers are responsible for enforcement. It's mostly a straightforward checklist that they run through every week to be sure everything's still being recorded properly.

I've been thinking about tying things like requests for vacation time to a checklist of maintenance tasks - offering added incentive to make sure everything's done correctly before they leave does have some value, though I haven't decided either way just yet.
I am currently in the process of setting up a Data Quality Management group from a business perspective to collect and test the business rules, provide scorecards and drive improvement initiatives prior to some major systems consolidation.
Well, I guess that anyone who is inputting data into a production database is responsible for ensuring the quality of that data.

So that is to say, a lot of people.

We don't have specific data quality rules in place; it is expected that any changes to any data inputs will be tested thoroughly as part of the normal process. This includes testing of any downstream processes and/or business applications that use the data.

This doesn't always happen, but it should. Some people just suck at their jobs, and that's a larger problem.
Agree 100% that those entering data should be responsible, but organisations need to ensure they provide the necessary training and tools, as well as hiring the right talent in the first place. If people suck, then it is poor management for not rectifying that through training, reassignment, etc.
David, great article as always. I have a request: it would be great to have a diagram illustrating the concepts you discussed here. Specifically, the paragraph below would be much easier to understand with a diagram showing the layers you mentioned.

Using data virtualization software, data management professionals can create semantic layers customized to the needs of different groups of users on top of a foundational layer supporting federated access to the underlying systems. Data quality validation, and data transformation and standardization, would then be sandwiched between the two layers. This approach allows the same data to be prepared in different ways that are suited to individual usage models, while retaining a historical record of the application of data quality policies -- and, therefore, traceability and auditability.

Thank you!
-Prash
