This article originally appeared on the BeyeNETWORK.
Consider this scenario: You are a team member on your company’s customer data integration project. One of your tasks, perhaps in a data stewardship role, is to evaluate customer records acquired from a third party to ensure that each record complies with certain corporate information policies. During the execution of this task, however, you see a customer name that you recognize, perhaps a relative or a new neighbor. As you review that customer’s record, you realize that the address is incorrect – perhaps the customer recently moved into a new home and the address provided is no longer valid. Should you correct the record?
I know what you are thinking – the answer is obvious. Unfortunately, half of you think the answer is clearly to correct the record and the other half think the answer is to absolutely leave the record alone. And both sides are justified in their stances, but in order to understand why, let’s abstract the scenario a bit into more direct questions:
- What are the issues with modifying data?
- What are the circumstances under which an individual is permitted to directly modify data?
The first question is a bit of a red herring – presumably, anyone in the organization would like critical data sets to reflect the highest levels of data quality; and if someone knows that there is a flaw in the data, correcting the error improves the quality of the record along with the corresponding downstream applications dependent on that record. Rather, the issues have to do with the approach used for making that change: who changes the data, under what authority that person is allowed to modify the record, who approves the modification, how the modification is logged, what we do if the record needs to be reverted to its earlier state, whether that copy of the record has become unsynchronized with other copies and so on. In other words, directly modifying data requires a significant amount of oversight, control and auditing before letting anyone have “modify access” to a data set.
It is the authority of the source making the change that might be of greatest concern. An individual’s attempt to modify a record based on what could be termed “circumstantial evidence” is precarious at best and could have deeper ramifications. For example, the neighbor who just moved in next door may be using that house only as a vacation home, with a previous address remaining as her official residence. Modifying the address might trigger other events that could impact the customer, perhaps in very inappropriate ways.
However, instituting layers of hierarchical control adds overhead and approvals, which lengthens the time it takes to correct known inaccuracies and make the information fit for downstream purposes. And knowing that a record contains inaccurate data and not doing something about it is disconcerting. At some point, if the value is really not accurate, the record will need to be corrected. That brings us to the second question: Under what circumstances is data correction allowed?
Let’s boil it down to addressing some of our identified issues and asserting some basic concepts that could serve as the start of a governance framework for data:
- Who is changing the data? – Records may only be modified by approved individuals who have been trained to abide by all documentation and logging requirements as part of a change control process.
- Under what authority is that person allowed to modify the record? – The staff member modifying the data is accountable for all impacts associated with a modification. That person must also document a verified authoritative source demonstrating that the modification is of higher quality (more accurate, more recent, more precise and so on).
- Who approves that modification? – Before a modification is committed, it must be reviewed by an alternate staff member to verify that the modifier has the appropriate credentials for modification and to review the justification based on the verified authoritative source.
- How is the modification logged? – Under a change control system with versioning, log the details of the modification: who made it, what triggered it, the justifications and authoritative sources, and the approval from the alternate staff member.
- What do we do if the record needs to be reverted to its earlier state? – Now that we have a log documented within a change control framework with versioning, if the modification needs to be rolled back, there is an audit trail that can be reviewed to determine if there were any subsequent impacts, identifying other items requiring rollback and determining a work plan for reverting to a prior version.
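To make these controls concrete, here is a minimal Python sketch of how the five points above might fit together. It is an illustration under stated assumptions, not a prescribed implementation: the names `GovernedRecord`, `ChangeLogEntry`, `apply_correction` and `rollback` are hypothetical, the list of approved modifiers is passed in as a simple set, and the "versioning" is just an in-memory counter and list rather than a real change control system.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


class ChangeControlError(Exception):
    """Raised when a proposed correction violates a governance rule."""


@dataclass(frozen=True)
class ChangeLogEntry:
    # One audit-trail entry per committed modification (hypothetical layout).
    version: int
    field_name: str
    old_value: str
    new_value: str
    modified_by: str
    approved_by: str
    trigger: str
    authoritative_source: str
    timestamp: datetime


@dataclass
class GovernedRecord:
    values: dict
    version: int = 0
    log: list = field(default_factory=list)


def apply_correction(record, field_name, new_value, *, modified_by, approved_by,
                     trigger, authoritative_source, approved_modifiers):
    """Commit a correction only if every control is satisfied, then log it."""
    if field_name not in record.values:
        raise ChangeControlError(f"unknown field: {field_name}")
    # Who is changing the data? Only trained, approved individuals.
    if modified_by not in approved_modifiers:
        raise ChangeControlError(f"{modified_by} is not an approved modifier")
    # Under what authority? A documented authoritative source is required.
    if not authoritative_source.strip():
        raise ChangeControlError("an authoritative source must be documented")
    # Who approves? An alternate staff member, never the modifier.
    if approved_by == modified_by:
        raise ChangeControlError("approver must differ from the modifier")
    entry = ChangeLogEntry(
        version=record.version + 1,
        field_name=field_name,
        old_value=record.values[field_name],
        new_value=new_value,
        modified_by=modified_by,
        approved_by=approved_by,
        trigger=trigger,
        authoritative_source=authoritative_source,
        timestamp=datetime.now(timezone.utc),
    )
    record.values[field_name] = new_value
    record.version = entry.version
    record.log.append(entry)
    return entry


def rollback(record, version, *, modified_by, approved_by, approved_modifiers):
    """Revert one logged change; the rollback is itself a logged correction."""
    target = next((e for e in record.log if e.version == version), None)
    if target is None:
        raise ChangeControlError(f"no logged change with version {version}")
    # Use the audit trail to spot later changes that would also need review.
    later = [e for e in record.log
             if e.version > version and e.field_name == target.field_name]
    if later:
        raise ChangeControlError("later changes to this field need review first")
    return apply_correction(
        record, target.field_name, target.old_value,
        modified_by=modified_by, approved_by=approved_by,
        trigger=f"rollback of version {version}",
        authoritative_source=f"change log entry v{version}",  # assumption
        approved_modifiers=approved_modifiers,
    )
```

In this sketch a rollback never erases history: it appends a new entry that restores the earlier value, so the audit trail still shows both the original correction and its reversal. A real framework would also need to address synchronization with other copies of the record, which is outside the scope of this example.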
Does this constitute a data governance program? Not really, but it does provide a starting point for overseeing actions that are expected to happen and provides an audit trail that can demonstrate the justifications for any data correction as well as show that the changes were made by vetted staff members. Instituting some straightforward controls over data correction should prevent arbitrary modifications performed in the absence of any supervision.
David Loshin is the President of Knowledge Integrity, Inc., a consulting and development company focusing on customized information management solutions including information quality solutions consulting, information quality training and business rules solutions. Loshin is the author of The Practitioner's Guide to Data Quality Improvement, Master Data Management, Enterprise Knowledge Management: The Data Quality Approach and Business Intelligence: The Savvy Manager's Guide. He is a frequent speaker on maximizing the value of information. David can be reached at email@example.com or at (301) 754-6350.