The challenge with this question is that beneath its simplicity lie many latent questions whose answers are needed before any kind of cost analysis for customer data cleansing can be considered. For example, what data elements constitute the customer record? How many records are there? What are the criteria for declaring a record "clean"? What types of customers do the records describe: individuals or organizations? How old are the records? Are they in a single table or scattered across many data assets? What approaches are to be taken for cleansing? There may be studies performed by vendors on the average cost, but I suspect that beneath this question lurk a number of other, more important ones.
To start thinking about the cost of cleansing, consider this example, with residential customer data consisting of first name, last name and telephone number. One can determine if a single record is "correct" using this algorithm: Call the telephone number and ask to speak with the person whose name shares the record with that number. If the person comes to the phone, ask if all the values are accurate, and correct those that are not. If there is no one there by that name, the record is incorrect. However, at this point what can be done to correct it? Either the name is not correct or the number is not correct. The next step in cleaning requires additional information, and if none is available, then the algorithm ends.
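The procedure above can be sketched in a few lines of Python. This is only an illustration: the names CustomerRecord, Outcome, verify_record and simulated_call are hypothetical, and the phone call is replaced by a function that looks the number up in a toy directory invented for the demo.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Tuple


class Outcome(Enum):
    VERIFIED = "verified"      # person reached, all values confirmed
    CORRECTED = "corrected"    # person reached, some values fixed
    UNRESOLVED = "unresolved"  # no one by that name; name or number is wrong


@dataclass(frozen=True)
class CustomerRecord:
    first_name: str
    last_name: str
    phone: str


# Hypothetical stand-in for the phone call. It returns None when no one by
# that name is at the number, otherwise the record as the person states it.
CallFn = Callable[[CustomerRecord], Optional[CustomerRecord]]


def verify_record(record: CustomerRecord,
                  place_call: CallFn) -> Tuple[CustomerRecord, Outcome]:
    confirmed = place_call(record)
    if confirmed is None:
        # Without additional information the algorithm ends here.
        return record, Outcome.UNRESOLVED
    if confirmed == record:
        return record, Outcome.VERIFIED
    return confirmed, Outcome.CORRECTED


def simulated_call(record: CustomerRecord) -> Optional[CustomerRecord]:
    # Toy directory (assumption): phone number -> person who answers it.
    directory = {"555-0100": ("Ann", "Lee"), "555-0101": ("Robert", "Diaz")}
    person = directory.get(record.phone)
    if person is None or person[1].lower() != record.last_name.lower():
        return None
    return CustomerRecord(person[0], person[1], record.phone)


if __name__ == "__main__":
    for rec in [CustomerRecord("Ann", "Lee", "555-0100"),
                CustomerRecord("Bob", "Diaz", "555-0101"),
                CustomerRecord("Sam", "Poe", "555-0100")]:
        print(verify_record(rec, simulated_call))
```

Note that the unresolved branch returns the record unchanged: the sketch cannot tell whether the name or the number is at fault, which is exactly the point at which the manual algorithm stops.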
Simplistic? Yes. Accurate? Yes. Cost effective? Depends on the number of records, staff members and telephones. Scalable? Not really. There are alternatives, but reliance on different approaches starts to affect those key considerations. Automated solutions may be more scalable, but they may also be more costly, less accurate and more complex, and they may require more expertise.
It may be better to challenge the question, then, and turn it into a different sort of beast: answer the questions raised above first, and only then look at the different alternatives and their corresponding costs.
The level of effort that is reasonable to spend on customer data cleansing must be less than the value of the accrued business benefits, and this provides an upper limit to what could be budgeted for the process.
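As a rough illustration of that upper limit, the sketch below estimates the labor cost of the manual calling approach and compares it with an expected benefit. Every figure (record count, minutes per call, hourly rate, benefit values) is an invented placeholder, and the function names are hypothetical; the real inputs depend on the answers to the questions above.

```python
def manual_cleansing_cost(records: int,
                          minutes_per_call: float,
                          hourly_rate: float) -> float:
    """Labor cost of calling every record once (ignores phones and overhead)."""
    hours = records * minutes_per_call / 60
    return hours * hourly_rate


def within_budget(cost: float, expected_benefit: float) -> bool:
    """The effort is justified only if it costs less than the benefit."""
    return cost < expected_benefit


if __name__ == "__main__":
    # Placeholder inputs, not measured values.
    cost = manual_cleansing_cost(records=10_000,
                                 minutes_per_call=3,
                                 hourly_rate=20.0)
    print(cost)                          # 10000.0
    print(within_budget(cost, 25_000))   # True
    print(within_budget(cost, 8_000))    # False
```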