How to estimate customer data cleansing costs

How much on average does it cost to clean one customer record? How much should an organization spend on customer data cleansing? Do you know if there has been any specific reporting or analysis done on this area of customer data quality?

The challenge with this question is that underlying its simplicity lie many latent questions whose answers are needed before any kind of customer data cleansing cost analysis can be considered. For example, what data elements constitute the customer record? How many records are there? What are the criteria for declaring a record "clean"? What types of customer data are there? Individuals or organizations? How old are the records? Are they in a single table or scattered across many data assets? What approaches are to be taken for cleansing? There may be studies performed by vendors on the average cost, but I suspect that beneath this question lurk a number of other, more important ones.

To start thinking about the cost of cleansing, consider this example, in which residential customer data consists of first name, last name and telephone number. One can determine whether a single record is "correct" using this algorithm: Call the telephone number and ask to speak with the person whose name shares the record with that number. If the person comes to the phone, ask whether all the values are accurate, and correct those that are not. If there is no one there by that name, the record is incorrect. However, at this point what can be done to correct it? Either the name is not correct or the number is not correct. The next step in cleaning requires additional information, and if none is available, then the algorithm ends.
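The call-and-confirm algorithm above can be sketched in a few lines of Python. This is only an illustration: `CustomerRecord`, `Outcome`, `verify_record` and the `place_call` callback are hypothetical names, and the callback merely stands in for a person making the phone call.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class CustomerRecord:
    first_name: str
    last_name: str
    phone: str


class Outcome(Enum):
    CONFIRMED = "confirmed"    # person reached, every value accurate
    CORRECTED = "corrected"    # person reached, some values fixed
    UNRESOLVED = "unresolved"  # no one by that name; name or number is wrong


# Hypothetical stand-in for the phone call. It returns None when no one by
# that name is at the number, otherwise the values the person confirms.
PlaceCall = Callable[[CustomerRecord], Optional[CustomerRecord]]


def verify_record(
    record: CustomerRecord, place_call: PlaceCall
) -> Tuple[CustomerRecord, Outcome]:
    """Apply the call-and-confirm check to a single record."""
    answer = place_call(record)
    if answer is None:
        # Without additional information we cannot tell which field is
        # wrong, so the algorithm ends here.
        return record, Outcome.UNRESOLVED
    if answer == record:
        return record, Outcome.CONFIRMED
    # The person supplied corrected values; keep those.
    return answer, Outcome.CORRECTED
```

The `UNRESOLVED` branch is the point where the manual method runs out: the record is known to be wrong, but nothing in it says which field to fix.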

Simplistic? Yes. Accurate? Yes. Cost-effective? That depends on the number of records, staff members and telephones. Scalable? Not really. There are alternatives, but relying on a different approach starts to affect those key considerations. Automated solutions may be more scalable, but they may also be more costly, less accurate and more complex, and they may require more expertise.
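To see how the number of records and staff members drives the cost of the telephone approach, here is a rough back-of-the-envelope calculation. Every figure in it (minutes per call, hourly rate, staff size) is a made-up assumption for illustration, not a benchmark, and `manual_cleansing_estimate` is a hypothetical helper that assumes one telephone per staff member.

```python
def manual_cleansing_estimate(num_records, minutes_per_call, hourly_rate,
                              staff, hours_per_day):
    """Estimate cost and duration of phoning every customer record."""
    if min(num_records, minutes_per_call, hourly_rate, staff, hours_per_day) <= 0:
        raise ValueError("all inputs must be positive")
    total_hours = num_records * minutes_per_call / 60
    total_cost = total_hours * hourly_rate
    return {
        "total_cost": total_cost,
        "cost_per_record": total_cost / num_records,
        # Staff work in parallel, so elapsed time shrinks as staff grows.
        "working_days": total_hours / (staff * hours_per_day),
    }


# Assumed inputs: 10,000 records, 6 minutes per call, $20 per hour,
# 5 callers working 8 hours a day.
estimate = manual_cleansing_estimate(10_000, 6, 20.0, 5, 8)
print(estimate)
# {'total_cost': 20000.0, 'cost_per_record': 2.0, 'working_days': 25.0}
```

Adding staff shortens the elapsed time but leaves the cost per record unchanged, which is one way to see why the manual approach does not scale.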

It may be better to challenge the question, then, and turn it into a different sort of beast by answering these questions first and then looking at the different alternatives and their corresponding costs:

  • What business processes are impacted by "unclean" customer data?
  • How is "clean" customer data defined?
  • What business benefits can be achieved by cleaning customer data?
  • What level of precision is necessary for those benefits to be achieved?

The level of effort that is reasonable to spend on customer data cleansing must be less than the value of the accrued business benefits, which provides an upper limit on what could be budgeted for the process.
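The upper limit described above can be expressed as a simple comparison. The sketch below is hypothetical: the benefit value, the option names and their costs are invented for illustration, and in practice the hard part is estimating the benefit figure in the first place.

```python
def max_spend_per_record(benefit_value, num_records):
    """Ceiling on per-record cleansing cost implied by the expected benefit."""
    if num_records <= 0:
        raise ValueError("num_records must be positive")
    return benefit_value / num_records


def options_within_budget(benefit_value, option_costs):
    """Keep only the cleansing options that cost less than the benefit."""
    return {name: cost for name, cost in option_costs.items()
            if cost < benefit_value}


# Assumed figures: cleaner data is expected to be worth $15,000,
# and there are 10,000 customer records.
benefit = 15_000.0
options = {
    "manual phone verification": 20_000.0,
    "automated matching tool": 12_000.0,
}

print(max_spend_per_record(benefit, 10_000))    # 1.5
print(options_within_budget(benefit, options))  # {'automated matching tool': 12000.0}
```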

    This was last published in February 2009

    1 comment

    One thing to think about is the impact that bad data has on the enterprise (https://www.data.com/export/sites/data/common/assets/pdf/DS_Gartner.pdf); then try to figure out the cost of cleaning that data. In general, from what I've seen, outside data cleansing companies cost too much, bring data security risks and deliver poor results when you look closely.

    The best solutions in my mind are IBM's QualityStage/Ascential/InfoSphere (https://www.ibm.com); DataFlux from SAS, which is a bit more concentrated in the space and has some nice interfaces (https://www.dataflux.com); and Data Ladder, which is aimed more at business users than at IT (https://www.dataladder.com).