Data cleansing: The business impact of dirty data

Look at the business impact of leaving less-than-ideal data in place to help you determine what data should be cleansed.

How do I even begin a data cleansing process? Do we start with our operational system or in the warehouse? And is there a way to determine (strategically) what data is worth being cleansed so we can save time and resources? (Is there some dirty data that won't affect us?)

It is always best to address data quality as early in the cycle as you can. Only when the operational system cannot be modified should data quality efforts be pushed to the data warehouse.

Please remember that you don't necessarily "cleanse" all data that is found to be of subpar quality. Most of the time, you simply report on the data exception and then find out either that your expected bounds were too tight or that the exception was truly a business exception.
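To make the idea of reporting on exceptions rather than cleansing them concrete, here is a minimal sketch. It assumes records are plain dictionaries and that expected bounds are simple numeric ranges; the `Bound` class, the `report_exceptions` function, and the `order_total` field are all hypothetical names chosen for illustration, not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Bound:
    """Expected numeric range for one field (hypothetical structure)."""
    field: str
    low: float
    high: float


def report_exceptions(rows, bounds):
    """Return (row_index, field, value) for every value that is missing
    or outside its expected bounds. The rows are never modified."""
    exceptions = []
    for index, row in enumerate(rows):
        for bound in bounds:
            value = row.get(bound.field)
            if value is None or not (bound.low <= value <= bound.high):
                exceptions.append((index, bound.field, value))
    return exceptions


# Illustrative data only.
rows = [
    {"order_total": 120.0},
    {"order_total": -5.0},
    {"order_total": 25000.0},
]
bounds = [Bound("order_total", 0.0, 10000.0)]

exceptions = report_exceptions(rows, bounds)
for index, field, value in exceptions:
    print(f"row {index}: {field}={value} is outside expected bounds")
```

A reviewer looking at this report might decide that the 25000.0 order is a genuine business exception and widen the upper bound, while the negative total points to a real defect upstream.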

The best way I've found to decide strategically what data should be cleansed is to look at the business impact of leaving the less-than-ideal data in place, unrecognized. If that impact is greater than the effort needed to raise awareness of the exception or to cleanse the data, the data should be cleansed. Conversely, some dirty data will cost the business less than it would cost to fix, and that data can be left alone.

I have found that most data warehouse programs should take the important first step toward investigating data quality and establishing a data quality program: creating a framework for addressing quality violations.
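The impact-versus-effort comparison described above can be sketched roughly as follows. This assumes both quantities can be estimated in the same unit (for example, cost per year); the `QualityIssue` class, the `worth_addressing` function, and every figure in the sample data are hypothetical and exist only to illustrate the rule.

```python
from dataclasses import dataclass


@dataclass
class QualityIssue:
    name: str
    business_impact: float  # estimated cost of leaving the issue in place
    fix_effort: float       # estimated cost of reporting on or cleansing it


def worth_addressing(issues):
    """Keep only issues whose business impact exceeds the effort to address
    them, ordered by net benefit (impact minus effort), largest first."""
    selected = [i for i in issues if i.business_impact > i.fix_effort]
    return sorted(
        selected,
        key=lambda i: i.business_impact - i.fix_effort,
        reverse=True,
    )


# Made-up estimates for illustration.
issues = [
    QualityIssue("duplicate customer records", 50000, 12000),
    QualityIssue("inconsistent phone formats", 800, 3000),
    QualityIssue("missing product category", 9000, 2000),
]

ranked = worth_addressing(issues)
for issue in ranked:
    print(issue.name, issue.business_impact - issue.fix_effort)
```

In this sample, the phone format issue is left alone because fixing it would cost more than the damage it causes. In practice the hard part is producing the estimates, not the comparison itself.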
