EXPERT RESPONSE
It is always best to address data quality as early in the cycle as you can. Only when the operational system cannot be modified should data quality efforts be pushed to the data warehouse.
Please remember that you don't necessarily "cleanse" all data that is found to be of subpar quality. Most of the time, you simply report the data exception and then find out either that your expected bounds were too tight or that the exception was a genuine business exception.
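As a rough illustration of reporting rather than cleansing, here is a minimal Python sketch. The row layout, the `order_total` column, and the bounds are all hypothetical and chosen only for the example; the point is that the check collects exceptions for review and leaves the data itself untouched.

```python
from dataclasses import dataclass


@dataclass
class DataException:
    row_id: int
    column: str
    value: float
    reason: str


def find_exceptions(rows, column, lower, upper):
    """Report values outside [lower, upper] without modifying the rows."""
    exceptions = []
    for row in rows:
        value = row[column]
        if not (lower <= value <= upper):
            exceptions.append(
                DataException(row["id"], column, value, "outside expected bounds")
            )
    return exceptions


# Hypothetical sample data and bounds, for illustration only.
rows = [
    {"id": 1, "order_total": 50.0},
    {"id": 2, "order_total": -5.0},
    {"id": 3, "order_total": 120000.0},
]
report = find_exceptions(rows, "order_total", lower=0, upper=10000)
for exc in report:
    print(exc)
```

Someone who knows the business can then review each reported row and decide whether the bounds need widening, the value is a legitimate business exception, or the data really is wrong.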
The best way I've found to decide strategically which data should be cleansed is to look at the business impact of leaving the less-than-ideal data in place, unrecognized. If that impact is greater than the effort needed to raise awareness of the exception or to cleanse the data, then the data should be cleansed. Conversely, some dirty data will affect the business less than it would cost to fix, and that data can be left alone. I have found that most data warehouse programs should take the important first step toward investigating data quality and establishing a data quality program: creating a framework for addressing quality violations.
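The impact-versus-effort comparison can be sketched in Python as follows. This is only one way to frame it, and it assumes impact and effort can be estimated in the same unit (say, cost per year). The issue names and figures are invented for the example.

```python
def triage(issues):
    """Split issues into those worth addressing and those to leave alone.

    Each issue is a dict with 'name', 'impact', and 'effort', where impact
    and effort are estimates in the same (hypothetical) unit.
    """
    address, leave = [], []
    for issue in issues:
        if issue["impact"] > issue["effort"]:
            address.append(issue)
        else:
            leave.append(issue)
    # Put the issues with the largest net benefit first.
    address.sort(key=lambda i: i["impact"] - i["effort"], reverse=True)
    return address, leave


# Invented figures, for illustration only.
issues = [
    {"name": "duplicate customers", "impact": 80_000, "effort": 15_000},
    {"name": "missing fax numbers", "impact": 500, "effort": 6_000},
    {"name": "invalid postal codes", "impact": 20_000, "effort": 12_000},
]
address, leave = triage(issues)
print("Address:", [i["name"] for i in address])
print("Leave:", [i["name"] for i in leave])
```

In practice the estimates will be rough, so the ranking matters more than the exact numbers.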