Data quality management for data warehouses

Learn how to evaluate the risks of poor data quality in data warehouses -- and how good data quality management practices help mitigate risks.

We are discussing the option of populating our data warehouse tables with less than quality data. We need to make recommendations to the (business project) sponsors. We need to balance the risk of credibility of having bad data with gaining knowledge of how bad things are by having people access the information. What would you recommend -- populate with bad data or not populate at all?
This is a little bit of a conundrum. If I understand the question correctly, you have two challenges. The first is that your team is not ready to populate the data warehouse because the quality of the data is not acceptable for the defined business objectives. The second is that you are unable to effectively articulate to the business users the degree to which the data does not meet data quality thresholds. You also have positioned the challenge in terms of "risk" -- risking credibility (presumably of the data warehouse).

By looking at these aspects, though, I suspect that there are some deeper issues involved. For example, by whose criteria are you determining the quality of the data? If the data warehouse team is making that assessment with respect to the needs of the business project sponsors, that implies that the sponsors have been engaged in defining the business data expectations, requirements, critical dimensions of data quality, and the measures and metrics for monitoring the quality of the data. In that case, there would be no risk to credibility, since (1) the data loaded into the warehouse is not "owned" by the data warehouse team, but is just collected from upstream sources and then determined not to meet the objectively defined standards, and (2) the business sponsors would be the ones who contributed their data quality expectations, so those rules are not "owned" by the data warehouse team either. And, in that case, you do not have to answer the question about whether the data warehouse should be populated -- that should be up to the business project sponsors, since the levels of quality would already be reported to them based on their own set of criteria.

However, this is clearly not the case, which means a few things. First, it is not clear how data quality acceptability is defined in this context, but the definition is evidently not derived from or owned by the business users. Second, there appears to be an opportunity to improve the data requirements analysis process so that it captures the data quality dimensions, rules and metrics, allowing you to characterize the suitability of candidate data sources before the data warehouse is populated. Third, there is an opportunity to better communicate the value of high-quality data, shifting the "risk" away from the data warehousing team and back to the business clients: are they willing to accept the risk of making bad decisions or not having accurate reporting due to poor upstream data quality?
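
To make the second point concrete, here is a minimal Python sketch of what capturing rules and metrics might look like in practice. It is an illustration only, not a prescribed method: the `Rule` class, the `score_source` function, the sample customer records and the threshold values are all hypothetical. In a real project, the rules and their acceptable thresholds would come from the business sponsors.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class Rule:
    """One business-defined data quality rule (hypothetical structure)."""
    name: str                        # label reported back to the sponsors
    dimension: str                   # e.g. "completeness", "validity"
    check: Callable[[Record], bool]  # returns True when a record conforms
    threshold: float                 # minimum acceptable pass rate (0.0 to 1.0)


def score_source(records: List[Record], rules: List[Rule]) -> List[dict]:
    """Measure a candidate source against each rule before it is loaded."""
    report = []
    for rule in rules:
        passed = sum(1 for record in records if rule.check(record))
        rate = passed / len(records) if records else 0.0
        report.append({
            "rule": rule.name,
            "dimension": rule.dimension,
            "pass_rate": rate,
            "threshold": rule.threshold,
            "meets_threshold": rate >= rule.threshold,
        })
    return report


# Made-up sample standing in for an upstream source extract.
customers = [
    {"customer_id": 1, "email": "a@example.com", "country": "US"},
    {"customer_id": 2, "email": None, "country": "US"},
    {"customer_id": 3, "email": "c@example.com", "country": "ZZ"},
    {"customer_id": 4, "email": "d@example.com", "country": "DE"},
]

VALID_COUNTRIES = {"US", "DE", "FR"}  # assumed reference list

# Thresholds here are placeholders; the sponsors would set the real ones.
rules = [
    Rule("email is present", "completeness",
         lambda r: bool(r.get("email")), 0.95),
    Rule("country code is known", "validity",
         lambda r: r.get("country") in VALID_COUNTRIES, 0.70),
]

report = score_source(customers, rules)
for line in report:
    status = "OK" if line["meets_threshold"] else "BELOW THRESHOLD"
    print(f'{line["rule"]} ({line["dimension"]}): '
          f'{line["pass_rate"]:.0%} vs {line["threshold"]:.0%} -> {status}')
```

A report like this gives the sponsors a measured level of quality against their own criteria, so the decision about whether to populate the warehouse rests with them and not with the data warehouse team.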
