Data quality management for business intelligence projects


Plenty of business intelligence (BI) or data warehouse projects have been blindsided by data quality problems. Sometimes these problems aren't apparent until business users start testing the system just before it goes live. So what causes BI project teams to get caught off guard by data quality issues? And why do these data quality management problems surface so late in the project?

There are two common pitfalls: defining data quality too narrowly and assuming data quality is the responsibility of the source systems.

People often assume that data quality simply means eliminating bad data -- data that is missing, inaccurate or incorrect. Bad data is certainly a problem, but it isn't the only problem. Good data quality programs also ensure that data is comprehensive, consistent, relevant and timely.
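To make the broader definition concrete, here is a minimal Python sketch that checks one record against more than a single dimension. The `CustomerRecord` type, the required fields and the 30-day freshness limit are hypothetical; in practice the business would set these rules. Consistency and relevance are left out because they depend on other systems and on how the data is used. The point is that a record with no missing or malformed values can still fail on timeliness.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class CustomerRecord:
    customer_id: str
    email: Optional[str]
    country: Optional[str]
    last_updated: date


# Hypothetical rules for illustration; real rules come from the business.
REQUIRED_FIELDS = ("customer_id", "email", "country")
MAX_AGE = timedelta(days=30)


def quality_issues(rec: CustomerRecord, as_of: date) -> list[str]:
    """Return the names of the data quality dimensions this record fails."""
    issues = []
    # "Bad data": a required value is missing, or (if complete) the email is malformed.
    if any(getattr(rec, f) in (None, "") for f in REQUIRED_FIELDS):
        issues.append("completeness")
    elif "@" not in rec.email:
        issues.append("validity")
    # Timeliness: a record can be complete and valid yet too old to trust.
    if as_of - rec.last_updated > MAX_AGE:
        issues.append("timeliness")
    return issues


as_of = date(2005, 7, 1)
fresh = CustomerRecord("C1", "a@example.com", "US", date(2005, 6, 20))
stale = CustomerRecord("C2", "b@example.com", "US", date(2004, 12, 1))
print(quality_issues(fresh, as_of))  # []
print(quality_issues(stale, as_of))  # ['timeliness']
```

A check that looks only for missing or malformed values would pass both records; only the timeliness rule catches the stale one.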


Rick Sherman, Athena IT Solutions

Don't blame the source systems
Defining data quality too narrowly often leads people to assume that the source transactional systems, through either data entry or systemic errors, cause the bad data. Although they may be the source of some errors, the more likely culprits are inconsistent dimensions across source systems (such as customer or product identifiers) or inconsistent definitions of derived data across organizations. Conforming dimensions -- developing consistent customer or product identifiers -- is essential if the company is to access and analyze its data across systems. The source systems do not own data quality issues that span other systems; the business intelligence project team does. The source systems need to ensure that the data within their own data silo is correct, but the BI project team is responsible for providing the business with data that is consistent across the enterprise.
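Conforming a dimension usually means mapping each source system's local identifier to a single enterprise key. The sketch below assumes two hypothetical systems (a CRM and a billing system) and uses a deliberately naive match on a normalized name plus postal code; real customer matching is considerably harder, and the `CUST-` key format is made up for illustration.

```python
import re

# Hypothetical source data: the same customers keyed differently in each system.
crm = [{"src_id": "CRM-001", "name": "Acme Corp.", "postal": "02110"},
       {"src_id": "CRM-002", "name": "Globex Inc", "postal": "10001"}]
billing = [{"src_id": "40017", "name": "ACME CORPORATION", "postal": "02110"},
           {"src_id": "40022", "name": "Initech", "postal": "73301"}]

LEGAL_SUFFIXES = {"corp", "corporation", "inc", "llc"}


def match_key(name: str, postal: str) -> tuple[str, str]:
    """Naive matching key: lowercase name, punctuation and legal suffixes removed."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    words = [w for w in cleaned.split() if w not in LEGAL_SUFFIXES]
    return (" ".join(words), postal)


def conform(sources: dict[str, list[dict]]) -> dict[tuple[str, str], str]:
    """Build a cross-reference from (system, source id) to one enterprise key."""
    key_for_match: dict[tuple[str, str], str] = {}
    xref: dict[tuple[str, str], str] = {}
    for system, rows in sources.items():
        for row in rows:
            mk = match_key(row["name"], row["postal"])
            if mk not in key_for_match:
                # First time this customer is seen: assign a new enterprise key.
                key_for_match[mk] = f"CUST-{len(key_for_match) + 1:04d}"
            xref[(system, row["src_id"])] = key_for_match[mk]
    return xref


xref = conform({"crm": crm, "billing": billing})
print(xref[("crm", "CRM-001")], xref[("billing", "40017")])  # CUST-0001 CUST-0001
```

Keeping the cross-reference in the BI layer, rather than changing either source system, mirrors the division of responsibility described above: each source keeps its own IDs, and the BI team owns the enterprise-wide key.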

Similarly, each organization within the enterprise may have valid business reasons to derive data differently from the others. For example, an organization's position in a set of business processes may determine how it views its data. The individual organizations aren't tasked with developing common definitions for derived data; the business intelligence project team is. Many BI project teams try to claim that data quality issues aren't their responsibility. From a practical viewpoint, however, the BI team needs to make these issues its own, since its job is to deliver the highest data quality possible. The BI project team packages the data for consumption by business users, and it will be held accountable for the data's quality. This may not seem fair, but the success of the project depends on it.
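One way a BI team might handle differing derivations is to register each organization's definition under an explicit name and compute them side by side, instead of publishing a single ambiguous measure. In this minimal sketch, the order data and both "revenue" definitions are hypothetical.

```python
from typing import Callable

# Hypothetical order lines shared by two organizations.
orders = [
    {"order_id": 1, "amount": 500.0, "status": "shipped", "returned": 0.0},
    {"order_id": 2, "amount": 300.0, "status": "shipped", "returned": 100.0},
    {"order_id": 3, "amount": 200.0, "status": "booked", "returned": 0.0},
]

Derivation = Callable[[list[dict]], float]

# Each organization derives "revenue" differently, for its own valid reasons.
DEFINITIONS: dict[str, Derivation] = {
    # Sales counts every booked order, shipped or not.
    "revenue.sales": lambda rows: sum(r["amount"] for r in rows),
    # Finance counts only shipped orders, net of returns.
    "revenue.finance": lambda rows: sum(
        r["amount"] - r["returned"] for r in rows if r["status"] == "shipped"),
}


def reconcile(rows: list[dict], definitions: dict[str, Derivation]) -> dict[str, float]:
    """Compute every named derivation on the same data so differences are explicit."""
    return {name: derive(rows) for name, derive in definitions.items()}


print(reconcile(orders, DEFINITIONS))
# {'revenue.sales': 1000.0, 'revenue.finance': 700.0}
```

The same orders yield 1,000 for one organization and 700 for the other; making that gap visible is a first step toward agreeing on a common definition.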

Don't shortchange the pilot
Surprises happen when the project's initial pilot or release involves only a small subset of the source systems. While there may be good reasons to keep a pilot's scope narrow, such a pilot won't reveal how much effort it takes to conform these dimensions as the number of source systems grows.
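A rough way to see why a narrow pilot understates the work is to count how many source identifiers would need mapping as systems are added. The product codes below are hypothetical, and matching on a shared description is a stand-in for whatever reviewed mapping a real project would build.

```python
from math import comb

# Hypothetical product codes per source system, each tied to a plain description.
SYSTEMS = {
    "orders": {"P-100": "widget", "P-200": "gadget"},
    "inventory": {"100": "widget", "200": "gadget"},
    "web": {"W100": "widget", "W300": "gizmo"},
    "legacy_erp": {"0100-A": "widget", "0200-B": "gadget", "0300-C": "gizmo"},
}


def conformance_load(system_names: list[str]) -> tuple[int, int]:
    """Return (aliases to map, system pairs whose identifiers could disagree)."""
    codes = [(s, c) for s in system_names for c in SYSTEMS[s]]
    products = {SYSTEMS[s][c] for s, c in codes}
    # Every code beyond one per real product is an alias that must be conformed.
    aliases = len(codes) - len(products)
    return aliases, comb(len(system_names), 2)


names = list(SYSTEMS)
for k in range(1, len(names) + 1):
    aliases, pairs = conformance_load(names[:k])
    print(f"{k} system(s): {aliases} aliases to map, {pairs} pair(s) to reconcile")
```

With these made-up figures, a single-system pilot has nothing to conform, while the four-system scope has six aliases to map and six system pairs whose identifiers could disagree.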

Sometimes a pilot involves only a single organization and uses only that organization's definitions for derived data. Once again, the tough issue is often how to accommodate the differences in derivation definitions between organizations. In both cases, the real challenges surface only when the project has to deal with multiple systems and organizations. Business users need to see the big picture, and that is possible only when they can access and analyze data across the enterprise.

Steps to address data quality
To ensure data quality, the business intelligence project team has to address it from the very beginning. Here are several significant steps to consider:

  1. Require the business to define data quality in a broad sense, establish metrics to monitor and measure it, and determine what should be done if the data fails to meet these metrics.
  2. Undertake a comprehensive data profiling effort when performing a source systems analysis. The team needs to identify data anomalies across source systems and over time (historical data does not always age well!) so that it can address them with the business early on.
  3. Incorporate data quality into all data integration and business intelligence processes from data sourcing to information consumption by the business user. Data quality issues need to be detected as early in the processes as possible and dealt with as defined in the business requirements.
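The three steps above can be sketched together in plain Python: a column profile for the source systems analysis, business-defined metrics with thresholds and agreed actions, and a check that runs on every batch during loading. The metric names, thresholds and the "reject"/"warn" actions are hypothetical placeholders for what the business requirements would define, and a list of dictionaries stands in for a real data integration tool.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict[str, object]


def profile(rows: list[Row]) -> dict[str, dict[str, float]]:
    """Step 2: a basic column profile (null rate, distinct values) for source analysis."""
    if not rows:
        return {}
    columns = sorted({col for row in rows for col in row})
    result = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v not in (None, "")]
        result[col] = {
            "null_rate": 1 - len(present) / len(values),
            # Assumes scalar (hashable) column values.
            "distinct": len(set(present)),
        }
    return result


@dataclass
class Metric:
    """Step 1: a business-defined metric, its threshold and the agreed action."""
    name: str
    measure: Callable[[list[Row]], float]  # score between 0 and 1
    threshold: float                        # minimum acceptable score
    action: str                             # hypothetical actions: "reject" or "warn"


def check_batch(rows: list[Row], metrics: list[Metric]) -> list[tuple[str, float, str]]:
    """Step 3: evaluate every metric on a batch during loading; return the failures."""
    failures = []
    for m in metrics:
        score = m.measure(rows)
        if score < m.threshold:
            failures.append((m.name, score, m.action))
    return failures


def completeness(col: str) -> Callable[[list[Row]], float]:
    """Fraction of rows where `col` is present (not None and not empty)."""
    return lambda rows: sum(r.get(col) not in (None, "") for r in rows) / len(rows)


batch = [
    {"customer_id": "C1", "region": "East"},
    {"customer_id": "C2", "region": None},
    {"customer_id": "C3", "region": "West"},
    {"customer_id": None, "region": "East"},
]
metrics = [
    Metric("customer_id completeness", completeness("customer_id"), 1.0, "reject"),
    Metric("region completeness", completeness("region"), 0.7, "warn"),
]
print(profile(batch)["region"])      # {'null_rate': 0.25, 'distinct': 2}
print(check_batch(batch, metrics))   # [('customer_id completeness', 0.75, 'reject')]
```

Running the check on each batch as it is loaded, rather than once before go-live, is what lets problems surface early instead of during user testing.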

Enterprises must present data that meets stringent data quality standards, especially in light of recent compliance regulations and demands. The level of data transparency required can come only from a strong commitment to data quality and the processes built to ensure it.


About the author
Rick Sherman is the founder of Athena IT Solutions, a Boston-based consulting firm that provides data warehouse and business intelligence consulting, training and vendor services. In addition to having more than 20 years in the business, Sherman is a published author of more than 50 articles, an industry speaker, a DM Review World Class Solution Awards judge and a data management expert at SearchDataManagement.com. Sherman can be found blogging at The Data Doghouse and can be reached at rsherman@athena-solutions.com.


This was first published in July 2005
