Are the data quality issues worse in real-time data integration apps?

Is real-time data integration really everything people make it out to be? I hear that data quality can be a major problem for companies.

Yes, real-time data integration, or even near-real-time integration, is a powerful approach to expanding the capabilities of a data warehouse environment. Many business analysis capabilities are feasible only with low-latency data delivery. We see a lot of data warehouse systems move to the center of a company’s business operations once a near-real-time extract, transform and load (ETL) process is in place.

I don’t believe that data cleansing and other data quality functions are any more of a problem in a near-real-time environment than they are in a traditional, higher-latency change data capture environment. The challenge in cleansing and correcting data really comes down to the details that are available at the time the data is created or captured.

If your data quality problems involve measurable or quantitative errors in well-defined content (such as addresses, product descriptions and location IDs), or a need to standardize records, numerous technologies can address those issues while still supporting a near-real-time process. For tools that automate data improvement or correction, you do need to define the rules and logic used to correct the information.
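
As a loose illustration of what those rules and logic can look like, here is a minimal Python sketch of a per-record standardization pass. The field names, abbreviation map and set of valid location IDs are invented for the example, not taken from any particular tool:

    # Minimal sketch of rule-based cleansing applied per record in a
    # near-real-time pipeline. Field names, abbreviation rules and the
    # set of valid location IDs are hypothetical examples.
    ADDRESS_ABBREVIATIONS = {"st.": "street", "ave.": "avenue", "rd.": "road"}
    VALID_LOCATION_IDS = {"LOC-001", "LOC-002", "LOC-003"}

    def cleanse(record: dict) -> dict:
        """Apply deterministic correction rules to one incoming record."""
        # Standardize address abbreviations.
        words = record.get("address", "").lower().split()
        record["address"] = " ".join(ADDRESS_ABBREVIATIONS.get(w, w) for w in words)
        # Validate well-defined content against reference data.
        record["location_id_valid"] = record.get("location_id") in VALID_LOCATION_IDS
        return record

    # Each record is corrected as it arrives, so the rules add no
    # batch-style latency to the ETL flow.
    print(cleanse({"address": "12 Main St.", "location_id": "LOC-001"}))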

There are inevitable challenges in addressing qualitative or accuracy problems in data content. In most companies, this can’t be automated through a set of rules; it requires a manual review of the data. And the moment a manual review step enters the ETL workflow, the opportunity for near-real-time delivery goes out the window.

Companies that want to support business processes requiring near-real-time data typically address data quality in a series of incremental steps:

  • Limit near-real-time delivery to business processes that don’t require perfect data. Most real-time data needs aren’t specific to an individual transaction; they require access to aggregate details (for example, average call duration or call volume in a corporate call center). A sketch of this kind of aggregate follows this list.
  • Review the ability of your operational systems to correct data content. While it may not be possible to address data quality issues during the data capture or creation process, it’s fairly common to correct errors prior to the extract activity.
  • Deliver data to the analytics platform in near-real-time and flag it as “not inspected”. A post-load data qualification or acceptance process can occur when practical, and the flag can be changed to “inspected” once that process completes. A sketch of this flagging pattern also follows this list.
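
To make the first of those steps concrete, here is a hedged Python sketch of the call-center example: call count and average duration are maintained incrementally as records arrive, and a single uninspected or imperfect record barely moves the aggregates. The record layout is assumed purely for illustration:

    # Hypothetical sketch: maintain aggregate call-center metrics
    # incrementally as records arrive, rather than querying individual
    # transactions. Durations are in seconds.
    class CallStats:
        def __init__(self) -> None:
            self.count = 0
            self.total_duration = 0.0

        def add(self, duration_sec: float) -> None:
            # One bad record barely moves these aggregates, which is why
            # this kind of use tolerates not-yet-inspected data.
            self.count += 1
            self.total_duration += duration_sec

        @property
        def average_duration(self) -> float:
            return self.total_duration / self.count if self.count else 0.0

    stats = CallStats()
    for duration in (240, 95, 310):  # records arriving from the stream
        stats.add(duration)
    print(f"{stats.count} calls, average {stats.average_duration:.0f}s")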
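
And for the third step, a minimal sketch of the load-then-inspect pattern, using Python’s built-in SQLite module as a stand-in for the analytics platform. The fact_calls table and the quality_flag column are illustrative assumptions, not a prescribed schema:

    # Sketch of the "load now, inspect later" pattern. SQLite stands in
    # for the analytics platform; the schema is an assumption.
    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE fact_calls (
        call_id INTEGER PRIMARY KEY,
        duration_sec INTEGER,
        quality_flag TEXT)""")

    # Near-real-time load: deliver the record immediately, flagged as not
    # yet having passed the data quality acceptance process.
    conn.execute("INSERT INTO fact_calls VALUES (?, ?, ?)",
                 (1, 240, "not inspected"))

    # Post-load qualification runs when practical; records that pass are
    # re-flagged so consumers can filter on the flag.
    conn.execute("UPDATE fact_calls SET quality_flag = 'inspected' "
                 "WHERE call_id = ?", (1,))
    conn.commit()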

 

This was first published in September 2010
