Business intelligence (BI) systems and their supporting data warehouses are only as good as the data that goes into them. And if you aren’t properly handling the BI data integration process, your end users -- and ultimately, your organization -- may be in for trouble.
With BI tools becoming more and more pervasive in organizations, and more critical to the success of business operations, making sure that you have a well-designed and well-executed process for integrating BI data is of paramount importance, according to data management analysts such as Ted Friedman of Gartner Inc.
Friedman said Gartner sees data integration challenges related to BI as a drag on the success of BI and analytics initiatives -- and a big reason for outright project failures.
“As the data that organizations are trying to harness gets more and more complex, with more kinds and sources of data and now ‘big data’ thrown into the mix, a significant amount of time and effort is involved in matching, cleaning and preparing data for BI applications,” he said. “It’s a darned hard problem, particularly when you add in older, legacy systems where you sometimes need to do archaeology first in order to interpret the data.”
Another complicating factor is that things are changing in the world of data integration technology as business users demand faster access to BI data.
ETL still best bet for BI data integration?
The traditional workhorse technology for managing BI data integration is extract, transform and load (ETL) software that pulls data from source systems in bulk batch processing jobs. Friedman said newer data integration techniques offer lower latency than ETL tools do. For example, change data capture software and other real-time data integration tools let you push new or modified information to data warehouse and BI systems in real or near real time, which can be particularly useful for tasks like fraud detection. “It is streaming [data] in granular form rather than big chunks in batch, which is what ETL is using,” he said.
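To make the batch-versus-streaming distinction concrete, here is a minimal Python sketch. It is an illustration only, not any vendor's implementation: the `Change` record, the `batch_load` and `apply_change` functions and the sample rows are all hypothetical, and plain dictionaries stand in for the source system and the data warehouse.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    """One row-level change event (hypothetical shape, for illustration)."""
    op: str                      # "insert", "update" or "delete"
    key: int
    row: Optional[dict] = None   # new row contents; None for deletes


def batch_load(source: dict[int, dict]) -> dict[int, dict]:
    """ETL-style load: re-extract every source row and rebuild the target."""
    return {key: dict(row) for key, row in source.items()}


def apply_change(target: dict[int, dict], change: Change) -> None:
    """CDC-style load: apply a single change to the target as it arrives."""
    if change.op == "delete":
        target.pop(change.key, None)
    else:
        target[change.key] = dict(change.row or {})


# The bulk job copies the whole source in one pass.
source = {1: {"amount": 20}, 2: {"amount": 75}}
warehouse = batch_load(source)

# Later, changes arrive one at a time instead of waiting for the next batch.
for change in [
    Change("update", 2, {"amount": 80}),
    Change("insert", 3, {"amount": 5000}),
    Change("delete", 1),
]:
    apply_change(warehouse, change)

print(warehouse)
```

In the batch path, the cost of each run grows with the size of the source. In the change-driven path, it grows with the number of changes, which is why that style suits low-latency uses such as the fraud detection example above.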
Another option: federated and virtualized approaches to data integration and delivery that don’t move the data out of source systems at all but instead create consolidated views of data from multiple sources for BI uses. With data virtualization tools, the integrated data “doesn’t persist anywhere,” Friedman said. “You’re grabbing it in real time and joining it together and making it seem as if it is one database somewhere to the applications using it.”
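A rough Python sketch of that idea follows. It assumes two hypothetical source systems, `crm_customers` and `erp_orders`, held here as in-memory structures. The names `customer_orders_view` and `revenue_by_customer` are also made up for illustration; real data virtualization products do this across live databases and are far more involved.

```python
from typing import Iterator

# Two stand-in source systems (hypothetical data).
crm_customers = {101: "Acme Corp", 102: "Globex"}
erp_orders = [
    {"customer_id": 101, "total": 250.0},
    {"customer_id": 102, "total": 90.0},
    {"customer_id": 101, "total": 40.0},
]


def customer_orders_view() -> Iterator[dict]:
    """Join the two sources at query time; no integrated copy is stored."""
    for order in erp_orders:
        yield {
            "customer": crm_customers.get(order["customer_id"], "unknown"),
            "total": order["total"],
        }


def revenue_by_customer() -> dict[str, float]:
    """A BI-style query that sees the view as if it were a single table."""
    totals: dict[str, float] = {}
    for row in customer_orders_view():
        totals[row["customer"]] = totals.get(row["customer"], 0.0) + row["total"]
    return totals


before = revenue_by_customer()
erp_orders.append({"customer_id": 102, "total": 10.0})  # a source system changes
after = revenue_by_customer()  # the next query reflects it; nothing was reloaded

print(before)
print(after)
```

Because the join is recomputed on every call, the result is always current, but each query pays the cost of reaching back into the source systems.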
Despite the emergence of this new wave of data integration and delivery tools, though, Friedman thinks it would be a mistake to view ETL software as obsolete or no longer valuable. “ETL is still relevant,” he said. “We think there will always be a role for ETL-style processing because not all data can or should be delivered in real time.”
Indeed, Friedman warned that data integration vendors are pushing “sexy” real-time options for BI data integration when many organizations can still get what they need from a batch approach. “Real-time [integration] costs money and it requires a change from what organizations have been doing, so there needs to be a strong business case for it,” he said.
“ETL still has a role -- it is the heavy lifter of data integration,” agreed Claudia Imhoff, president of Intelligent Solutions Inc., a consultancy in Boulder, Colo. Still, she noted that its newer competitors can be more flexible and faster to deploy and are better suited to delivering timely data to business users for operational BI applications.
Real time not always right but more of a reality
Although he acknowledges that real-time data integration for BI is frequently neither necessary nor desirable, Barry Devlin, founder of 9sight Consulting in Cape Town, South Africa, points out that BI and analytics applications are increasingly moving in that direction. “I think it is a really interesting time in terms of how this will pan out,” he said.
As an example of an experimental use case, Devlin cited the U.S. insurance industry, where real-time data from cars -- braking and speed data, time spent driving and other information -- is being transmitted to business users at insurance companies through mobile phone networks, enabling the insurers to modify premiums or even provide rebates on the fly.
As Friedman noted, the increasing focus on capturing and analyzing big data, including Web server logs, social media data and other forms of unstructured information, adds another layer of complexity to the BI data integration process within many organizations.
James Kobielus, until recently an analyst at Forrester Research Inc., said while he was still working there that unstructured data “can be as critical as structured data to what you’re doing” in BI and analytics. Even companies that are still planning or just beginning to implement big data analytics programs should look ahead and make sure they’re prepared for the data integration challenges to come, added Kobielus, who has since taken a job at IBM. “You need to be ready,” he said, “for things like massive data inputs from social media and start to budget and staff up.”
Alan R. Earls is a Boston-area freelance writer focused on business and technology.