Is there anything more frustrating -- or useless -- than out-of-date data? Ask any corporate-level decision maker and odds are the answer will be no.
Companies are increasingly turning from traditional batch-oriented techniques to real-time data integration to eliminate the scourge of out-of-date data. Real-time data integration can be achieved through a variety of methods, but the goal is the same: to communicate accurate, timely data from point A to point B in real time so users can make better-informed business-critical decisions. Experts agree that real-time data integration is gaining popularity but also warn that it is not a methodology to adopt lightly.
"Recognize that the world is not a black-and-white place," said Ted Friedman, vice president and distinguished analyst with Stamford, Conn.-based Gartner Inc. "Any given company is going to have data integration requirements that span the latency spectrum. There are going to be pieces that are best suited to be delivered in a high-latency, batch-oriented mode, and there [are] going to be other things where real-time data integration really does have value."
Real-time data integration options
The most common real-time data integration method is change data capture (CDC), also called data replication. CDC tools and technologies recognize when an important change has occurred in one data source and, in real time, transmit the change to a given target.
Bloor Research's Philip Howard explains: "As a change is made to a database record in your transactional system, for instance, it's also actively captured and fed through to your data warehouse or business intelligence system, or whatever you've got running, so it's ready to answer real-time queries."
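The capture-and-forward flow Howard describes can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular CDC product works: the `orders`, `change_log` and `orders_copy` tables and the `propagate_changes` function are hypothetical, two in-memory SQLite databases stand in for the transactional system and the data warehouse, and database triggers do the capturing only because they keep the example short (commercial tools often read the database's transaction log instead).

```python
import sqlite3

# Hypothetical source (transactional) and target (warehouse) databases.
source = sqlite3.connect(":memory:")
target = sqlite3.connect(":memory:")

source.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
CREATE TABLE change_log (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    order_id INTEGER,
    status TEXT
);
-- Triggers record every insert and update as a change event.
CREATE TRIGGER orders_ins AFTER INSERT ON orders BEGIN
    INSERT INTO change_log (order_id, status) VALUES (NEW.id, NEW.status);
END;
CREATE TRIGGER orders_upd AFTER UPDATE ON orders BEGIN
    INSERT INTO change_log (order_id, status) VALUES (NEW.id, NEW.status);
END;
""")
target.execute("CREATE TABLE orders_copy (id INTEGER PRIMARY KEY, status TEXT)")


def propagate_changes(last_seq):
    """Apply change events newer than last_seq to the target.

    Returns the sequence number of the last event applied, so the next
    call can pick up where this one stopped.
    """
    rows = source.execute(
        "SELECT seq, order_id, status FROM change_log WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    for seq, order_id, status in rows:
        target.execute(
            "INSERT OR REPLACE INTO orders_copy (id, status) VALUES (?, ?)",
            (order_id, status),
        )
        last_seq = seq
    target.commit()
    return last_seq


# A change in the transactional system...
source.execute("INSERT INTO orders (id, status) VALUES (1, 'new')")
source.execute("UPDATE orders SET status = 'shipped' WHERE id = 1")

# ...is forwarded to the target, which now holds the latest status.
last_seq = propagate_changes(0)
```

In a real deployment the forwarding step would run continuously or be driven by events rather than called by hand, which is what keeps the target current enough to answer real-time queries.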
CDC is used most often to synchronize operational applications and for real-time business intelligence (BI) purposes, according to Gartner's Friedman. Indeed, business intelligence is a major driver of real-time data integration adoption, he said, especially among businesses that require BI reports at a moment's notice.
"If you've got some type of short-cycle business and you need up-to-the-second analysis of how your supply chain is performing, [for example]," Friedman said, "then you need to be delivering data from some data sources to your BI application in more of a real-time fashion."
CDC is less ideal, however, if the goal is a comprehensive, real-time view of a single entity via data housed in multiple sources. For that, users more often turn to data federation, sometimes called enterprise information integration or data virtualization.
"Data federation is better suited to people that are looking … at a more narrow slice of the data landscape," Friedman said. "They want to get a complete view of a single instance of an entity -- a customer, a product, an employee -- as opposed to somebody who's doing historical trending in the data warehouse."
For example, an insurance agent on a customer call might use an application supported by data federation technology to search multiple data sources to obtain a comprehensive view of that customer while still on the call. "That [data] needs to be in real time," Friedman said.
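The insurance example can be sketched as a lookup that queries each source at request time and merges the answers, rather than copying the data into one store beforehand. Everything here is hypothetical and chosen only for illustration: the three source systems are plain dictionaries, and `federated_customer_view` is an invented name, not the interface of any federation product.

```python
# Hypothetical source systems, each holding one part of the customer record.
POLICY_SYSTEM = {"C-100": {"policies": ["auto", "home"]}}
CLAIMS_SYSTEM = {"C-100": {"open_claims": 1}}
BILLING_SYSTEM = {"C-100": {"balance_due": 125.50}}

SOURCES = {
    "policy": POLICY_SYSTEM,
    "claims": CLAIMS_SYSTEM,
    "billing": BILLING_SYSTEM,
}


def federated_customer_view(customer_id):
    """Query every source at request time and merge the results into one view."""
    view = {"customer_id": customer_id, "missing_sources": []}
    for name, system in SOURCES.items():
        record = system.get(customer_id)
        if record is None:
            # Note which sources had nothing, so the caller knows the view is partial.
            view["missing_sources"].append(name)
        else:
            view.update(record)
    return view


view = federated_customer_view("C-100")
```

Because nothing is copied ahead of time, the view reflects whatever each source holds at the moment of the call, which is the property the agent on the phone depends on.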
Both the CDC and data federation markets are well established, Howard said, having already gone through the consolidation phase "that you tend to get once products start to mature." Large vendors like IBM -- which acquired data integration specialist DataMirror last year -- and Oracle -- which scooped up Sunopsis in 2006 -- as well as smaller players like Teradata and GoldenGate Software, offer a variety of solid CDC and data federation tools for real-time data integration, he said.
Friedman also identified a third approach, what he calls the messaging-middleware method, in which real-time data integration is achieved through middleware technologies that connect applications.
"Think of IBM WebSphere MQ and Microsoft BizTalk Server, and products like that, that are really meant to do granular, message-oriented propagation of data," Friedman said. "An application on one end spits out a message of something meaningful that happened, and these technologies propagate that message to another system or application in a low-latency fashion. So it's sort of like the data replication idea, but working at the application layer as opposed to the database layer."
This middleware approach is ideal for inter-enterprise scenarios, when there's a need for real-time data integration among organizations that may not have access to one another's data sources, Friedman said. A vendor might communicate an important data change to a supplier in real time using this method, for instance.
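The message-oriented pattern Friedman describes can be sketched with a toy in-process broker. This is an assumption-laden illustration, not the API of IBM WebSphere MQ or Microsoft BizTalk Server: the `MessageBroker` class, the `inventory.changed` topic and the message fields are all hypothetical, and a real broker would deliver messages across a network.

```python
import json
from collections import defaultdict


class MessageBroker:
    """Toy broker: forwards each published message to every subscriber of its topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # Serialize the message so each receiver gets its own copy,
        # much as it would if the message had crossed a network.
        payload = json.dumps(message)
        handlers = self._subscribers[topic]
        for handler in handlers:
            handler(json.loads(payload))
        return len(handlers)


broker = MessageBroker()

# The supplier's system subscribes without any access to the vendor's database.
supplier_inbox = []
broker.subscribe("inventory.changed", supplier_inbox.append)

# The vendor's application announces that something meaningful happened.
delivered = broker.publish("inventory.changed", {"sku": "A-42", "on_hand": 3})
```

The two sides share only the message format, which is why the approach suits organizations that cannot see one another's data sources.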
Data quality raises real-time data integration concerns
Both Howard and Friedman noted, however, that while there are many benefits to real-time data integration, there are numerous drawbacks as well -- first among them, poor data quality. In more traditional, batch-oriented data integration processes, there is ample time to scrub and cleanse data before it reaches its destination. Not so with real-time data integration, regardless of the method.
"In the middle of that process [batch-oriented data integration], you've got a chance to actually analyze and cleanse that data," Friedman said. "In the world of real-time data integration, there's less opportunity to apply very sophisticated files for analyzing the quality and cleansing the data." There is a higher risk, then, that data integrated in real time will be of poorer quality, incorrect or misleading.
Friedman said current real-time data integration tools are better at data transformation and cleansing than they've been in the past, but there is still plenty of room for improvement. It is possible that someday near-perfect real-time data integration quality could be achieved, he said, as the problem is more technological than conceptual.
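One way to picture the trade-off is a set of cheap per-record checks applied while data is in flight, with anything suspicious set aside instead of delivered. The sketch below is hypothetical throughout (the `customer_id` and `amount` fields, the rules, and the `check_record` and `route` functions are invented for illustration), and it deliberately shows only the kind of lightweight check that fits a real-time path, not the deeper cross-record analysis a batch window allows.

```python
def check_record(record):
    """Return a list of problems found by cheap, per-record checks."""
    problems = []
    if not record.get("customer_id"):
        problems.append("missing customer_id")
    amount = record.get("amount")
    if not isinstance(amount, (int, float)) or amount < 0:
        problems.append("invalid amount")
    return problems


def route(records):
    """Deliver clean records; quarantine the rest along with the reasons."""
    delivered, quarantined = [], []
    for record in records:
        problems = check_record(record)
        if problems:
            quarantined.append((record, problems))
        else:
            delivered.append(record)
    return delivered, quarantined


incoming = [
    {"customer_id": "C-100", "amount": 25.0},
    {"customer_id": "", "amount": 10.0},
    {"customer_id": "C-101", "amount": -5},
]
delivered, quarantined = route(incoming)
```

Checks like these look at one record at a time, so they cannot catch problems such as duplicates or inconsistencies across sources, which is part of why data integrated in real time carries a higher quality risk.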
Both analysts said it is also important to recognize that real-time data integration isn't ideal for all companies and organizations, and in some cases may even prove detrimental. Friedman advises users to match their data integration method to their latency requirements. An organization that analyzes certain data sets only once a week, for example, has no need to integrate that data in real time, and doing so could cause more harm than good, partly because of the data quality concerns described above.
Organizational structure and corporate politics also play a role in determining the appropriateness of real-time data integration, Friedman said. If users aren't ready to accept and use real-time data, there's little point in integrating data in real time in the first place.
"Frankly, I know some companies that if they had real-time BI it wouldn't matter at all because the way they make decisions, the culture and the politics of the organization are not set up for them to act on real-time information," Friedman said. "I think that's a limiting factor for many organizations today."
Howard agreed, pointing to what he called decision-making latency.
"How soon can you as a human being make a decision based on new information that you're given? If you have to have a meeting with five other people and it takes two days to arrange that, or even two hours to arrange that, then you don't need real-time [data integration]," Howard said.
He added: "If you can make a decision instantly -- 'Ah, this has happened, therefore I know to do such-and-such' -- then that's where real-time decision making becomes important."