While implementations are still few and far between, interest in open source data integration technology is slowly but surely growing as a weak economy continues to put pressure on IT budgets.
According to a recent survey by Gartner, 11% of organizations in the market for data integration technology have evaluated open source tools along with commercial products. That's nearly double the percentage that said they considered open source data integration in a 2008 survey.
Still, implementations of open source data integration tools are rare compared with commercial deployments, according to Ted Friedman, an analyst with Stamford, Conn.-based Gartner, who analyzed the survey's findings. The two leading open source data integration vendors -- Talend and Pentaho -- count only several hundred paying customers between them, Friedman estimates.
Of those, most -- 83%, according to the survey -- use open source data integration tools for ETL functions supporting business intelligence (BI) and data warehousing projects. That's because most open source data integration tools have "less mature capabilities for supporting" other styles of data integration, including data federation and real-time data integration techniques like change data capture, Friedman said.
Children's Hospital of Omaha has been using open source data integration from Talend since earlier this year. In line with the survey's findings, the hospital uses the open source ETL tools mainly to move data from disparate databases and an electronic medical records system into a Sybase IQ data warehouse for billing and other analysis.
For a number of non-ETL jobs, like integrating data to and from clinical lab systems, the hospital uses an eLink Systems interface engine that transmits data in the HL7 message standard.
"There's some limitations Talend has, but for the money and our use cases, it really fit the bill," said Wendy Worthing, manager of data services and technical training at Children's Hospital.
Friedman expects those limitations to be addressed by vendors like Talend over the next two to four years, making open source data integration an attractive, lower-cost option for some companies.
"Open source tools in this market are starting to really see some traction," Friedman said. "And I do think [that] as these tools grow in maturity, you're really going to see more balance in their uses."
One new use case for open source data integration tools could be master data management (MDM). Talend, based in Los Altos, Calif., announced on Monday that it had acquired the MDM-related assets, including an MDM repository technology, of Amalto Technologies. Amalto specializes in business-to-business exchanges.
Talend had in place both the data quality and data integration capabilities needed to support MDM implementations, but it lacked the actual MDM repository, said Yves de Montcheuil, vice president of market strategy for Talend. With the acquisition, he said, companies, especially small and midmarket firms, will have the ability to restart stalled MDM projects at a fraction of the cost.
Philip Russom, senior manager of research and services at the Renton, Wash.-based Data Warehousing Institute, said Talend may have hit on a winning combination, one that could help propel the open source data integration market.
"Users go through a maturation process where they mature or move from data integration to data quality," Russom said. "And after they've done data quality for a while, they mature and move on to something else, and quite often that something else is master data management."
Friedman said: "Sometimes synchronization of master data across systems can very suitably be done in a bulk and batch fashion. So absolutely I think the [open source data integration] tools are relevant to the MDM domain."
For now, companies in the market for data integration tools should consider open source alternatives when their needs are largely for bulk and batch loading, Friedman said. But he cautioned that open source data integration, while less expensive than commercial products, comes with its own costs.
For example, companies using open source data integration software will have to do much of the customization needed to work with other data management technologies themselves, as most open source vendors lack robust partner networks. Skills in open source data integration are also scarce in the marketplace.
And though ETL represents a significant percentage of most organizations' data integration needs, open source technology can't yet meet some of the others, according to Friedman.
"ETL is becoming just one style of many that organizations need to support," he said.