For years, industry analysts, vendors and users alike viewed data warehousing -- with its promise of a unified view of an organization's most crucial data -- as the holy grail of business intelligence.
"[Unfortunately] it just hasn't played out that way," said Forrester Research's Jim Kobielus.
Data warehousing's failure to become the be-all and end-all of BI data sources has left an opening for an alternative technology, data federation, which eschews warehousing disparate data in a single location in favor of a more direct approach, analysts said.
Also called data virtualization, enterprise information integration (EII) and information-as-a-service, data federation technologies collect data straight from source systems for analysis, complementing less agile data warehouses that rely on preset batch load methods and providing users with more flexibility in the type and timeliness of data available for BI and analytics.
For all its benefits -- including the ability to aggregate, model and cleanse data from across the enterprise optimized for analytics -- data warehousing has not resulted in the often-coveted "single version of the truth," analysts agree.
That's because many organizations, especially those with decentralized corporate structures, maintain multiple departmental data warehouses, never unifying the data in a comprehensive enterprise data warehouse, Kobielus said. In some cases there is no compelling business need to deploy an enterprise data warehouse, he said, while in other cases, bureaucratic and political obstacles stand in the way.
Still, some other organizations, especially financial firms and healthcare providers, mandate that certain types of sensitive data remain in a single, secure database, preventing it from being integrated into a warehouse, said Wendy Tam, product marketing manager for IBM's InfoSphere Federation Server.
An 'extender of your data warehouse'
As Kobielus explains, data federation technology helps solve these problems by making a single call to multiple data sources, then integrating the data in a middleware layer where it is standardized and cleansed. From there, it sends the data on to BI applications for analysis, eliminating the need for a permanent, physical relational database.
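The flow Kobielus describes can be illustrated with a minimal Python sketch. It is not how IBM's or any other vendor's product works; the two in-memory SQLite databases, the table and column names, and the helper functions (make_source, normalize_id, federated_customer_view) are all hypothetical stand-ins for independent departmental sources and a middleware layer. One call queries both sources, standardizes the customer IDs, and returns a combined view without storing it in a permanent database.

```python
import sqlite3


def make_source(schema_sql, insert_sql, rows):
    """Create a hypothetical in-memory source system and load sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema_sql)
    conn.executemany(insert_sql, rows)
    return conn


# Two independent sources with inconsistent customer ID formatting (assumed data).
sales = make_source(
    "CREATE TABLE orders (cust_id TEXT, amount REAL)",
    "INSERT INTO orders VALUES (?, ?)",
    [(" c001", 120.0), ("C002", 75.5), ("c001 ", 30.0)],
)
support = make_source(
    "CREATE TABLE tickets (customer TEXT, open_tickets INTEGER)",
    "INSERT INTO tickets VALUES (?, ?)",
    [("C001", 2), ("C003", 1)],
)


def normalize_id(raw):
    """Standardize and cleanse a customer ID in the middleware layer."""
    return raw.strip().upper()


def federated_customer_view(sales_conn, support_conn):
    """Query both sources at request time and merge the results in memory."""
    view = {}
    for cust_id, amount in sales_conn.execute("SELECT cust_id, amount FROM orders"):
        rec = view.setdefault(
            normalize_id(cust_id), {"total_spend": 0.0, "open_tickets": 0}
        )
        rec["total_spend"] += amount
    for customer, count in support_conn.execute(
        "SELECT customer, open_tickets FROM tickets"
    ):
        rec = view.setdefault(
            normalize_id(customer), {"total_spend": 0.0, "open_tickets": 0}
        )
        rec["open_tickets"] += count
    return view


# The combined view exists only for the duration of the request.
view = federated_customer_view(sales, support)
for customer_id in sorted(view):
    print(customer_id, view[customer_id])
```

A real federation server would also push filters down to each source and handle differing SQL dialects, which this sketch leaves out.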
Data federation technology can be used to collect data from multiple departmental data warehouses, but at a lower cost than a complex enterprise data warehouse. It can also access data in operational systems to complement data warehouse-stored data. In most cases, users can customize their queries, allowing analysis of data housed in any number of combinations of data sources.
"It's essentially an extender of your data warehouse view," Kobielus said. "Data federation adds fields or subject domain or attributes that are not supported in the views available by your data warehouse."
Taikang Life Insurance uses the IBM data federation server, for example, to share data across business units for analysis rather than aggregating it in a single enterprise data warehouse, Tam said. The company, based in Beijing, uses the technology to analyze customer data housed in otherwise siloed departmental data sources, including IBM's DB2 and Informix databases, as well as Oracle data sources and applications.
IBM's data federation technology can also access unstructured data, including email and XML data, Tam said. And it does its work -- collecting data from disparate sources, then aggregating it in a virtual database for analysis -- behind the scenes, she said. To the end user, the results look as if they've come from a traditional data warehouse.
Data federation technology complementary, not disruptive
Data federation technology is not new. It's been around for well over a decade, said Ted Friedman, analyst with Stamford, Conn.-based Gartner Inc. It has seen an uptick in adoption, however slight, in the last five years or so as the promise of enterprise data warehouses has gone unfulfilled owing to fragmented data architectures.
Kobielus also cited the growing popularity of cloud computing as another driver of data federation technologies, "because cloud computing is so decentralized." Data housed in data sources in the cloud can be difficult to integrate into traditional data warehouses, he said, making federation techniques a more attractive option.
Robert Eve, vice president of marketing at Composite Software, which sells data federation software under the "data virtualization" moniker, predicts that user adoption of federation technology will increase as companies look for less expensive alternatives to enterprise data warehouses during the recession, a point echoed by IBM's Tam.
"Federation technology has two real benefits, a business benefit and a technical one," she said. "It increases visibility into lines of business, and it can help reduce costs and complexity by reducing the need for an extra, expensive [relational] database server." It also saves users time, she said, by removing the need to query multiple data sources manually.
That doesn't mean, however, that data federation technology, which has its own drawbacks, can take the place of data warehouses altogether, the analysts and vendors said. In a 2008 report, Kobielus said data federation is not optimal for certain types of queries, like "large dimensional table scans or time-series analyses against historical data sets," which are better handled by data warehouses.
Data federation technology also takes a toll on the performance of the operational and transactional data sources it queries, as it consumes significant CPU, he said. In addition, data federation requires transactional data sources to be "always on," as opposed to data warehouses that batch load data at preset times, often in the evenings, allowing data sources to go offline for the night.
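The "always on" requirement can be shown with a small Python sketch under simple assumptions: a single in-memory SQLite database stands in for a transactional source, and the functions batch_load and federated_total are hypothetical names, not part of any product. The warehouse-style copy keeps answering after the source goes offline, while the federated query, which reads the source at request time, fails.

```python
import sqlite3

# Hypothetical transactional source system.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (amount REAL)")
source.executemany("INSERT INTO orders VALUES (?)", [(10.0,), (20.0,)])


def batch_load(conn):
    """Warehouse style: copy the rows once, at a preset load time."""
    return [row[0] for row in conn.execute("SELECT amount FROM orders")]


def federated_total(conn):
    """Federation style: query the live source every time a user asks."""
    return conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]


snapshot = batch_load(source)        # the evening batch load
live_total = federated_total(source)  # works while the source is online

source.close()  # the source goes offline for the night

# The warehouse copy still answers, though only as of the last load.
warehouse_total = sum(snapshot)

# The federated query cannot run against an unavailable source.
try:
    federated_total(source)
    federated_ok = True
except sqlite3.ProgrammingError:
    federated_ok = False

print(live_total, warehouse_total, federated_ok)
```

The trade-off runs the other way for freshness: the snapshot reflects the source only as of the last batch load, while the federated query sees current data whenever the source is reachable.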
Instead, data federation technology will ideally complement, not replace, existing data warehouses at most organizations, Kobielus and Friedman agreed, allowing BI end users greater flexibility.
"It's not a revolution here where federation is replacing data warehouses," Friedman said. "In most cases, organizations are going to need a portfolio of [data integration] capabilities," including both warehousing and federation technology.