Data virtualization ushers in unified view of data

A member-owned supply chain management company melded siloed data sets to gain a unified view of diverse data feeds.

Although data virtualization combines some of the most favored technology approaches of the day -- employing APIs and metadata to create a data integration layer that often works without actually moving data around -- it has, for the most part, remained on the periphery of the big data movement. That may change as data types expand, and users seek a unified view of that data.

Data virtualization, according to Chuck DeVries, enabled a unified view of participants at Vizient, based in Irving, Texas, a member-owned company providing supply chain management services to the healthcare industry.

"In our business, everything is about members and what access they have. To have all the information available in one place was really valuable for us," DeVries, vice president for architecture and development at Vizient, said.

Difficulty arises because various members create data in various ways, DeVries said. The company uses a data virtualization platform from Denodo Technologies to integrate these diverse data sets. A first use case involved integration for a unified view of multiple Salesforce systems -- ones that otherwise would appear as "silos." For future projects, DeVries plans to use Denodo along with Hadoop data sources.

Third-generation data virtualization

Denodo, based in Palo Alto, Calif., along with Composite Software (now part of Cisco), has been something of a data virtualization pioneer. Data virtualization platforms are also available, often as part of larger offerings, from IBM, Informatica, Oracle, Red Hat, SAP and others.

A number of different technologies converged -- enterprise mashups and data federation among them -- to form data virtualization, according to Mei Selvage, a Gartner research director.

She described the Denodo platform for data virtualization as "a third-generation integration technology," based on its support for Hadoop data integration and Web APIs.

Data virtualization has particular value, according to Selvage, because it decouples the data consumer from the data provider.

"In effect, you create a useful data governance fabric, because the tools also provide a shared data access layer on top of the data virtualization layer. The data becomes more liberated," she said.
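The decoupling Selvage describes can be pictured with a small Python sketch. It is only an illustration under stated assumptions, not how Denodo or any other product works: the `VirtualView` class, the two stand-in source functions and all field names are hypothetical. The consumer asks the view for rows in one shared schema, and each source is read only at query time rather than copied ahead of time.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Row = Dict[str, object]


class VirtualView:
    """Hypothetical virtual view: exposes several sources under one schema."""

    def __init__(self) -> None:
        # Each entry pairs a fetch function with a unified->source field mapping.
        self._sources: List[Tuple[Callable[[], Iterable[Row]], Dict[str, str]]] = []

    def register(self, fetch: Callable[[], Iterable[Row]], mapping: Dict[str, str]) -> None:
        self._sources.append((fetch, mapping))

    def rows(self) -> Iterator[Row]:
        # Sources are queried lazily, only when a consumer iterates the view.
        for fetch, mapping in self._sources:
            for raw in fetch():
                yield {unified: raw[field] for unified, field in mapping.items()}


# Two stand-in providers with different field names (assumed for illustration).
def crm_a() -> List[Row]:
    return [{"AccountName": "Hospital A", "Tier": "gold"}]


def crm_b() -> List[Row]:
    return [{"member": "Clinic B", "level": "silver"}]


members = VirtualView()
members.register(crm_a, {"name": "AccountName", "tier": "Tier"})
members.register(crm_b, {"name": "member", "tier": "level"})

# The consumer sees one schema and never touches provider-specific fields.
unified = list(members.rows())
```

In this sketch, replacing a provider means changing only its `register` call; code that consumes `members.rows()` stays the same.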

On the data liberation front

For Vizient's DeVries, successful data virtualization provides a framework for data liberation. Such unencumbering of data is becoming essential in healthcare.

"To service a wide range of needs, we bring information from all kinds of sources, and you need to pull the sources together in a cohesive way," he said. Conceptually, that means establishing "the right layers of abstraction, so you can get a solid view."

That has to be done in such a way that you don't need to prescribe a single interface for users. Software like Denodo's helps IT to be agile and adaptive, he suggested. DeVries described the Salesforce integration enabled by data virtualization as a first step -- one, he admitted, that is "not too terribly sexy."

Greater use of Hadoop to handle a wider variety of incoming health data will create further use cases for data virtualization, he said. In a way, working in measured steps toward such an implementation is part of DeVries' overall philosophy of development.

"You have to have an architectural view for version 2.0 just as much as for 1.0 of your system," DeVries said. Moreover, he added, "it is important to get something in front of people that they can use. It is more important than perfection."

Metadata updates

Denodo recently released a version of its flagship software that employs a dynamic query optimizer that improves performance by automatically choosing full aggregation push-down, partial aggregation push-down or on-the-fly data movement based on the characteristics of the big data sources.
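To make the idea of partial aggregation push-down concrete, here is a minimal Python sketch. It is an assumption-laden illustration and does not reflect Denodo's optimizer: the function names, the two stand-in sources and the sample figures are all hypothetical. Each source reduces its own rows to one partial total per group, and the integration layer only has to merge those small results instead of pulling every raw row across the network.

```python
from collections import Counter
from typing import Dict, List, Tuple

Sale = Tuple[str, int]  # (region, amount); a hypothetical record shape


def source_partial_sum(rows: List[Sale]) -> Dict[str, int]:
    """Stands in for work done at the source: one partial total per region."""
    totals: Counter = Counter()
    for region, amount in rows:
        totals[region] += amount
    return dict(totals)


def combine_partials(partials: List[Dict[str, int]]) -> Dict[str, int]:
    """Stands in for the virtualization layer: merge the partial totals."""
    final: Counter = Counter()
    for partial in partials:
        final.update(partial)  # Counter.update adds values for matching keys
    return dict(final)


# Made-up sample rows held by two separate sources.
warehouse = [("east", 10), ("west", 5), ("east", 7)]
hadoop = [("west", 3), ("east", 1)]

totals = combine_partials(
    [source_partial_sum(warehouse), source_partial_sum(hadoop)]
)
```

Full aggregation push-down would correspond to a single source computing the final totals by itself. The partial form applies when the groups span more than one source, as in this example.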

According to Ravi Shankar, Denodo's chief marketing officer, Denodo Platform 6.0 also adds support for new data sources that include Amazon Redshift, Hewlett Packard Enterprise Vertica, Cloudera Impala and Apache Spark.

Shankar said the platform now provides a Web-based user interface with search capabilities for both data and metadata. Management of metadata has been cited in a Gartner survey as a relative weakness for Denodo's platform -- improved searchability can be seen as an effort to close this gap.
