Data virtualization is an umbrella term for an approach to data management that lets an application retrieve and manipulate data without needing technical details about it, such as how the data is formatted or where it is physically located. The goal of data virtualization is to create a single representation of data from multiple, disparate sources without copying or moving the data.
Data virtualization software aggregates structured and unstructured data sources so they can be viewed virtually through a dashboard or visualization tool. The software makes metadata about the data discoverable while hiding the complexity of accessing different data types from different sources. Importantly, data virtualization does not replicate data from source systems; it stores only the metadata and integration logic needed to present a view, and the data itself stays where it is. Vendors that specialize in this type of software include IBM, SAP, Denodo Technologies, Oracle, TIBCO Software, Microsoft and Red Hat.
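To make the "metadata and integration logic only" idea concrete, here is a minimal sketch in Python. It assumes two stand-in sources (an in-memory SQLite table playing the role of a warehouse and a CSV string playing the role of a file in a data lake); the `VirtualCatalog` class and its `register`, `describe` and `scan` methods are hypothetical names for illustration, not any vendor's API. The catalog records column names and a fetch function per source, and rows are read from the source only when a view is scanned.

```python
import csv
import io
import sqlite3
from typing import Callable, Dict, Iterator, List

# Hypothetical source 1: a relational table standing in for a data warehouse.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

# Hypothetical source 2: CSV text standing in for a file in a data lake.
orders_csv = "customer_id,amount\n1,30\n2,45\n1,25\n"


class VirtualCatalog:
    """Hypothetical virtual layer: keeps metadata and fetch logic, never copies rows."""

    def __init__(self) -> None:
        self._columns: Dict[str, List[str]] = {}
        self._fetchers: Dict[str, Callable[[], Iterator[dict]]] = {}

    def register(self, name: str, columns: List[str],
                 fetcher: Callable[[], Iterator[dict]]) -> None:
        self._columns[name] = columns
        self._fetchers[name] = fetcher

    def describe(self, name: str) -> List[str]:
        # Discoverable metadata: only the column names are stored here.
        return list(self._columns[name])

    def scan(self, name: str) -> Iterator[dict]:
        # Rows are pulled from the underlying source at query time.
        return self._fetchers[name]()


catalog = VirtualCatalog()
catalog.register(
    "customers", ["id", "name"],
    lambda: ({"id": r[0], "name": r[1]}
             for r in db.execute("SELECT id, name FROM customers ORDER BY id")),
)
catalog.register(
    "orders", ["customer_id", "amount"],
    lambda: ({"customer_id": int(r["customer_id"]), "amount": int(r["amount"])}
             for r in csv.DictReader(io.StringIO(orders_csv))),
)

print(catalog.describe("customers"))  # ['id', 'name']
db.execute("INSERT INTO customers VALUES (3, 'Edsger')")
# The new row is visible immediately because nothing was replicated.
print([row["name"] for row in catalog.scan("customers")])  # ['Ada', 'Grace', 'Edsger']
```

Because the catalog holds functions rather than data, a change in the source shows up in the next scan without any refresh or copy step.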
How data virtualization works
Essentially, data virtualization software is middleware that lets data stored under different data models be integrated virtually. This type of platform gives authorized consumers a single point of access to an organization’s entire range of data, without their needing to know (or care) whether the data resides on a glass house mainframe, in an on-premises data warehouse or in a cloud data lake.
Because data virtualization platforms treat data sources agnostically, they have a wide range of use cases. For example, their centralized management can be used to support data governance initiatives or to make it easier to test and deploy data-driven business analytics apps.
Data virtualization software can also control which users are allowed to access which data sources. Perhaps the most important reason for deploying it, however, is to support business objectives that require stakeholders to see a single source of truth (SSOT) as cost-efficiently as possible.
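Because every request passes through the virtual layer, that layer is a natural place to enforce access rules. The minimal Python sketch below assumes a simple role-to-view policy; the `POLICY` table, the `read_view` function and the `AccessDenied` exception are hypothetical illustrations, not how any particular product implements access control. The point it shows is that the check happens once, at the single point of access, before any source is touched.

```python
from typing import Dict, List, Set

# Hypothetical in-memory sources; in practice these would be remote systems.
SOURCES: Dict[str, List[dict]] = {
    "sales": [{"region": "EU", "revenue": 120}, {"region": "US", "revenue": 200}],
    "payroll": [{"employee": "Ada", "salary": 90000}],
}

# Hypothetical policy: which roles may read which virtual views.
POLICY: Dict[str, Set[str]] = {
    "analyst": {"sales"},
    "hr": {"sales", "payroll"},
}


class AccessDenied(Exception):
    """Raised when a role is not permitted to read a view."""


def read_view(role: str, view: str) -> List[dict]:
    # Unknown roles get an empty permission set, so they are denied by default.
    allowed = POLICY.get(role, set())
    if view not in allowed:
        raise AccessDenied(f"role {role!r} may not read {view!r}")
    # Only after the policy check does the layer read from the source.
    return list(SOURCES[view])


print(read_view("analyst", "sales"))
try:
    read_view("analyst", "payroll")
except AccessDenied as exc:
    print(exc)  # role 'analyst' may not read 'payroll'
```

Denying unknown roles by default is a design choice in this sketch; real deployments typically delegate identity and policy to an existing directory or governance tool.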
Data virtualization vs. data federation
Some vendors use the labels data virtualization and data federation interchangeably. To developers, however, they can mean slightly different things. The goal of data federation technology is to aggregate heterogeneous data from disparate sources and present it consistently from a single point of access. The term data virtualization, by contrast, simply means that the technical details about the data have been hidden.
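The federation side of that distinction can be sketched as a query that pulls from two differently shaped sources, maps them onto one schema and combines them on the fly. In this minimal Python example, the SQLite `customer` table, the JSON order documents with their `buyer`/`total` fields and the `federated_customer_totals` function are all hypothetical stand-ins chosen for illustration.

```python
import json
import sqlite3
from collections import defaultdict

# Hypothetical source A: a relational customer table.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customer (cust_id INTEGER PRIMARY KEY, full_name TEXT)")
crm.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

# Hypothetical source B: JSON order documents from another system,
# with its own field names and amounts stored as strings.
orders_json = (
    '[{"buyer": 1, "total": "30.00"},'
    ' {"buyer": 2, "total": "45.50"},'
    ' {"buyer": 1, "total": "25.00"}]'
)


def federated_customer_totals():
    """Normalize both sources to one schema and join them at query time."""
    totals = defaultdict(float)
    for doc in json.loads(orders_json):
        # Map source B's fields onto the common schema.
        totals[doc["buyer"]] += float(doc["total"])
    rows = crm.execute("SELECT cust_id, full_name FROM customer ORDER BY cust_id")
    return [
        {"customer_id": cid, "name": name, "order_total": round(totals[cid], 2)}
        for cid, name in rows
    ]


for row in federated_customer_totals():
    print(row)
```

The caller sees one consistent result shape regardless of where each field came from; hiding the connection and format details behind such a function is the part that the narrower sense of "data virtualization" refers to.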