Pfizer's Pat Saucier uses the words information and data carefully, pointing out that they have two distinct meanings. Data consists of raw, unorganized facts in need of processing. Information is processed data presented in a useful context. For example, a student's score on a test is an individual data point. But the class's average score provides the school principal with useful information on how the class is performing as a whole.
"Data people would think about databases and data structure," he said. "Information is that data that is used to make informed business decisions."
The distinction between information and data is just the kind of thing one might expect Saucier to think about. As the director of solution strategy and information architecture for the pharmaceutical giant's product research and development division, Saucier spends a great deal of time using technology to turn streams of data from various sources into useful information that ultimately helps to shape the company's future.
One initiative that Saucier is particularly proud of involves his team's efforts to use data virtualization technology to quicken the pace of information delivery to Pfizer's researchers.
The project, which began in 2010, came about because researchers needed quick and easy access to data that could be used for informed decision making, Saucier said. The catch was that his team wanted to accomplish the goal without incurring the costs associated with lengthy development times and additional servers, storage and databases.
The answer was data virtualization software, which creates an abstraction, or middleware, layer within IT architectures that pulls data from disparate sources, combines it with other data as necessary, and delivers it to informational dashboards or business intelligence reports. Pfizer is currently running data virtualization software from Composite Software Inc., a San Mateo, Calif.-based company recently acquired by networking giant Cisco. Additional providers of data integration and data virtualization software include Informatica Corp., Denodo Technologies Inc. and Red Hat.
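To make the idea of an abstraction layer concrete, here is a minimal sketch in Python. It is not Composite's product or API; the two in-memory SQLite databases, the table names and the `product_trial_view` function are all hypothetical stand-ins for separate source systems and a virtual view. The point it illustrates is that the layer queries each source at request time and joins the results itself, without copying the data into a new warehouse.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create a stand-in source system as an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    return conn


# Two hypothetical, independent sources (names and data are invented).
products_db = make_source(
    "CREATE TABLE product (product_id TEXT, name TEXT)",
    "INSERT INTO product VALUES (?, ?)",
    [("P1", "Compound A"), ("P2", "Compound B")],
)
trials_db = make_source(
    "CREATE TABLE trial (product_id TEXT, phase INTEGER)",
    "INSERT INTO trial VALUES (?, ?)",
    [("P1", 2), ("P1", 3), ("P2", 1)],
)


def product_trial_view():
    """Virtual view: read both sources on demand and join them in the layer."""
    # Look up product names from the first source.
    names = dict(products_db.execute("SELECT product_id, name FROM product"))
    # Read trial rows from the second source and attach the product name.
    trials = trials_db.execute(
        "SELECT product_id, phase FROM trial ORDER BY product_id, phase"
    )
    return [
        {"product_id": pid, "name": names.get(pid), "phase": phase}
        for pid, phase in trials
    ]
```

Because nothing is stored in the view, a row added to either source shows up the next time `product_trial_view()` is called. That same property is why retrieval speed depends on the sources themselves.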
Among other things, Saucier's team is using Composite to create a hybrid data warehouse and support Pfizer's "cross-functional data standards" initiative -- a plan to ensure that data is delivered in a predetermined, standardized fashion and used consistently across the company's research and medical units.
For example, the company created a product data standard to ensure uniformity in product information. But, as Saucier puts it, consistent product data is just one piece of the puzzle when it comes to informed decision making. That data has to be combined with other data elements in the organization to begin to get a full picture. That's where data virtualization comes in.
"Historically, we would have had to build a massive warehouse or MDM [master data management] solution and tried to define all of the uses of data to give a complete picture," he said. "But using the hybrid model and Composite we [deliver] the data through Composite and join it to other enterprise data."
Another benefit of data virtualization software at Pfizer has been faster data mart development and lower hardware costs. But Saucier cautions that -- because data virtualization accesses data from varied sources -- ensuring quick data retrieval and information delivery can be a challenge. One of the best ways to improve the performance of a data virtualization platform is to combine a sound data aggregation strategy with in-memory caching technology. It's also important to understand which data -- and how much -- to store in memory.
"If we're sourcing from a warehouse or a mart or an application that refreshes once per day, having instant access to that data defeats the purpose," Saucier said. "So you have to understand the freshness of the data coming into a virtualization platform and then make the decision. Do we cache? Do we snapshot? Or do we do some other thing?"
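The caching decision Saucier describes can be sketched roughly as follows. This is an illustrative example only, not how Composite implements caching: the `CachedView` class, its parameter names and the injectable clock are assumptions made for the sketch. It holds a result in memory and goes back to the source only once the source's own refresh interval has elapsed, since querying more often than the source refreshes returns nothing new.

```python
import time


class CachedView:
    """Hypothetical in-memory cache keyed to how often the source refreshes."""

    def __init__(self, fetch, source_refresh_seconds, clock=time.monotonic):
        self._fetch = fetch                  # callable that queries the source
        self._ttl = source_refresh_seconds   # how often the source itself updates
        self._clock = clock                  # injectable clock, useful for testing
        self._value = None
        self._loaded_at = None
        self.source_reads = 0                # counts trips to the underlying source

    def get(self):
        """Return the cached result, re-reading the source only when it is stale."""
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._value = self._fetch()
            self._loaded_at = now
            self.source_reads += 1
        return self._value
```

For a source that refreshes once per day, `CachedView(fetch, 24 * 60 * 60)` would serve every request from memory until a day has passed. A source that changes minute to minute would call for a much shorter interval, or for no cache at all.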
Advice for newcomers
Saucier has simple advice for anyone considering a career in information management.
"Touch the data to understand how data [moves]," he said, "not only within systems, but use real-world experience to determine how data and information flow."
That hands-on approach will also help newcomers understand how individual data points or elements relate to one another and how those relationships translate to the real world.
"Get your hands dirty. Get in the data. Run queries," Saucier said. "As you do that, make assumptions, and then the picture will become clearer. And then you can make informed decisions of where to get the data and how it should be organized."