SearchDataManagement.com talked to Michael Linhares, a research fellow with pharmaceutical giant Pfizer, to learn about his experiences with data virtualization software. The interview took place at Composite Software's recent Data Virtualization Day event in New York.
Read the full transcript from this video below:
Pfizer's Michael Linhares talks about data virtualization software
Mark Brunelli: How do you personally define data virtualization, because you hear a lot of phrases
Michael Linhares: I define it as an approach to sharing
information and distributing it in a way that does not
require it necessarily being physically moved from a
source database to a target database. You are actually
using the information within the source database, but
having a meta-framework on top of it, to be able to
understand how to query the data effectively.
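As a rough illustration of the idea described above (querying data where it lives, with a layer on top that knows how to combine it), the sketch below uses SQLite from Python. Everything in it is an assumption made for illustration: the two attached in-memory databases stand in for separate source systems, and the schema names, table names and rows are hypothetical. It is not a picture of any particular data virtualization product or of Pfizer's setup.

```python
import sqlite3

# One connection plays the role of the virtual layer.
conn = sqlite3.connect(":memory:")

# Two separate in-memory databases stand in for independent source systems
# (hypothetical names).
conn.execute("ATTACH DATABASE ':memory:' AS research")
conn.execute("ATTACH DATABASE ':memory:' AS projects")

conn.execute("CREATE TABLE research.compounds (compound_id TEXT PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE projects.assignments (project TEXT, compound_id TEXT)")
conn.executemany(
    "INSERT INTO research.compounds VALUES (?, ?)",
    [("C-001", "alpha"), ("C-002", "beta")],
)
conn.executemany(
    "INSERT INTO projects.assignments VALUES (?, ?)",
    [("P1", "C-001"), ("P2", "C-002"), ("P3", "C-001")],
)

# The view holds only the definition of how the sources relate.
# No rows are copied into a target table.
conn.execute(
    """
    CREATE TEMP VIEW project_compounds AS
    SELECT a.project, c.name AS compound_name
    FROM projects.assignments AS a
    JOIN research.compounds AS c ON c.compound_id = a.compound_id
    """
)

query = "SELECT project, compound_name FROM project_compounds ORDER BY project"
rows = conn.execute(query).fetchall()

# A change in a source is visible through the view on the next query.
conn.execute("INSERT INTO projects.assignments VALUES ('P4', 'C-002')")
rows_after = conn.execute(query).fetchall()
```

The view plays the part of the meta-framework in miniature: consumers query `project_compounds` without needing to know which source holds which table.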
Mark Brunelli: What would you say are some of the
key benefits of it?
Michael Linhares: Really, it is being able to deploy solutions
very quickly, so you have access to all the source information
that you need, and you can very rapidly combine information
together, as long as you know enough about it and understand
the relationships that the information does have, so that you
combine information very quickly and answer questions in a
rapid way. This, of course, then leads to a drop in cost. In
addition, you also have all the information in one location.
You have it in one spot already, in the sense of this virtual
world. You physically do not have it all in one location, but
you really do not care where it is, which makes it nice, too.
Mark Brunelli: Tell me about Pfizer. Everyone knows Pfizer,
but tell me how you all first got involved with data virtualization
and how are you using it today, personally?
Michael Linhares: We were in the situation of building
information factories and really looking at the cost/benefit
of how to build data marts and doing a lot of ETL, and it
was becoming very expensive, and we had a lot of
competing priorities. We were looking for additional
technologies and solutions that would allow us to build and
distribute information to other clients quickly and more
cost-effectively than building extensive ETL solutions, moving data
to another data mart and doing those kinds of things. We were
really looking for time-to-delivery and reducing cost.
Mark Brunelli: Have you been successful?
Michael Linhares: I think we have been successful. I think, you know,
it is a balance; we have been able to deliver information quickly.
When people ask for new combinations of data, especially
across multiple sources, we have been able to deliver that
quickly. We have been able to react to situations like an
integration, where you need to put a couple of disparate
pieces of information together so that somebody can make a
decision quickly.
To me, it does feel like we have been successful. We have a
model that is sustainable. We have a model that's reusable,
and we have really seen it grow in the organization. Some of
the growth has been driven by other corporations and
software vendors using the tool as their primary integration.
Mark Brunelli: This is one of the things you spoke about today. How
can a company go about determining whether or not a particular IT
project is right for data virtualization?
Michael Linhares: That is a really good question. I think it has to
do with an understanding of the relationships between databases.
Are the data quality, the reference data, and the actual business
logic in the databases that you are trying to combine the same?
When you talk about something like a person or a customer, are you
really talking about the same thing? Can the information be joined
easily, or is it something very challenging and difficult where
a lot of transformation needs to be done? If you have to do a
lot of data cleanup, then other techniques would be better,
but if the data quality in the source systems is good and it
meets your needs, then virtualization is a very good solution.
You also have to take into account latency questions. How
frequently are sources being changed? Do you need history? Those
kinds of questions raise the issue, 'Is it useful or not?' You
have to think about all those different questions when
you are considering it.
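One of the questions above, whether the information can be joined easily, can be checked roughly before committing to a virtual integration. The sketch below is a minimal example under stated assumptions: the function name `key_overlap` and the two lists of customer IDs are hypothetical, and a real assessment would also look at reference data, business logic, latency and history.

```python
def key_overlap(left_keys, right_keys):
    """Return the share of distinct left-hand keys that also exist on the right."""
    left, right = set(left_keys), set(right_keys)
    if not left:
        return 0.0
    return len(left & right) / len(left)


# Hypothetical customer identifiers pulled from two source systems.
crm_customer_ids = ["1001", "1002", "1003", "1004"]
billing_customer_ids = ["1002", "1003", "1004", "1005"]

overlap = key_overlap(crm_customer_ids, billing_customer_ids)
print(f"{overlap:.0%} of CRM customers can be matched in billing")
```

A low overlap would suggest the keys do not mean the same thing in both systems, or that cleanup and transformation are needed first, which is the case where other techniques may fit better.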
Mark Brunelli: You mentioned data quality, and you talked about
that a little today. What kind of data quality issues did you run into
as you began this project?
Michael Linhares: What we are really running into now is that we
have two or three different source systems that we want to
integrate data across and, for us as a pharmaceutical company,
they all have a field called compound, but the information within that
field is different. The definition of that field is actually not
consistent. The definition of what a person is, the definition
of what a project is, these things are inconsistent across the
systems because they were built either from commercial
off-the-shelf implementations or custom built for a specific
business need, without really reflecting on the other business
needs or realizing that the data would be extracted and moved to
an integration layer. We are having to go back and understand
where those differences are and which important ones need to be
changed, and then actually go back and fix them.
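A same-named field that means different things in different systems can sometimes be spotted by comparing the shape of the values each system stores. The sketch below is one simple way to do that, not the approach used at Pfizer: the helper names `shape` and `profile` and the sample compound values are all hypothetical.

```python
import re
from collections import Counter


def shape(value):
    """Reduce a value to its pattern: letters become 'A', digits become '9'."""
    lettered = re.sub(r"[A-Za-z]", "A", value)
    return re.sub(r"[0-9]", "9", lettered)


def profile(values):
    """Count how often each value pattern occurs in a field."""
    return Counter(shape(v) for v in values)


# Hypothetical contents of a field called "compound" in two source systems.
system_a_compound = ["CX-001", "CX-002", "CX-117"]  # looks like an internal code
system_b_compound = ["alphamab", "betazole"]        # looks like a name

profile_a = profile(system_a_compound)
profile_b = profile(system_b_compound)

# No shared patterns suggests the two fields are defined differently.
shared_patterns = set(profile_a) & set(profile_b)
```

An empty `shared_patterns` set does not prove the definitions differ, but it flags the field for the kind of follow-up with the source system owners described above.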
Mark Brunelli: Keeping with some of the challenges to data
virtualization, today you also talked about how you got a little
bit of pushback from database administrators. We have a lot of
database administrator readers on our site, so I was just wondering,
what do you think their concerns were there and how do you
approach that issue?
Michael Linhares: When you bring in a new tool or a new technology
or a new approach to sharing information especially, I think a lot of
times database administrators go, 'Hey, I know how to do this. I
know how to write stored procedures, I know how to create views,
materialized views, I know how to do this.' There is a little
bit of a threat of, 'You are taking something away from me that
I kind of owned for a while.' Maybe they feel a little
uncomfortable about it: 'I am losing a piece of something
that I feel like I own.' It is just communicating to them, 'No, we are
working together. This is a partnership. We are augmenting what
you are doing. We are actually doing something that is going to
take away from you stuff that you do not need to be doing.'
Almost to the level of, 'We want to try stuff and iterate on it
and do it quickly, and you have really got this long list of
things that you are working on for a production system. You do
not want to be bothered with us anyways.'
Once there is that conversation and that balance with the
DBAs, they start to feel comfortable. I think if you have
the conversations with them, they will feel comfortable with
what you are doing, why you are doing it, and how you are doing
it, and then they start supporting you even better. One of the
hardest things, as we know, is actually getting access to
databases. The DBAs are key in making that happen.
Mark Brunelli: In other words, you could free them up to
concentrate on more important things.
Michael Linhares: Yes, more value-add things. I think all of us can add
value. A lot of the stuff we do is directly related to a business problem,
and in order to solve that, you have to have business knowledge of
the information, of the data. To the DBAs, it is just a field;
they do not really understand the information in context,
whereas we are looking at the information in context. That is a
big difference. They add a huge amount of value doing more of
the information changes that do not require that business
knowledge. There is a little bit of a difference there.
Mark Brunelli: If you have to offer a piece of advice to someone
considering a data virtualization project, what would it be?
Michael Linhares: Really, the first thing to do is to look closely at your
data within the source systems, and understand the business context
of the data in the source systems. Understand the relationships of the
data across the source systems, so that you can even consider
whether virtualization would work. If you have information in
multiple systems that have no relationship whatsoever, it is not
going to work.
Mark Brunelli: Mike, thank you so much for taking the time to speak
with me today.
Michael Linhares: All right. Thanks a lot, Mark.
Mark Brunelli: All right.