Master data management (MDM) and big data initiatives go together like peanut butter and jelly -- or at least they should -- according to Aaron Zornes, the outspoken founder and chief research officer with the MDM Institute, a consulting firm focused exclusively on MDM and related matters.
Zornes said the intersection of MDM and big data is a topic that is sure to be on the minds of IT professionals attending next week's 7th Annual MDM and Data Governance Summit, a conference he is hosting in New York City.
SearchDataManagement.com got a hold of Zornes ahead of the summit to find out just how big data and MDM initiatives might interact in the real world. Below are some excerpts from that conversation, which have been edited for length and clarity. And be sure to visit the SearchDataManagement.com news page next week for more coverage of the MDM and Data Governance Summit.
What is the connection between MDM and big data?
Aaron Zornes: It's like a two-way street, the interaction between the two worlds of big data and MDM. MDM needs big data to fill in its blind spots. MDM people think that they have a 360-degree view of customer, citizen, supplier or whatever [entity]. But in reality they've got about a 45% to 75% view because there is a lot of stuff out there on the deep Web, and not just LinkedIn and Facebook. There are a lot of other publicly available records that could radically fill out your view of the customer so that you can start inching up toward [a more complete] view of the customer or supplier or party of interest.
That explains how the goals of MDM can be furthered with big data. But you also said it's a two-way street. How might a big data initiative benefit from MDM?
Zornes: You've got to have that trusted source first to map against all of that data. And likewise, MDM needs that big data to help round out its view. To really take advantage of big data, it has to be managed, right? In order to manage something, you have to identify it and measure it. For example, if you’re a salesperson, you want to know the politics and the sports affiliations of the people you're selling to so you don't misspeak. All of that stuff is out there through LinkedIn and through Facebook. But you've got to make sure you've got the right person because there are a lot of avatars, a lot of spoofing and it's just hard. How do you know that John Aaron Zornes Jr. -- which is me -- is Aaron Zornes? [Transportation Security Administration officers] make me use John as my first name, but everybody else knows me as Aaron. [You've got] this challenge of who is who in the world of big data because there is so much garbage out there and you need to be able to sift through it and positively identify them. That's where the matching capabilities of master data management are critical because they're already finely tuned to determine if this is the right person.
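To make the "who is who" problem concrete, here is a minimal sketch of the kind of name comparison Zornes alludes to, using his own example. It is an illustration only: the function names and the rule (same surname, and one record's given names contained in the other's) are assumptions made for this sketch, and commercial MDM matching engines rely on far richer, tuned logic that weighs many attributes beyond the name.

```python
import re

# Generational suffixes to ignore when comparing names (illustrative list).
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}

def name_tokens(full_name):
    """Lowercase a name, strip punctuation, and drop generational suffixes."""
    tokens = re.findall(r"[a-z]+", full_name.lower())
    return [t for t in tokens if t not in SUFFIXES]

def same_person_candidate(name_a, name_b):
    """Return True when surnames agree and the shorter record's given names
    all appear among the longer record's given names."""
    a, b = name_tokens(name_a), name_tokens(name_b)
    if not a or not b or a[-1] != b[-1]:
        return False
    given_a, given_b = set(a[:-1]), set(b[:-1])
    shorter, longer = sorted((given_a, given_b), key=len)
    return bool(shorter) and shorter <= longer

print(same_person_candidate("John Aaron Zornes Jr.", "Aaron Zornes"))
print(same_person_candidate("John Zornes", "Aaron Zornes"))
```

A rule this loose only nominates candidates. A real deployment would confirm them against other attributes such as address, date of birth or account identifiers before treating two records as the same party.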
What would a set-up that combines MDM with big data management look like from a technical perspective?
Zornes: Let's say that your MDM hub is a trusted source and that you have got it up and running. You have a trusted source and cleaned-up data so you know [for example] that these are good customers, these are good suppliers, these are good prospects. Here are some of the products in the marketplace, and here are competitors and important key influencers and people you want to track. Somehow all of that data coming into your big data/data warehouse or your Hadoop implementation -- whatever you want to call it -- has to be tagged and identified. Questions you're trying to answer include: Who is saying that? Who is doing that if it's phone activity or if it's ATM activity? Who is doing it if it's purchases across multiple websites? How do you rationalize it down to a given party so you know who that person is and you know the amount of influence they have? You didn't have the capability [to answer those questions] before if you just had raw big data coming in. You've got to be able to clean it and identify it and further enrich it. [That's what] trusted sources of information let you do. That's where the [matching] algorithms and the trusted source of MDM do their work to enrich and aggregate and clean the big data up so that it's meaningful to your big data analytics. Then, likewise, the stuff you pick out of the flood of big data helps to enrich the view of customer, product, supplier, bad guy, good guy or whatever.
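The tagging step Zornes describes can be pictured with a short sketch. Everything in it is hypothetical: the hub is reduced to a lookup table of known identifiers, the event fields (`type`, `id_kind`, `id_value`) are invented for the example, and exact lookup after light cleanup stands in for the matching algorithms an actual MDM hub would apply.

```python
from collections import defaultdict

# Hypothetical MDM hub extract: known identifiers mapped to a master party ID.
HUB_INDEX = {
    ("email", "pat@example.com"): "PARTY-001",
    ("phone", "5550100"): "PARTY-001",
    ("email", "lee@example.com"): "PARTY-002",
}

def normalize(kind, value):
    """Apply simple cleanup so feed values line up with hub values."""
    value = value.strip().lower()
    if kind == "phone":
        value = "".join(ch for ch in value if ch.isdigit())
    return kind, value

def tag_events(events, hub_index):
    """Attach a master party ID to each raw event, or None when unmatched."""
    tagged = []
    for event in events:
        key = normalize(event["id_kind"], event["id_value"])
        tagged.append({**event, "party_id": hub_index.get(key)})
    return tagged

def activity_by_party(tagged_events):
    """Group event types under each resolved party; unmatched go under None."""
    grouped = defaultdict(list)
    for event in tagged_events:
        grouped[event["party_id"]].append(event["type"])
    return dict(grouped)

raw_events = [
    {"type": "web_purchase", "id_kind": "email", "id_value": " Pat@Example.com "},
    {"type": "atm_withdrawal", "id_kind": "phone", "id_value": "555-0100"},
    {"type": "call", "id_kind": "phone", "id_value": "555-0199"},
]

print(activity_by_party(tag_events(raw_events, HUB_INDEX)))
```

In this toy run the web purchase and the ATM withdrawal roll up to the same party even though they arrived with different identifiers, while the unmatched phone call stays unattributed until it can be identified.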
What advice do you have for organizations as they start to think about the connection between MDM and big data?
Zornes: There are commercially available parties like Acxiom and Dun & Bradstreet that have already created the application programming interfaces (APIs) to mine LinkedIn and Facebook. There's already software out there that goes into those public databases using the APIs and likewise goes out to Ancestry.com and other sites. Amazon also has some public stuff that you can access. There are a bunch of public databases including federal government, county government and state government about lien holders, about drivers' licenses, violations, felonies and bankruptcies. All that stuff is out there publicly except that some companies are charging for you to go get it.
Data governance is a major piece of the MDM puzzle. What is the connection between data governance and big data management?
Zornes: Again, you can't manage something unless you measure it. Therefore, data governance is critical because it allows you to proactively manage an asset. Big data just happens to be more textual and more wild and woolly than the structured information that we try to manage otherwise. It's already pretty chaotic in most large enterprises -- even when it comes to structured information. The good news is we have data governance processes, steering committees, trustees and data stewards. If you get all of this stuff set up in most large companies you can simply add another source of information -- that being your big data -- and figure out who owns it, who pays for it, what happens when it goes wild or bad, and what are the hierarchies for the different trusted sources. For example, you may be getting different big data feeds from multiple sources and sometimes they'll collide. Data governance processes will allow you to decide which one is the better source.
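One way to picture the source hierarchy Zornes mentions is a trust ranking that decides which feed wins when two of them disagree. The sketch below assumes such a ranking has already been agreed on by a governance committee. The source names, record fields and ranking values are all made up for illustration, and real governance policies often also weigh recency, completeness and stewardship overrides.

```python
# Hypothetical trust ranking set by a data governance committee; lower is more trusted.
SOURCE_RANK = {"mdm_hub": 0, "licensed_feed": 1, "social_feed": 2}

def resolve_collisions(records, source_rank):
    """For each (party, attribute), keep the value from the most trusted source."""
    best = {}
    for rec in records:
        key = (rec["party_id"], rec["attribute"])
        # Sources missing from the ranking are treated as least trusted.
        rank = source_rank.get(rec["source"], len(source_rank))
        if key not in best or rank < best[key][0]:
            best[key] = (rank, rec["value"], rec["source"])
    return {key: {"value": v, "source": s} for key, (_, v, s) in best.items()}

feed_records = [
    {"party_id": "PARTY-001", "attribute": "employer", "value": "Acme Ltd", "source": "social_feed"},
    {"party_id": "PARTY-001", "attribute": "employer", "value": "Acme Limited", "source": "licensed_feed"},
    {"party_id": "PARTY-001", "attribute": "city", "value": "Boston", "source": "unranked_feed"},
]

for key, winner in resolve_collisions(feed_records, SOURCE_RANK).items():
    print(key, winner)
```

Keeping the winning source alongside the value means a data steward can later see why a particular value was chosen and revisit the ranking if a feed turns out to be unreliable.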
Mark Brunelli is the news director for SearchDataManagement.com. Follow him on Twitter: @Brunola88.