Changes in data affect the way data professionals work, and the onslaught of unstructured data is a prime example. It was the topic when we recently spoke with expert Anne Marie Smith, principal consultant at Alabama Yankee Systems LLC. This is part one in a series.
Is it fair to say big data is about more than just volume, and that lack of structure is often what is being discussed?
Anne Marie Smith: The definition of big data changes depending upon what organization you talk to, what type of organization that is, and where you are working in the organization. The concept of big data has evolved over the last, let's say, seven or eight years. And now, I think it can be stated that big data is a collection of more than just unstructured data -- data that is not found in traditional records file formats.
If you think of a Word document that's stored electronically, that's unstructured data. If you think of email, that's unstructured data. The recording that you're making of this conversation is unstructured data. All of that has existed in the past, but now there has been an explosion in storage capabilities.
So not only has the definition of big data evolved to include more stuff, the volume of it has increased. Now there's a conundrum -- what is valuable, and what value could this material contain? We don't know yet, which is why companies are trying to mine this data even though they don't know what they're looking for.
Smith: Many companies have not even bothered to govern their structured data up to this point, for a variety of reasons. They didn't see that this data was an asset in itself, so there was no point in governing it the way they govern assets such as money, physical plants or the products they make. But recently, a fair number of companies have discovered that there is value in structured data, so they have started to institute some forms of data governance.
Whether those forms conformed to industry standard best practices is left for another discussion. Let's just say there are a decent number of companies that are engaging in some form of structured data governance.
I would venture to say that if you took the totality of companies that are engaging in some form of structured data governance, not even 1%, maybe one-half of 1% of them, are engaging in any form of unstructured data governance, for a variety of reasons.
They're struggling as it is with structured data governance, so they're staring at the mountain of structured data governance and they don't know what to do with that. So, why would they ever tackle this unknown thing of unstructured data that they barely know how to define?
Even if they are in some way successful with structured data governance, the task of applying structured data governance practices to unstructured data could be highly daunting -- because they don't know where to start with the unstructured material.
One reason they don't know where to start is that they haven't defined their unstructured data sufficiently to tackle unstructured data governance. The way to do it is the way you eat an elephant -- one bite at a time. They think they have to tackle unstructured data governance in its totality, when really that's typically the worst thing you could do.