Using big data platforms for data management, access and analytics
The changes in the data landscape over the past few years have ramifications that aren't immediately apparent. Some basic tenets of the data profession are coming under review as big data systems proliferate. If nothing else, these shifts require flexibility on the part of data practitioners, according to Lakshmi Randall, principal at the Unabashed Advice consultancy. Over a career of 19-plus years, she has focused largely on data preparation and data quality issues. We caught up with her following her appearance on a panel that pitted data warehouses against data lakes at the recent Enterprise Data World 2016 event in San Diego.
I suppose pitting data warehouses against data lakes has some purpose. But isn't it just a fact that the data landscape is shifting? With that in mind, how do you see the relationship between the warehouse and the lake today?
Lakshmi Randall: What is breaking down is a strictly linear approach to data management and analytics. That is, one in which data travels a step-by-step path from acquisition to insights. It works when you understand the data, when it's predominantly structured and it originates from familiar data sources.
But in the case of big data -- notes from a physician or insurance claims form data -- the data is semi-structured or unstructured, which makes the linear approach no longer feasible. These examples require discovering the data sources, profiling the data and facilitating the understanding of the data before we decide on the path to the insights.
You could move it to the data warehouse or, after the discovery process, you might find it's not useful and throw it away. I think with the change in the data landscape, you have to think about more than just the linear approach. You also have to think about discovery and exploratory approaches. Based on that, you decide on the next best actions for either processing or storing the data.
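The discover-then-decide flow Randall describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `profile` and `next_action` helpers, the field names, the 0.8 threshold and the three outcomes are all hypothetical and do not come from the interview or from any particular product.

```python
from collections import Counter


def profile(records):
    """Return the share of records in which each field appears."""
    field_counts = Counter()
    for record in records:
        field_counts.update(record.keys())
    total = len(records)
    return {field: count / total for field, count in field_counts.items()}


def next_action(coverage, required_fields, threshold=0.8):
    """Hypothetical routing rule applied after profiling.

    Nothing found -> discard; well-populated required fields ->
    load to the warehouse; otherwise keep the data for exploration.
    """
    if not coverage:
        return "discard"
    if all(coverage.get(f, 0.0) >= threshold for f in required_fields):
        return "load_to_warehouse"
    return "keep_in_lake_for_exploration"


# Made-up, loosely structured claims records for illustration only.
claims = [
    {"claim_id": 1, "amount": 120.0, "notes": "follow-up visit"},
    {"claim_id": 2, "amount": 80.5},
    {"claim_id": 3, "notes": "illegible form"},
]

coverage = profile(claims)
action = next_action(coverage, ["claim_id", "amount"])
print(coverage)
print(action)
```

The point of the sketch is the ordering: the data is profiled first, and the decision to store, process or discard it follows from what the profile shows rather than being fixed in advance.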
As the data landscape is changing, we are seeing new types of data. We should be open to different architectures, where appropriate. Data governance is still key, but you have to have some level of agility and flexibility too.
Lakshmi Randall, principal, Unabashed Advice
There seems to be a growing need for IT to support a somewhat different user than it may have in the past -- something like a power user on steroids, one might say.
Randall: Well, different use cases drive the different tactics. Data becomes part of a more iterative process. The personas that must be supported change. It is not just a persona that typically does day-to-day analysis. It may be what you call a power user or a data discovery user or a data scientist. It may be someone who combines domain knowledge with some level of technical knowledge, a hybrid persona. Really, there is a need for a continuum of personas in the enterprise.
Let's look at another aspect of the data landscape: NoSQL. What are some forces driving interest in using NoSQL?
Randall: When you're modeling data that holds true relationships -- ones that are more affinity driven -- data modeling is different than it is with a traditional relational database. That is a great example of the need for a NoSQL database.
For example, as part of a customer experience management solution, there are different touch points in the customer journey. These can be across many different channels. And finding those special connections, I think, is only possible if we have NoSQL, given that it stores the data in something close to its natural form. That is, as opposed to having to translate the data into rows and columns. People are finding that there are some use cases, like this one, that are really good candidates for NoSQL databases. It all has to do with the nature of the data. If it is relational data, then relational databases and data warehouses are better candidates.
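As a rough illustration of the contrast between data kept close to its natural form and data translated into rows and columns, the sketch below uses plain Python dictionaries as a stand-in for document-style records. It is not the API of any NoSQL product, and the customers, channels and events are invented for the example.

```python
from collections import defaultdict
from itertools import combinations

# Document-style records: each journey keeps its touch points nested
# inside it, close to how the journey actually happened.
journeys = [
    {"customer": "c1", "touch_points": [
        {"channel": "web", "event": "viewed_product"},
        {"channel": "call_center", "event": "asked_about_returns"},
    ]},
    {"customer": "c2", "touch_points": [
        {"channel": "web", "event": "viewed_product"},
        {"channel": "store", "event": "purchase"},
    ]},
    {"customer": "c3", "touch_points": [
        {"channel": "mobile_app", "event": "support_chat"},
    ]},
]


def shared_channel_links(journeys):
    """Link pairs of customers who used at least one channel in common."""
    customers_by_channel = defaultdict(set)
    for journey in journeys:
        for touch_point in journey["touch_points"]:
            customers_by_channel[touch_point["channel"]].add(journey["customer"])
    links = set()
    for customers in customers_by_channel.values():
        for a, b in combinations(sorted(customers), 2):
            links.add((a, b))
    return links


def flatten(journeys):
    """Translate the nested journeys into relational-style rows."""
    return [
        (journey["customer"], position, tp["channel"], tp["event"])
        for journey in journeys
        for position, tp in enumerate(journey["touch_points"])
    ]


links = shared_channel_links(journeys)
rows = flatten(journeys)
print(links)
print(rows)
```

The flattened rows carry the same facts, but the journey has to be reassembled from them before the connections between customers can be explored, which is the translation step the nested form avoids.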
In your experience as of late, where is the data profession on all this? For example, with governance and modeling, there can be a natural inclination to ask for more upfront control. Are you seeing changes in the way teams are organizing?
Randall: The business is justified in demanding the ability to conduct ad-hoc analysis or to have access to the appropriate and relevant data in order to accelerate the time-to-insights. At the same time, the business should be a sponsor of IT in establishing governance and stewardship initiatives.
Today, the data profession extends across IT and the business. And the reality is the enterprise needs a continuum of personas -- that means people with quantitative and qualitative skills, domain experts, process experts, data scientists, data stewards and so on -- to support the multitude of business objectives.