Big data architectures that distribute data across a variety of systems can be tough for IT and data management teams to manage. There's no lack of data integration and governance tools available, which might sound like a good thing. But software for managing an enterprise data lake environment in a unified way isn't so easy to find.
That makes keeping the data in different platforms on the same page a tricky task, according to Mike Ferguson, CEO of Intelligent Business Strategies, a U.K.-based consulting company that focuses on analytics and data management.
Ferguson takes a broad view of what constitutes a data lake. He says the wide range of data stores and processing engines deployed by many organizations complicates efforts to keep track of what data is available where, to pull it all together cohesively and to govern diverse sets of big data.
Finding a way to balance things across an enterprise data lake is the key to effectively combining and managing Hadoop clusters, other big data systems and traditional data warehouses, Ferguson says in this video Q&A recorded in July 2016 at the Pacific Northwest BI Summit in Grants Pass, Ore. He adds, though, that because separate data management tools are used with particular platforms and applications, the data one department works with, and the insights it generates from that data, might not be accessible to other departments.
"The result is that we're getting silos all over the place. The cost of data integration is getting very, very high -- far higher than it should be -- and there's no understanding of what's being done in any of these tools by anyone using any of the other tools," Ferguson says. "So, the complexity is out of hand, and there's no real way to be clear as to what's happened to data."
In addition, Ferguson says that while there are initiatives underway to create data governance and metadata management frameworks for big data environments, the available technologies still have gaps. "There are a lot of bridges out here," he says, noting that users he works with aren't able to find software that can meet all of their data lake management needs.
The net result is that businesses often reinvent the wheel on data management and analytics processes in distributed data lakes without realizing it. To start addressing that problem, Ferguson says organizations should take steps such as creating information catalogs to "at least get to grips with what's in existence" in various data stores and to make sure everyone is up to date on data assets.
Doing so can help foster more collaboration internally, he says, pointing to the potential to integrate the work of business and IT employees throughout an organization in a sort of data production line, instead of having everyone do each task themselves. "Rather than just [assuming] software is going to solve everything, organizing yourselves to succeed would make a significant difference."
Watch the video to see more of what Ferguson has to say about managing and governing an enterprise data lake architecture, the state of tools designed to aid in that and creating collaborative data management approaches.