With big data architectures typically including a diverse mix of processing platforms and data stores, effectively managing and governing data across them all is becoming a must-do item. But big data governance processes are often still in the early stages of development, and the same goes for software tools that can help support budding governance efforts.
At the University of Texas MD Anderson Cancer Center, data governance is part of the agenda for the next phase of its big data initiative, along with heightened data security measures. "Those are things we didn't focus on initially," said Bryan Lari, director of institutional analytics at MD Anderson, referring to the period when the IT team was working to deploy a Hadoop cluster that began running applications in March. But a solid governance strategy is "very important" to the ultimate success of the Houston-based healthcare organization's big data deployment, Lari added.
Big data governance is also a key element in managing a Hadoop-based architecture that 22,000 business users at General Electric Co.'s GE Power Services unit tap into via self-service business intelligence tools. "Once big data actually gets big, you've got to deal with it," said Don Perigo, chief enterprise architect at GE Power Services, which is headquartered in Baden, Switzerland.
The big data environment isn't as locked down as an earlier BI and analytics system was, however. "Every time you wanted to do something, you had to ask the warden for permission," said Perigo, who is based in Atlanta. By comparison, the big data platform is governed more on "a Wild West model," he added. "People are free to do what they want, but there is a sheriff" keeping an eye on them. If necessary, the IT team can modify queries so they run more efficiently or shut down user accounts altogether.
The big data governance challenges only get bigger as Hadoop clusters are tied to NoSQL databases, traditional data warehouses and other data repositories. "The danger is that you end up in kind of a chaotic state, where no one has any real idea what's going on across all these data stores," said Mike Ferguson, managing director of U.K.-based consultancy Intelligent Business Strategies.
Information catalogs and metadata management tools could help control the chaos, Ferguson said at the 2016 Pacific Northwest BI Summit in Grants Pass, Ore. But existing tools aren't fully up to the task, he added. "And it's not small holes. There are big gaps."