GDPR's influence is reaching a Hadoop big data world that, until now, was immune to many privacy considerations. This podcast looks at the rise of Hadoop data governance for data lakes.
Big data analytics in the early days of Hadoop bordered on the experimental. Improvised data projects were common -- ones that introduced the world to large-scale processing of diverse data types.
Newly minted data scientists pursued analytical innovations at web startups, but paid scant attention to Hadoop data governance services that were more familiar in traditional enterprises. Now, a strong motivator in the form of the General Data Protection Regulation (GDPR) may bring greater governance to big data.
The origins of this change were seen at the recent DataWorks Summit 2018. While most of the attention focused on conference sponsor and Hadoop pioneer Hortonworks' efforts to expand its footprint in the cloud, there was strong interest in managing data privacy, as well.
That is not too surprising given that the event occurred just a month after the European Union's GDPR mandate became an enforceable regulation. At the Summit, implementers discussed useful means to populate data lakes, curate data and improve Hadoop data governance services.
Such is the backdrop for this episode of the Talking Data podcast, recorded at the DataWorks Summit. According to podcast guest Doug Henschen, an analyst at Constellation Research, GDPR provided some impetus for this new view.
"GDPR is setting forth a need for solid governance," he said, noting that while lineage tracking and related capabilities have been discussed, they haven't necessarily been practiced or enforced at companies pursuing big data analytics to date.
For Hortonworks, greater governance for big data takes the form of its DataPlane Service, he said.
"GDPR is a good thing. It's going to force companies to clean up their data and to create a more transparent data management architecture," Henschen said.
Hadoop data governance services are becoming a bigger part of the scene -- not just for big data, but, going forward, for all data, Henschen said.
Listen to this edition of the Talking Data podcast to find out more about the course forward for Hadoop and big data in the enterprise.