PRO+ Premium Content/Business Information

June 2017, Vol. 5, No. 3

Hadoop data governance takes hold in companies as data gets 'bigger'

When LinkedIn Corp. was a smaller company, it didn't matter so much internally how data captured from its social networking website for analysis was formatted and structured. "You could really log anything and access it later," said Yael Garten, LinkedIn's director of data science. That let data scientists work quickly on analytics applications, she added, without having to worry about any data inconsistencies that might result.

But things changed, as the company and the amount of data it generated grew rapidly. Now, people see the wisdom of better governing the data in LinkedIn's Hadoop environment so it's standardized throughout the analytics cycle, Garten explained. Otherwise, "it becomes a nightmare when you have hundreds of teams emitting data and hundreds of teams consuming data," she noted. That's particularly true, she said, if data is stored schema-free -- a lesson that LinkedIn learned early on.

Tools of the data governance trade

LinkedIn's Hadoop data governance process includes an internally developed ...
