PRO+ Premium Content/Business Information

August 2014, Volume 2, Number 4

Hadoop data lake floated as primary info repository

Hadoop is a powerful distributed processing technology, but it's hard to describe to the C-suite. So vendors came up with an easy-to-grasp metaphor: They want organizations to dive into the data lake, an architectural approach that positions Hadoop as a central repository for the diverse streams of data flowing into systems -- relegating the enterprise data warehouse (EDW) to the IT backwaters.

The buzz: Hadoop clusters based on commodity computers are a relatively inexpensive destination for data. And their waters can hold a variety of structured, unstructured and semi-structured information, including the hallmarks of big data applications -- log files, Web clickstreams, sensor data and social media posts. Data stored in Hadoop also doesn't have to be cleansed and consolidated up front, as in an EDW; it can be harbored in raw form and schematized as needed for different analytics uses.

The reality: As a term, data lake invites sarcastic variations; data swamp, data marshland and data puddle are examples from the #datalake Twitter ...
