Business Information

Technology insights for the data-driven enterprise



Hadoop data lake floated as primary info repository

Hadoop vendors are pushing an approach that puts the distributed processing framework at the center of data management architectures. But some issues could sink the idea.

Hadoop is a powerful distributed processing technology, but it's hard to describe to the C-suite. So vendors came up with an easy-to-grasp metaphor: They want organizations to dive into the data lake, an architectural approach that positions Hadoop as a central repository for the diverse streams of data flowing into systems -- relegating the enterprise data warehouse (EDW) to the IT backwaters.

The buzz: Hadoop clusters based on commodity computers are a relatively inexpensive destination for data. And their waters can hold a variety of structured, unstructured and semi-structured information, including the hallmarks of big data applications -- log files, Web clickstreams, sensor data and social media posts. Data stored in Hadoop also doesn't have to be cleansed and consolidated up front, as in an EDW; it can be harbored in raw form and schematized as needed for different analytics uses.
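To make "schematized as needed" concrete, here is a minimal sketch of the schema-on-read idea in plain Python. It is not Hadoop code, and everything in it is an assumption made for illustration: the sample clickstream records, the field names and the helper `read_with_schema` are all hypothetical. The point is only that the raw records are stored untouched and each analytics use applies its own schema when it reads them.

```python
import json

# Hypothetical raw clickstream records, kept exactly as they arrived (JSON lines).
# Fields are inconsistent on purpose: nothing was cleansed up front.
RAW_EVENTS = [
    '{"user": "u1", "url": "/home", "ms": "120"}',
    '{"user": "u2", "url": "/pricing"}',
    '{"user": "u1", "url": "/pricing", "ms": "340", "ref": "ad"}',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema (field name -> converter) at read time.

    Fields missing from a record become None; fields not named
    in the schema are ignored rather than rejected.
    """
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, convert in schema.items():
            value = record.get(field)
            row[field] = convert(value) if value is not None else None
        rows.append(row)
    return rows


# Two analytics uses, two schemas, one copy of the raw data.
page_view_schema = {"user": str, "url": str}
latency_schema = {"url": str, "ms": int}

page_views = read_with_schema(RAW_EVENTS, page_view_schema)
latencies = read_with_schema(RAW_EVENTS, latency_schema)
```

An EDW would typically enforce one agreed structure before loading; in this sketch the second record's missing `ms` value surfaces only when the latency schema is applied, which is also why raw data still has to be refined before business users consume it.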

The reality: As a term, data lake invites sarcastic variations; data swamp, data marshland and data puddle are examples from the #datalake Twitter stream. More substantively, many organizations are just getting their feet wet with Hadoop and aren't ready to plunge in. Also, a reservoir of raw Hadoop data eventually needs to be refined to make it fit for consumption by business users. And Hadoop systems don't exist on an island: Traditional data warehouses likely will still play a big role in combination with them, leaving IT teams with new development and integration challenges to navigate.

Hadoop data lake
Hadoop vendors paint the picture of an expansive lake teeming with data from diverse sources. Business intelligence and analytics systems can drink directly from these information-rich waters or tap into filtered supplies stored in data warehouses and other databases.

