Hadoop data lake floated as primary info repository

Hadoop vendors are pushing an approach that puts the distributed processing framework at the center of data management architectures. But some issues could sink the idea.

This article can also be found in the Premium Editorial Download: Business Information: New talent management software transforming HR.

Hadoop is a powerful distributed processing technology, but it's hard to describe to the C-suite. So vendors came up with an easy-to-grasp metaphor: They want organizations to dive into the data lake, an architectural approach that positions Hadoop as a central repository for the diverse streams of data flowing into systems -- relegating the enterprise data warehouse (EDW) to the IT backwaters.

The buzz: Hadoop clusters based on commodity computers are a relatively inexpensive destination for data. And their waters can hold a variety of structured, unstructured and semi-structured information, including the hallmarks of big data applications: log files, Web clickstreams, sensor data and social media posts. Data stored in Hadoop also doesn't have to be cleansed and consolidated up front, as in an EDW; it can be harbored in raw form and schematized as needed for different analytics uses.
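
To make "schematized as needed" concrete, here is a minimal sketch of the schema-on-read idea in plain Python. It is only an illustration, not Hadoop code: the sample records, the field names and the `read_with_schema` helper are all assumptions made up for this example. The point it shows is that the raw lines are stored once, untouched, and each analytics use applies its own schema at read time.

```python
import json

# Hypothetical raw events, kept exactly as they arrived (one JSON object per line).
RAW_LINES = [
    '{"ts": "2014-10-01T12:00:00", "user": "u1", "url": "/home", "ms": 120}',
    '{"ts": "2014-10-01T12:00:05", "user": "u2", "url": "/cart", "ms": "340"}',
    '{"ts": "2014-10-01T12:00:09", "sensor": "s7", "temp_c": 21.5}',
]

# Two illustrative schemas, each mapping a field name to the type it is cast to.
CLICK_SCHEMA = {"user": str, "url": str, "ms": int}
SENSOR_SCHEMA = {"sensor": str, "temp_c": float}


def read_with_schema(lines, schema):
    """Yield the records that contain every schema field, cast to the schema's types."""
    for line in lines:
        record = json.loads(line)
        if not all(field in record for field in schema):
            continue  # record serves a different use; it simply stays raw
        yield {field: cast(record[field]) for field, cast in schema.items()}


# The same raw data, read two different ways.
clicks = list(read_with_schema(RAW_LINES, CLICK_SCHEMA))
sensors = list(read_with_schema(RAW_LINES, SENSOR_SCHEMA))
print(clicks)
print(sensors)
```

In an EDW, by contrast, the casting and filtering shown in `read_with_schema` would typically happen once, before loading, and records that fit no table would have nowhere to go.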

The reality: As a term, data lake invites sarcastic variations; data swamp, data marshland and data puddle are examples from the #datalake Twitter stream. More substantively, many organizations are just getting their feet wet with Hadoop and aren't ready to plunge in. Also, a reservoir of raw Hadoop data eventually needs to be refined to make it fit for consumption by business users. And Hadoop systems don't exist on an island: Traditional data warehouses likely will still play a big role in combination with them, leaving IT teams with new development and integration challenges to navigate.
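
What "refined" might involve can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of any vendor's tooling: the order records, the cleansing rules and the `refine` function are hypothetical. It shows the kind of work that schema-on-read defers rather than eliminates, namely normalizing values, rejecting malformed rows and removing duplicates before business users see the data.

```python
# Hypothetical raw order rows as they might land in the lake.
RAW_ROWS = [
    {"order_id": 1, "customer": " Acme ", "amount": "19.990"},
    {"order_id": 1, "customer": "acme", "amount": "19.99"},   # duplicate of the first
    {"order_id": 2, "customer": "Globex", "amount": "n/a"},   # unparseable amount
    {"order_id": 3, "customer": None, "amount": "5"},         # missing customer
]


def refine(raw_rows):
    """Return (clean_rows, rejected_count) after cleansing and de-duplication."""
    seen = set()
    clean = []
    rejected = 0
    for row in raw_rows:
        # Normalize the customer name; a missing value becomes an empty string.
        customer = (row.get("customer") or "").strip().lower()
        try:
            amount = round(float(row.get("amount")), 2)
        except (TypeError, ValueError):
            rejected += 1  # amount is missing or not numeric
            continue
        if not customer:
            rejected += 1  # no customer to attribute the order to
            continue
        key = (row.get("order_id"), customer)
        if key in seen:
            continue  # same order already kept; skip without counting as rejected
        seen.add(key)
        clean.append({"order_id": row.get("order_id"),
                      "customer": customer,
                      "amount": amount})
    return clean, rejected


clean_rows, rejected_count = refine(RAW_ROWS)
print(clean_rows, rejected_count)
```

Of the four raw rows here, only one survives. How many rows a real lake would lose to rules like these depends entirely on the data and on the rules the business agrees to.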

Hadoop data lake
Hadoop vendors paint the picture of an expansive lake teeming with data from diverse sources. Business intelligence and analytics systems can drink directly from these information-rich waters or tap into filtered supplies stored in data warehouses and other databases.

 
