Hadoop data lake floated as primary info repository
Hadoop vendors are pushing an approach that puts the distributed processing framework at the center of data management architectures. But some issues could sink the idea.
Hadoop is a powerful distributed processing technology, but it's hard to describe to the C-suite. So vendors came up with an easy-to-grasp metaphor: They want organizations to dive into the data lake, an architectural approach that positions Hadoop as a central repository for the diverse streams of data flowing into systems -- relegating the enterprise data warehouse (EDW) to the IT backwaters.
The buzz: Hadoop clusters based on commodity computers are a relatively inexpensive destination for data. And their waters can hold a variety of structured, unstructured and semi-structured information, including the hallmarks of big data applications -- log files, Web clickstreams, sensor data and social media posts. Data stored in Hadoop also doesn't have to be cleansed and consolidated up front, as in an EDW; it can be harbored in raw form and schematized as needed for different analytics uses.
The reality: As a term, data lake invites sarcastic variations; data swamp, data marshland and data puddle are examples from the #datalake Twitter stream. More substantively, many organizations are just getting their feet wet with Hadoop and aren't ready to plunge in. Also, a reservoir of raw Hadoop data eventually needs to be refined to make it fit for consumption by business users. And Hadoop systems don't exist on an island: Traditional data warehouses likely will still play a big role in combination with them, leaving IT teams with new development and integration challenges to navigate.
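To make the "harbored in raw form and schematized as needed" idea concrete, here is a minimal sketch in plain Python rather than Hadoop itself. The sample records, the two schemas and the `read_with_schema` helper are all invented for illustration; the point is only that the same raw store can be read through different schemas, and that records which don't fit still need refining before business users can rely on them.

```python
import json

# Raw records as they might land in a data lake: uncleansed, mixed shapes.
# All values here are made-up sample data.
RAW_LINES = [
    '{"ts": "2014-07-01T10:00:00", "user": "u1", "url": "/home"}',
    '{"ts": "2014-07-01T10:00:05", "sensor": "s9", "temp_c": "21.5"}',
    'not valid json at all',
    '{"ts": "2014-07-01T10:01:00", "user": "u2", "url": "/pricing", "ref": "ad"}',
]

# Hypothetical schemas, one per analytics use: field name -> type converter.
CLICK_SCHEMA = {"ts": str, "user": str, "url": str}
SENSOR_SCHEMA = {"ts": str, "sensor": str, "temp_c": float}


def read_with_schema(lines, schema):
    """Apply a schema at read time and skip records that do not fit it."""
    rows = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparseable raw data stays in the lake, unused
        if not isinstance(record, dict):
            continue
        if not all(field in record for field in schema):
            continue  # record belongs to a different kind of data
        try:
            rows.append({f: cast(record[f]) for f, cast in schema.items()})
        except (TypeError, ValueError):
            continue  # a field could not be converted to the expected type
    return rows


# The same raw data, read two different ways.
clicks = read_with_schema(RAW_LINES, CLICK_SCHEMA)
readings = read_with_schema(RAW_LINES, SENSOR_SCHEMA)

print(clicks)
print(readings)
```

An EDW would instead validate and reshape every record before loading it, so the cost of cleansing is paid once up front rather than by each consumer at read time.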