Hadoop is a powerful distributed processing technology, but it's hard to describe to the C-suite. So vendors came up with an easy-to-grasp metaphor: They want organizations to dive into the data lake, an architectural approach that positions Hadoop as a central repository for the diverse streams of data flowing into systems -- relegating the enterprise data warehouse (EDW) to the IT backwaters.
The buzz: Hadoop clusters based on commodity computers are a relatively inexpensive destination for data. And their waters can hold a variety of structured, unstructured and semi-structured information, including the hallmark of big data applications -- log files, Web clickstreams, sensor data, social media posts. Data stored in Hadoop also doesn't have to be cleansed and consolidated up front, as in an EDW; it can be harbored in raw form and schematized as needed for different analytics uses.
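To make "harbored in raw form and schematized as needed" concrete, here is a minimal Python sketch of the schema-on-read idea. It assumes raw events are kept as JSON lines. The sample records and the `PageView` and `SensorReading` views are hypothetical, chosen only for illustration. In an actual Hadoop deployment this step would typically be handled by tools such as Hive or Spark rather than plain Python.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator

# Hypothetical raw records, stored as-is with no up-front cleansing or schema.
RAW_EVENTS = [
    '{"user": "u1", "page": "/home", "ms": 120}',
    '{"user": "u2", "page": "/cart", "ms": 340, "device": "mobile"}',
    '{"sensor": "s9", "temp_c": 21.5}',
    'not valid json',
]

@dataclass
class PageView:  # hypothetical schema applied by a clickstream analysis
    user: str
    page: str
    latency_ms: int

@dataclass
class SensorReading:  # hypothetical schema applied by a sensor analysis
    sensor: str
    temp_c: float

def parse(lines: Iterable[str]) -> Iterator[dict]:
    """Decode raw lines at read time, skipping anything that isn't a JSON object."""
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed records are tolerated in storage, filtered on read
        if isinstance(rec, dict):
            yield rec

def page_views(lines: Iterable[str]) -> Iterator[PageView]:
    """One schema over the raw data: keep only records with user and page fields."""
    for rec in parse(lines):
        if "user" in rec and "page" in rec:
            yield PageView(rec["user"], rec["page"], int(rec.get("ms", 0)))

def sensor_readings(lines: Iterable[str]) -> Iterator[SensorReading]:
    """A different schema over the same raw data."""
    for rec in parse(lines):
        if "sensor" in rec and "temp_c" in rec:
            yield SensorReading(rec["sensor"], float(rec["temp_c"]))

if __name__ == "__main__":
    print(list(page_views(RAW_EVENTS)))
    print(list(sensor_readings(RAW_EVENTS)))
```

The same raw records serve two unrelated analyses, each imposing its own structure only when the data is read. By contrast, a traditional EDW would require a single cleansed schema before loading.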
The reality: As a term, data lake invites sarcastic variations; data swamp, data marshland and data puddle are examples from the #datalake Twitter stream. More substantively, many organizations are just getting their feet wet with Hadoop and aren't ready to plunge in. Also, a reservoir of raw Hadoop data eventually needs to be refined to make it fit for consumption by business users. And Hadoop systems don't exist on an island: Traditional data warehouses will likely continue to play a big role alongside them, leaving IT teams with new development and integration challenges to navigate.