Hadoop data lake not a place for just lounging around

A new Gartner report says the storage repository isn’t the trouble-free panacea many observers hail it to be. New data governance practices -- and new skills -- are critical.

A Hadoop data lake might sound like the perfect getaway from rigid relational databases. But the dream of lower IT costs and greater data flexibility can meet a cold-water reality when it comes to delivering on the promise of deeper analytics that lead to new business and competitive advantages.

A recent Gartner report, The Data Lake Fallacy: All Water and Little Substance, highlights some inherent problems in this big data basin, including data governance challenges and the culture and personnel shifts required to make it work in many organizations. "The cost story gets Hadoop in the door, but the skill it takes to realize value from disparate data sources is rare," said Nick Heudecker, a Gartner analyst and co-author of the report.

Before you jump in, here are a few things to consider, gleaned from the Gartner report and various interviews:

  • Recognize that data lakes won't deliver increased business value without an appropriate investment in skills, tools and training.
  • Be aware of the risks of putting a wide variety of data types in one place. Make sure there is descriptive metadata and mechanisms to maintain it, or the data lake could become a swamp.
  • To make effective use of the data, build small teams of data scientists and embed them in business units.
  • Focus on ensuring semantic consistency in upstream applications and data stores.
  • Don't open the floodgates and try to fill a data lake all at once. Start small and then expand the deployment once you get your feet wet.

This was last published in February 2015