Business Information

Hadoop data lake not a place for just lounging around
A new Gartner report says the storage repository isn't the trouble-free panacea many observers make it out to be. New data governance practices -- and new skills -- are critical.
- Stephanie Neil
A Hadoop data lake might sound like the perfect getaway from rigid relational databases. But the dream of lower IT costs and greater data flexibility can meet a dose of cold-water reality when it comes to delivering on the promise of deeper analytics that lead to more business and a competitive advantage.
A recent Gartner report, The Data Lake Fallacy: All Water and Little Substance, highlights some inherent problems in this big data basin, including data governance challenges and the culture and personnel shifts required to make it work in many organizations. "The cost story gets Hadoop in the door, but the skill it takes to realize value from disparate data sources is rare," said Nick Heudecker, a Gartner analyst and co-author of the report.
Before you jump in, here are a few things to consider, gleaned from the Gartner report and various interviews:
- Recognize that data lakes won't deliver increased business value without an appropriate investment in skills, tools and training.
- Be aware of the risks of putting a wide variety of data types in one place. Make sure there is descriptive metadata and mechanisms to maintain it, or the data lake could become a swamp.
- To make effective use of the data, build small teams of data scientists and embed them in business units.
- Focus on ensuring semantic consistency in upstream applications and data stores.
- Don't open the floodgates and try to fill a data lake all at once. Start small and then expand the deployment once you get your feet wet.
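To make the metadata point above concrete, here is a minimal Python sketch of a catalog that refuses to register a data set unless it arrives with descriptive metadata, and that flags entries whose metadata has not been refreshed recently. It is an illustration only, not something taken from the Gartner report: the `Catalog` class, the required field names and the 90-day staleness threshold are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical minimum set of descriptive fields (an assumption for this sketch).
REQUIRED_FIELDS = ("owner", "source_system", "description", "last_updated")


@dataclass
class Catalog:
    """Toy in-memory metadata catalog for data sets stored in a data lake."""

    entries: dict = field(default_factory=dict)

    def register(self, path: str, metadata: dict) -> None:
        # Reject any data set that is missing a required descriptive field.
        missing = [name for name in REQUIRED_FIELDS if not metadata.get(name)]
        if missing:
            raise ValueError(f"{path}: missing metadata fields {missing}")
        self.entries[path] = dict(metadata)

    def stale(self, today: date, max_age_days: int = 90) -> list:
        # Return paths whose metadata is older than the allowed age.
        return sorted(
            path
            for path, meta in self.entries.items()
            if (today - meta["last_updated"]).days > max_age_days
        )
```

A load job could call `register` before writing anything to the lake, and a scheduled job could review the output of `stale`. In practice an organization would more likely rely on an existing metadata or governance tool than on hand-rolled code like this.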