A Hadoop data lake might sound like the perfect getaway from rigid relational databases. But the dream of lower IT costs and greater data flexibility can meet a cold-water reality when it comes to delivering on the promise of deeper analytics and, with it, business and competitive advantage.
A recent Gartner report, The Data Lake Fallacy: All Water and Little Substance, highlights some inherent problems in this big data basin, including data governance challenges and the culture and personnel shifts required to make it work in many organizations. "The cost story gets Hadoop in the door, but the skill it takes to realize value from disparate data sources is rare," said Nick Heudecker, a Gartner analyst and co-author of the report.
Before you jump in, here are a few things to consider, gleaned from the Gartner report and various interviews:
- Recognize that data lakes won't deliver increased business value without an appropriate investment in skills, tools and training.
- Be aware of the risks of putting a wide variety of data types in one place. Make sure there is descriptive metadata and mechanisms to maintain it, or the data lake could become a swamp.
- To make effective use of the data, build small teams of data scientists and embed them in business units.
- Focus on ensuring semantic consistency in upstream applications and data stores.
- Don't open the floodgates and try to fill a data lake all at once. Start small and then expand the deployment once you get your feet wet.