Data lake concept needs more big data use cases to flourish

Hadoop data lakes offer an enticing location for large data sets. But consultant Andy Hayler says more examples of successful big data projects are needed to help boost their adoption.

The high level of interest in Hadoop data lakes and other big data platforms is understandable. Everyone working in data management faces the challenge of how to tackle spiraling data volumes that increasingly stretch current database technologies to, and in some cases beyond, their limits.

Much of that growth comes from machine-generated data, such as the streams of information spawned by sensors in devices or logs of Internet traffic. Just storing such data is challenging enough, but then can you make sense of it in a useful way? Or is this explosion in data volumes more about generating revenue for data processing and storage vendors than it is about supporting big data use cases with tangible business benefits?

Analyzing large volumes of data can create genuine opportunities for insight. In 2009, a paper published in the scientific journal Nature showed that it was possible to track influenza outbreaks by analyzing Google search data. This seemingly bizarre idea worked by taking the previous five years of search data and correlating common search terms with data that the Centers for Disease Control and Prevention had gathered manually about influenza-related patient visits to clinics and hospitals.

It turned out that certain combinations of search terms like headache and sneezing were often typed in by people with flu symptoms. The correlation was enough to produce a geographic model that could predict, in close to real time, where flu outbreaks were occurring. This knowledge is useful because doctors can better plan for outbreaks and more quickly target preventive measures.
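The correlation step described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the method used in the Nature paper: the term names, the weekly figures and the helper functions pearson and rank_terms are all made up for the example, and a real study would work with far more terms and years of data.

```python
from math import sqrt


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


def rank_terms(term_series, visit_rates):
    """Rank search terms by how closely they track the visit rates."""
    scores = {
        term: pearson(series, visit_rates)
        for term, series in term_series.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic, illustrative numbers only (not real CDC or search data).
# Percentage of patient visits that were influenza-related, per week.
weekly_visit_rates = [1.0, 1.5, 2.5, 4.0, 3.0, 1.8]

# Share of all searches containing each (hypothetical) term, per week.
weekly_term_share = {
    "flu symptoms": [0.10, 0.14, 0.26, 0.41, 0.29, 0.17],
    "pizza delivery": [0.50, 0.48, 0.51, 0.49, 0.50, 0.52],
}

if __name__ == "__main__":
    for term, score in rank_terms(weekly_term_share, weekly_visit_rates):
        print(f"{term}: {score:+.2f}")
```

Terms that score close to +1 rise and fall with the clinic data and are candidates for a predictive model; terms near zero are noise. A high correlation alone does not prove that a term is a reliable signal, which is why the results need checking against fresh data.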

More to come on big data analytics

Of course, that is only one example of big data analytics in action -- and just because it had an application in this case doesn't mean it does in every nook and cranny of the business world. However, as technologies like Hadoop and Apache Spark make it economically practical to process, store and analyze very large volumes of raw data, it's likely that more and more big data analytics applications will crop up.

In fact, plenty of leading-edge organizations are already taking advantage of big data. In the U.K., supermarket giant Tesco analyzes data coming from its refrigeration units to try to predict which ones are likely to fail, allowing preventive maintenance and reducing operational downtime. In Texas, electric utility TXU uses data from smart meters to predict power consumption highs and lows. Its system "reads" the meters every 15 minutes rather than every few months, allowing electricity pricing to be adjusted to encourage businesses to reduce their usage in peak periods.

Consumer analytics offers further big data use cases. It's now common for companies to do sentiment analysis on social media posts to see how consumers perceive their brands. If you can link social media to internal customer and billing systems effectively (no trivial task), a number of new marketing and customer service possibilities open up. Telecom company T-Mobile claimed to have halved customer churn in this way.

In another case, a large hotel chain linked its customer loyalty program to a file of Twitter data and used a proprietary algorithm to try to identify customers who had tweeted about its hotels. The company then targeted location-based offers at those customers. Another hotel chain, Marriott, uses big data analytics to help personalize offers made via its website to customers who are members of its loyalty program.

Big data applications call for context

But to make sense of large amounts of data, you need to be able to put it into context. In many cases, though, big data file structures lack metadata that describes their content. In addition, many companies make no attempt to address data quality, and they struggle to tie together their various data sets, such as lists of customers and products, via a master data management program to boost accuracy and consistency. One organization that does so sensibly is the travel-booking company Amadeus. It maintains a vast Hadoop file of all its airline bookings for analysis but pre-processes entries by checking them against its master data system to ensure that the correct airline codes are being used.
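To make the pre-processing idea concrete, here is a minimal sketch of checking records against a master list before they are loaded for analysis. It is not a description of how Amadeus actually does it: the record layout, the airline_code field name, the small MASTER_AIRLINE_CODES table and the split_by_master_data function are all assumptions made for the example.

```python
# Hypothetical master data: the approved airline codes and their names.
MASTER_AIRLINE_CODES = {
    "BA": "British Airways",
    "LH": "Lufthansa",
    "AF": "Air France",
}


def split_by_master_data(bookings, master_codes):
    """Separate bookings whose airline code is in the master list from the rest.

    Codes are trimmed and upper-cased before the lookup, so "ba " matches "BA".
    Returns a (valid, rejected) pair of lists; the input is not modified.
    """
    valid, rejected = [], []
    for booking in bookings:
        code = str(booking.get("airline_code") or "").strip().upper()
        if code in master_codes:
            # Store the normalized code so downstream analysis sees one spelling.
            valid.append({**booking, "airline_code": code})
        else:
            # Keep the original record untouched for later review.
            rejected.append(booking)
    return valid, rejected


if __name__ == "__main__":
    # Illustrative records only.
    sample_bookings = [
        {"booking_id": 1, "airline_code": "ba "},
        {"booking_id": 2, "airline_code": "XX"},
        {"booking_id": 3},
    ]
    valid, rejected = split_by_master_data(sample_bookings, MASTER_AIRLINE_CODES)
    print("loaded:", [b["booking_id"] for b in valid])
    print("held back:", [b["booking_id"] for b in rejected])
```

Holding back the rejected records rather than discarding them is a deliberate choice in this sketch: bad codes often point to a problem in an upstream system that is worth fixing at the source.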

In practice, most corporations today still focus their analytics efforts on structured transaction data held in traditional data warehouses. From my discussions with big data software vendors, I've found that many big data projects remain at the experimental stage. Many key trends in the IT industry start with a few pioneers and a lot of over-inflated expectations generated by the media and excited vendor marketing departments, so it should be no surprise that big data management and analytics would follow the same pattern.

Survey data supports the notion that big data is following that pattern. For example, a Capgemini study estimated that $31 billion was spent in 2013 on big data projects, yet only 35% of the projects were described as "successful" or better by the people spending the money. A December 2013 survey by my company, The Information Difference, found that just 13% of respondents had identified new business opportunities as part of their big data initiatives.

Yet hope springs eternal, and with all the money currently being poured into big data systems and applications, at least some of them are likely to yield useful results and further examples of viable big data use cases. If not, the data lakes being created in many enterprises will start to look more like data swamps.

About the author:
Andy Hayler is co-founder and CEO of London-based consulting company The Information Difference Ltd. and a frequent keynote speaker at conferences on master data management, data governance and data quality. He also reviews restaurants and blogs about food on the website Andy Hayler's Restaurant Guide.

