It’s an old human problem: too much information and too little capacity to process and understand it. Computer technology has helped, of course, by providing automated systems and tools for organizing and analyzing ever-larger volumes of information. But now transaction systems and other data sources, such as websites, sensors and mobile devices, are producing a veritable flood of data that is overwhelming – or bypassing – the data warehouse strategies and frameworks built up in the past.
The new challenge for IT and data warehousing teams is how to leverage existing technology investments along with emerging tools and techniques to manage this tsunami of information – or “big data,” as it’s being referred to in data management circles.
At a minimum, said Forrester Research Inc. analyst Brian Hopkins, big data will require some rethinking by organizations with traditional data warehouses in place. For instance, Hopkins noted that integrating some forms of big data into a centralized data warehouse with hub-and-spoke connections to separate data marts will be a challenge. Data warehouses are primarily focused on structured data, he said. But much of what is categorized as big data is unstructured or semi-structured information.
“In a sense, big data turns data warehousing assumptions upside down,” Hopkins said. In traditional data warehousing and business intelligence environments focused on answering specific questions from business users, data is cleansed, put through an extract, transform and load process and ultimately turned into reports or made available for analysis. Typically, less than 5% of an organization’s available data is used, “sometimes significantly less,” according to Hopkins.
A new spin on data warehouse strategies
By contrast, big-data strategies often focus on a broader swath of information. In addition, “the notion of a big database in the sky with a completely consistent, 360-degree version of the truth wholly evaporates,” Hopkins said. Targeted data stores and so-called analytic sandboxes are common in big-data environments, which can add management complexity for IT and data warehousing teams and require heavy-duty processing power.
“With big data, there are bound to be more patterns and anomalies that are interesting,” said Wayne Eckerson, research director for TechTarget Inc.’s business applications and architecture media group. “But computationally, it’s a lot harder to analyze big data because there’s so much of it. It gets expensive.” That’s why many organizations are looking beyond traditional data warehousing systems and considering emerging big-data technologies, such as open source Hadoop and MapReduce, Eckerson added.
Richard Winter, president of Winter Corp., a Cambridge, Mass.-based consulting firm that focuses on data warehousing, said big data presents a variety of opportunities for using analytics to gain business insights that previously would have been difficult to uncover. He cited the example of a new “smart” inhaler developed for the treatment of asthma; when the inhaler is used, built-in wireless technology sends data such as the identity of the patient and the time and location to a database.
“It may be possible to combine that information with information on, say, when a load of potentially allergenic soy products was being unloaded nearby, putting soy dust in the air,” Winter said. Such correlations could help medical researchers to better understand the characteristics of asthma, he added.
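To make the kind of correlation Winter describes more concrete, here is a minimal sketch of how inhaler readings might be matched against nearby unloading events. It is an illustration only, not how any actual inhaler system or research project works: the record layouts, the names (`InhalerUse`, `UnloadingEvent`, `uses_near_unloading`), the 5 km radius, the six-hour lag and all of the sample data are assumptions invented for this example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt


@dataclass
class InhalerUse:  # hypothetical record sent by the inhaler
    patient_id: str
    when: datetime
    lat: float
    lon: float


@dataclass
class UnloadingEvent:  # hypothetical record from a port or shipping log
    site: str
    start: datetime
    end: datetime
    lat: float
    lon: float


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(a))


def uses_near_unloading(uses, events, max_km=5.0, lag=timedelta(hours=6)):
    """Pair each inhaler use with unloading events close in space and time.

    max_km and lag are assumed thresholds chosen for illustration.
    """
    matches = []
    for use in uses:
        for event in events:
            in_window = event.start <= use.when <= event.end + lag
            nearby = distance_km(use.lat, use.lon, event.lat, event.lon) <= max_km
            if in_window and nearby:
                matches.append((use.patient_id, event.site))
    return matches


# Made-up sample data: one unloading event and three inhaler uses.
events = [
    UnloadingEvent("dock-1", datetime(2011, 6, 1, 8), datetime(2011, 6, 1, 12), 10.0, 20.0),
]
uses = [
    InhalerUse("patient-a", datetime(2011, 6, 1, 13), 10.01, 20.01),  # close in time and place
    InhalerUse("patient-b", datetime(2011, 6, 1, 13), 10.50, 20.00),  # same time, too far away
    InhalerUse("patient-c", datetime(2011, 6, 3, 13), 10.01, 20.01),  # nearby, two days later
]

matches = uses_near_unloading(uses, events)
print(matches)
```

A nested loop like this would not scale to the tens of millions of devices discussed later in the article, and a match shows only that two events coincided, not that one caused the other.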
Big-data challenges: Volume and more
But there are potential data management challenges: If the devices become widespread, Winter noted, tens of millions of people worldwide could begin using them and generating data that would need to be “stored, curated to some extent and made available for analysis over an extended period of time.”
IT and data warehousing professionals also need to understand that big data isn’t only about volume, said Philip Russom, research director for data management at The Data Warehousing Institute (TDWI). “The other attributes are just as important,” he said – for example, the fact that big data can be highly diverse, including Web clickstream data, call detail records, point-of-sale data, text from social network posts and various other types of information.
When confronted with the challenges of managing big-data installations, Russom said, many organizations are like the proverbial deer in the headlights: They recognize the potential value of the information but are daunted by the difficulties of getting a handle on it, especially when much of the data doesn’t lend itself to a traditional data warehousing process.
To avoid that kind of paralysis, Russom’s advice is to narrow the focus of a big-data management initiative to a high-payback area – customer behavior, for example – and then look at ways to apply both traditional and emerging tools to help gain control over the information. “Pick a topic that will lead you to something useful,” he said. “Don’t assume that you need to pursue every topic.”
ABOUT THE AUTHOR
Alan R. Earls is a Boston-area freelance writer focused on business and technology.