
Big data integration requires firm handle on info at hand

Efforts to integrate big data into BI and analytics systems should start with a solid understanding of the available information and how clean it is.

In 1965, Bob Dylan famously sang, "You don't need a weatherman to know which way the wind blows." These days that line can be applied to the data integration process: You need only a passing acquaintance with IT trends to recognize that growing data volumes -- driven in part by the rise of big data -- are a force that threatens to topple efforts to pull together information from internal systems or external sources.

Beyond spiraling amounts of transaction data, Merv Adrian, an analyst at Gartner Inc. in Stamford, Conn., sees two primary contributors to the rapid data growth that is heightening integration challenges for IT teams. One is the flood of data being generated on social networks and other websites or collected by Web server logs, sensors and measuring instruments. Then there are corporate documents and other records that companies in the past often stuck in file systems and left largely untapped; now many organizations are looking to incorporate them into business intelligence (BI) and analytics applications.


In addition to being voluminous and varied in nature, those information sources typically consist of either unstructured text or semi-structured machine data that can complicate big data integration efforts, Adrian said. For example, instrument readings often have a wide range of values and sequences that can appear in unpredictable ways. "You don't know what combination will occur," he said. "You can describe [the data] to a point, but you must scan and analyze it to really see its structure."
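Adrian's advice to "scan and analyze" such data to see its structure can be illustrated with a short sketch. The log format and field names below are invented for illustration; the point is to count which combinations of fields actually occur in a semi-structured feed rather than assuming a fixed schema up front.

```python
import json
from collections import Counter

def profile_field_combinations(lines):
    """Count which combinations of fields actually appear in
    semi-structured records -- no fixed schema is assumed."""
    combos = Counter()
    for line in lines:
        record = json.loads(line)
        combos[frozenset(record)] += 1  # frozenset of field names
    return combos

# Hypothetical sensor log: readings don't share one schema.
log = [
    '{"sensor": "t1", "temp": 21.5}',
    '{"sensor": "t1", "temp": 22.0, "humidity": 40}',
    '{"sensor": "p7", "pressure": 101.3}',
]
for fields, count in profile_field_combinations(log).items():
    print(sorted(fields), count)
```

A profile like this tells you up front how many record shapes an integration job must handle, before any mapping logic is written.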

Putting an organization in position to use such data for BI purposes requires not just integrating the information but also making sure it is accurate, secure and auditable, said Rob Karel, a former Forrester Research Inc. analyst who now is vice president of product strategy at data management software vendor Informatica Corp. in Redwood City, Calif.

But the incentives for doing the required integration work are growing, too, according to Karel. "Organizations no longer compete just on products or services but on how they can change directions and on the insights they gain from data," he said, adding that the ability to gain competitive advantages from the effective use of different strands of information "has increased the need for data integration in general."

Large data volumes at work

Daniel Landsman, CEO and founder of mobile advertising startup Hypercon Global Inc. in San Diego, said businesses like his are looking to capture large quantities of data through the use of persistent tracking and sensor technologies and then analyze the information in order to better target ads to prospective customers.

That only makes sense if you can successfully process the captured data and integrate it with other information, though. "Companies aren't investing hundreds of millions of dollars into big data architectures because they like the idea," Landsman said. "The whole reason is to make more money."

Echoing Karel's comments, Landsman said the first major hurdle in big data integration projects is ensuring that the data being integrated is clean and consistent. Next, he added, is devising the right data architecture to handle both structured and unstructured data -- or modifying an existing enterprise architecture so it can accommodate unstructured types of information.
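Landsman's first hurdle -- clean, consistent data -- might look like this in miniature. The required fields and sample records here are assumptions for illustration: reject incomplete records, normalize inconsistent values and drop duplicates before integration.

```python
def clean_records(records):
    """Reject incomplete records, normalize values and drop
    duplicates so downstream integration sees consistent data."""
    required = {"customer_id", "event"}   # hypothetical required fields
    cleaned, seen = [], set()
    for rec in records:
        if not required <= rec.keys():
            continue                      # incomplete record: reject
        rec = dict(rec, event=rec["event"].strip().lower())
        key = (rec["customer_id"], rec["event"])
        if key in seen:
            continue                      # duplicate once normalized
        seen.add(key)
        cleaned.append(rec)
    return cleaned

# Invented example records:
raw = [
    {"customer_id": 1, "event": " Click "},
    {"customer_id": 1, "event": "click"},   # duplicate after normalizing
    {"event": "view"},                      # missing customer_id
]
print(clean_records(raw))  # -> [{'customer_id': 1, 'event': 'click'}]
```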


For Adrian, step No. 1 in laying the groundwork for a successful integration program is to do a detailed audit of the available information. "Most of the time, the value that an organization will realize early in an integration effort is from data that they already have in their [systems] but aren't using," he said. "So the first job is to find out what it is, where it is and who owns it and what you know about its quality. Then you can decide how to do the integration."
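A minimal sketch of the audit Adrian describes, with hypothetical dataset names, owners and locations: record what a dataset is, where it lives, who owns it and a rough completeness score for its quality.

```python
def audit_dataset(name, rows, owner, location):
    """Summarize what a dataset is, where it lives, who owns it
    and a rough completeness score (share of non-null values)."""
    fields = set().union(*(r.keys() for r in rows)) if rows else set()
    filled = sum(1 for r in rows for f in fields if r.get(f) is not None)
    total = len(rows) * len(fields)
    return {
        "dataset": name,
        "owner": owner,
        "location": location,
        "rows": len(rows),
        "fields": sorted(fields),
        "completeness": round(filled / total, 2) if total else 0.0,
    }

# Hypothetical dataset pulled from an internal system:
rows = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": None},
]
summary = audit_dataset("orders", rows, "sales-ops", "warehouse.orders")
print(summary)
```

An inventory of summaries like this, one per source, is the raw material for deciding how -- and whether -- to integrate each dataset.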

Uncovering hidden big data skills

Another step that Adrian recommends is identifying whether there are people with useful skills outside of the IT department. In many organizations, power users or programmers in business units have set up big data systems below IT's radar. "Bring them into the fold and train them [on integration]," he said. "These are the people who are already building MapReduce jobs to analyze data with Hadoop and who are implementing systems with NoSQL databases."
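For readers unfamiliar with the MapReduce jobs Adrian mentions, here is a toy, in-process illustration of the pattern -- not Hadoop itself, just the idea: map each record to key-value pairs, group by key, then reduce each group. The log data is invented.

```python
from collections import defaultdict

def map_reduce(records, mapper, reducer):
    """Tiny in-process illustration of the MapReduce pattern:
    map each record to (key, value) pairs, group by key, reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# Hypothetical example: count page hits per URL from Web server logs.
logs = ["/home", "/cart", "/home", "/home"]
hits = map_reduce(logs,
                  mapper=lambda url: [(url, 1)],
                  reducer=lambda url, ones: sum(ones))
print(hits)  # -> {'/home': 3, '/cart': 1}
```

Real Hadoop jobs distribute the map and reduce phases across a cluster, but the mapper/reducer contract is the same.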

Once the prep work has been completed, Adrian added, "find a business problem that can be solved by exploiting big data, with someone who cares -- and focus on outcomes, not technology." Done right, that will provide both a trial run for the data integration team and credibility for the integration project, he said.

But Barry Murphy, an analyst at eDJ Group Inc. in Boston, adds a complication to that happy picture: the cost of storing and maintaining data. Marketing managers might want to churn through mountains of social media data in search of insights into customer sentiment about companies and products -- but will the results pay dividends that outweigh the data storage tab? Before blithely moving ahead on a big data integration initiative, Murphy said, organizations should consider how much data they really need and how much of what they have can be used to produce real business value.

What's needed along with a well-planned data integration strategy, he added, are policies and procedures for deleting data that isn't relevant or useful. "You can have the big data analysis capability while you also have in place processes to get rid of the data that is not valuable and not adding anything to the business," Murphy said.
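Murphy's deletion policies could be sketched as a simple retention rule. The age threshold and the test for "valuable" records below are assumptions for illustration: keep records that are recent or explicitly valuable, and flag the rest as candidates for disposal.

```python
from datetime import datetime, timedelta

def apply_retention(records, max_age_days, is_valuable):
    """Keep records that are recent or flagged as valuable;
    everything else becomes a candidate for deletion."""
    cutoff = datetime.now() - timedelta(days=max_age_days)
    keep, drop = [], []
    for rec in records:
        if rec["timestamp"] >= cutoff or is_valuable(rec):
            keep.append(rec)
        else:
            drop.append(rec)
    return keep, drop

# Invented records: one recent, one old-but-valuable, one old.
now = datetime.now()
records = [
    {"id": 1, "timestamp": now, "tag": "routine"},
    {"id": 2, "timestamp": now - timedelta(days=400), "tag": "audit"},
    {"id": 3, "timestamp": now - timedelta(days=400), "tag": "routine"},
]
keep, drop = apply_retention(records, 365, lambda r: r["tag"] == "audit")
```

In practice the "valuable" test would encode business and regulatory retention rules, and deletion would go through a governed process rather than a list.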

About the author
Alan R. Earls is a Boston-based freelance writer focused on business and technology.
