Guide to big data analytics tools, trends and best practices
Phenix Energy Group is involved in a big undertaking: The Palm Harbor, Fla.-based company is building a network of pipelines, tank farms and deepwater ports to transport crude oil products across Central America, from the Atlantic coast to the Pacific side. To support the initiative, Phenix is also embarking on a big data project that entails collecting operational data from 30,000 sensors along the pipelines. The big data software installation likely will include Hadoop, said Bruce Perrin, Phenix's chief operating officer and acting chief information officer.
But before making any technology decisions, the organization took a deep look at what it wanted to get from the big data program. The up-front assessment "helped clarify, at least to a certain degree, what tools are necessary to get it accomplished," Perrin said. "You can't know what tool to use until you know what you're going to use it on."
That's the kind of clear-eyed thinking companies need to employ when they're evaluating and selecting big data technologies, according to experienced users and IT analysts. It's easy to get caught up in all the hype about Hadoop -- but it isn't the right tool for all big data applications. Nor, for that matter, are NoSQL databases. In fact, recent surveys show that more organizations are using technologies other than Hadoop and NoSQL to help power their big data environments.
For example, a survey conducted in the summer of 2013 by consultancies Enterprise Management Associates Inc. (EMA) and 9sight Consulting found that Hadoop was at the bottom of a list of eight technology platforms being used by the 259 respondents to support big data initiatives. Only 16% of the respondents said they were using Hadoop and associated technologies. NoSQL databases weren't much higher on the list; they ranked sixth, at 22%. Analytical databases -- columnar software, for example -- and appliances based on them were first, cited by 42% of the respondents. They were followed by operational data stores, cloud-based data services, traditional data warehouses and data marts.
A 2013 survey of TechTarget readers yielded similar results. In that case, mainstream relational databases and data warehouses were the most common technologies being used or eyed to support big data environments. They were chosen by 55% of the 222 respondents with active or planned big data projects, followed by analytical databases at 52% and data warehouse appliances at 46%. Hadoop systems and NoSQL software came in at 41% and 21%, respectively.
More to the big data story than Hadoop
In planning big data deployments and evaluating technology options, it typically isn't "a simple matter of figuring out which Hadoop distribution you need," said EMA analyst Shawn Rogers. "The assumption that Hadoop equals big data and is always the solution to the problem is a mistake." Instead, organizations need to match the right tools to the data they're looking to store and analyze and to their processing requirements, he said.
There also needs to be an underlying business reason for deploying big data software and systems in the first place.
"The right question to ask is from the business side: 'What kind of business problem are we having?' To look for technology in search of a business problem is absolutely the wrong approach," said Forrester Research Inc. analyst Boris Evelson. Likewise, David Loshin, president of consultancy Knowledge Integrity Inc., warned against getting "Hadoop-crazy" and deciding to adopt that or other big data tools just "because it sounds appealing and everybody else is doing it."
And if the signs point toward Hadoop or NoSQL software as the right technology for the job at hand, the final decision can be much more complicated than choosing conventional business intelligence (BI) and data warehousing software.
Different strokes for different big data vendors
Evelson tells his BI clients not to agonize too much over product selection. "If you know what you're doing, you'll make any of those technologies work," he said -- but with Hadoop and NoSQL, "it's a completely different story." That's partly because the technologies themselves aren't fully mature and many of the vendors offering them are startups; in addition, there's a plethora of options to choose from.
More than a half-dozen vendors offer commercial distributions of Hadoop. While each distribution is based on Apache Hadoop, "they're not the same at all, and you have to look at the different underlying philosophies of the vendors," said Brian Hopkins, another Forrester analyst. The choices can be even harder to parse on the NoSQL side, where there are dozens of databases, many created with specific uses in mind, across several distinct product categories.
Cloud-based big data services also might merit consideration, especially if business executives don't want to wait for systems to be cobbled together with Hadoop and the technologies surrounding it -- what Hopkins described as "the Hadoop erector set."
Another possible avenue that's emerging is integrated environments combining Hadoop with analytical databases and data warehouse software. Major vendors, such as IBM, Microsoft, Oracle, SAP, Teradata and EMC spin-off Pivotal, are moving in that direction. In those offerings, Hadoop becomes part of a broader infrastructure that can provide business users with "transparent access to all of the data" they need for BI and analytics applications, said Wayne Eckerson, an industry analyst at TechTarget and president of consultancy Eckerson Group.
Making the big data buying team
It's also critical to make sure the right people are involved in big data software purchasing decisions. That typically will include IT managers and staffers -- but the process shouldn't be an IT-only affair, or even an IT-led one. IT was the most-cited source of big data project sponsorship and funding in the EMA/9sight survey, selected by 22% of the respondents. But a combined 42% said their initiatives were being championed and funded by the finance, marketing or sales department.
Compass Group Canada, a Mississauga, Ontario, company that owns and operates about 2,000 food service outlets, created a loss prevention team made up of people from various business departments to devise a strategy for stopping employees from helping themselves to money from the till. A big part of the work involved defining policies to prevent such "shrinkage." But the team was also charged with finding technology to pull together and analyze terabytes of data from point-of-sale systems, inventory databases, employee time and attendance records, and in-store video cameras in search of clues on which workers were stealing.
The team, which eventually chose a set of tools that included software from vendor Lavastorm Analytics, consisted of Chief Financial Officer Brett Mooney, the company's vice president of business transformation, and representatives from human resources and the legal department. But IT wasn't directly involved. "We looked at all of this as a business issue, not as an IT issue," Mooney said.
Whoever's involved, making sure your organization's eyes are open to the capabilities and shortcomings of the technologies you're considering is a must for getting a big data implementation off to a good start -- and for preventing it from bogging down. "You don't want to box yourself into a dead end," Eckerson said, adding that the wrong decision could leave a company "spending a lot of money and not getting a lot in return."
Ed Burns, site editor of SearchBusinessAnalytics, and Emma Preslar, who worked as a TechTarget editorial assistant through Northeastern University's co-op program, contributed to this story.