Managing Hadoop projects: What you need to know to succeed
While a lot of ground has to be covered to deploy the Hadoop Distributed File System and associated technologies to support enterprise Hadoop uses, a roadmap outlining the path to that destination is starting to emerge.
At the Hadoop Summit 2013 in San Jose, Calif., a panel of IT leaders from a variety of industries offered guidance for companies that want to move from experimenting with Hadoop to using it in actual applications. They said it's easy to get started with open source Hadoop clusters -- but taking the technology to the next level is more difficult.
Implementers should start small, be prepared to bring in outside training help and think up front about how Hadoop-processed data will become part of operational and analytical processes, according to participants in the panel discussion titled "Real-world Insight into Hadoop in the Enterprise."
The general rush to try out Hadoop brings its own issues, said Ratnakar Lavu, senior vice president of digital innovation at retailer Kohl's Corp. in Menomonee Falls, Wis. "It can be daunting," he said. "You hear about all the things that Hadoop can solve. You get all this data, then you go off and try to solve everything that you can think of."
But Lavu's team learned early on that small projects were good starting points with Hadoop. "It's a whole new way of doing things," he said. "Start with something small that you can actually manage. It's about learning."
Lavu also told would-be enterprise Hadoop users to be careful not to solve "problems that are already solved." For example, existing reports that are being produced and distributed effectively don't need to be redone in Hadoop just for the sake of changing platforms.
Hadoop first gained attention based on the efforts of systems programmers at Internet companies such as Yahoo, Google, Facebook and Twitter. But incorporating the technology into mainstream business and analytics applications takes different skills. Even Web stalwarts such as Salesforce.com Inc. have learned lessons while moving Hadoop into a support role for business decision makers.
"When Hadoop comes to mind, too often it's only the data -- how big it is. But as you add more and more users, you have to think in terms of the compute [requirements] also. It's not just the storage," said Ramesh Koteshwar, a business intelligence architect at Salesforce.
Looking ahead, Koteshwar anticipates that a sizable part of the company's workforce will want to query data collected in Hadoop for analytical uses. "We expect hundreds and thousands of users on the Hadoop cluster," he said.
Developing robust security capabilities is another part of the process of bringing Hadoop to wider use, according to Koteshwar. Hadoop use at Salesforce.com is still very much at an exploratory stage, and end-user access and authentication are hurdles that must be cleared on the way to broader deployment. "When you really want to bring it into the enterprise, you want to make sure there are security policies and processes in place in front of the Hadoop [cluster]," Koteshwar said.
Lavu concurred that the way you fit Hadoop systems into the overall organization is important. "It's about building the right processes and the right kind of systems and the data feeds as well as the user training and adoption," he said. "Those are the pieces that enable us to be successful."
While there has been a lot to learn in Hadoop's early days, at least some of the frontier work has been done, said Neeraj Kumar, vice president of information management and analytics at Cardinal Health in Dublin, Ohio. That suggests there is a benefit to adopting Hadoop now, since more pieces of the related data infrastructure have been put into place.
"The starters of today are going to have a leg up on us," Kumar said. "We had to build a lot of ad hoc processes and solutions just because the previous versions of Hadoop lacked those features."
Kumar agreed that Hadoop deployment teams should start small and should find an initial application that provides a "net-new capability" for their companies.
"You need to also understand the talent base of your own organization," he said, adding that in many cases Hadoop creates a need to bring in new skills. As a result, he advised IT managers to start thinking about Hadoop training issues early in the project planning process. Consultants can help, Kumar said, but they aren't the ultimate answer for enterprise Hadoop deployments: "You do need talent on-site, on the ground."