Containers and microservices find home in Hadoop ecosystem

Big data is moving from its bare-metal roots, and data streaming is a driver. Containers and microservices may have a role to play in what's next. An e-commerce application shows the way.

Much of the recent big data experience has been a bare-metal affair, meaning Hadoop has happened largely on nonvirtualized servers. That could change as containers and microservices gain traction in application development circles.

Both containers and microservices break up monolithic application code into more finely grained pieces. That streamlines development and makes for easier testing, which is one of the keys to more flexible application deployment and code reuse.

It is early for such techniques to be applied to big data, but for new jobs like data streaming, microservices show promise. For a technology manager at a leading European e-commerce company, the microservices approach simplifies development and enables code reuse.

With microservices, "you can very much economize on what you're doing," according to Rupert Steffner, chief platform architect for business intelligence systems at Otto GmbH, a multichannel retailer based in Hamburg, Germany. He goes further: For some types of applications, not using microservices "is stupid. You're building the same functionality over and over again."

The types of applications Steffner is talking about are multiple artificial intelligence (AI) bots that run various real-time analytics jobs on the company's online retail site. Otto uses a combination of microservices, Docker containers and stream processing technologies to power these AI bots.

Containers and microservices, oh my

Cloud computing has been one of the drivers edging Hadoop, Spark and other big data technologies toward virtualization, containers and microservices. There is still much infrastructure to build out, but companies are working on technologies to ease the evolution.

"Hadoop was largely run on bare metal, but it runs also on virtual machines; for example, on the Amazon cloud and Azure cloud and via OpenStack. Now it is moving to containers," said Tom Phelan, co-founder and chief architect at BlueData Software Inc., maker of a platform that automatically spawns Hadoop or Spark clusters.

"It used to be that performance of Hadoop clusters on bare metal was better, but that is changing," he said. Containers need to gain maturity, he acknowledged, adding that Hadoop, as it was originally designed, is not a microservices-style architecture. Santa Clara, Calif.-based BlueData recently updated its software to improve container support, rolling out automated Kerberos setups for Hadoop clusters and Linux privileged access management tools.

Agility and streaming are other drivers of microservices interest, according to a manager at Hadoop distribution vendor MapR Technologies Inc. Jack Norris, senior vice president of data and applications at MapR, said customers building bots and the like need to adapt quickly to data and machine learning models.

That is especially true in applications that include what he described as "event-driven" architectures. Such architectures increasingly include data streaming components.

Norris said that, as Hadoop and Spark application flows become more complex, they become harder to update. But, he continued, microservices narrowly focused on events in the data pipeline can bring more flexibility to such developments. This is a change from the original Hadoop development style.
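As a rough illustration of what "narrowly focused on events" can mean in practice, the sketch below routes each event type to small, independent handlers. It is a minimal in-process stand-in written for this article, not MapR's or anyone else's API: the `EventRouter` class, the event names and the handler functions are all hypothetical, and a real deployment would put a streaming platform between publisher and subscribers.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Event = Dict[str, object]
Handler = Callable[[Event], None]


class EventRouter:
    """Toy stand-in for a streaming topic with subscribers (hypothetical)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        # Each handler cares about exactly one kind of event.
        self._handlers[event_type].append(handler)

    def publish(self, event: Event) -> int:
        # Deliver the event to every handler registered for its type
        # and report how many handlers received it.
        handlers = self._handlers.get(str(event["type"]), [])
        for handler in handlers:
            handler(event)
        return len(handlers)


# Two narrowly scoped "services", each updatable without touching the other.
orders_seen: List[object] = []
clicks_seen: List[object] = []


def record_order(event: Event) -> None:
    orders_seen.append(event["order_id"])


def record_click(event: Event) -> None:
    clicks_seen.append(event["page"])


router = EventRouter()
router.subscribe("order_placed", record_order)
router.subscribe("page_view", record_click)

router.publish({"type": "order_placed", "order_id": 1001})
router.publish({"type": "page_view", "page": "/shoes"})
```

Because each handler subscribes to a single event type, replacing or adding one does not require changes to the rest of the flow, which is the flexibility argument made above.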

"We see a need to open up to a broader set of applications," Norris said. At the same time, he pledged that MapR will continue to support the existing style of monolithic applications as well.

Last month, MapR sought to further the microservices cause in big data with microservices-specific volumes for application versioning, and dedicated microservices for A/B testing of machine learning models. Also, a new reference architecture is available to guide developers through microservices for converged streaming data and real-time analytics applications, according to Norris.

AI bots watch lonely shopping carts

As big data processing jobs become complex combinations of components enabling precise data flows, the microservices approach is finding wider use. For Otto's Steffner, microservices provide a classic "divide-and-conquer" means to meet architectural needs.

Each of the AI bots in the Otto data architecture handles a particular task, said Steffner, who spoke at the Strata + Hadoop World 2016 conference in New York last month. For example, one AI bot looks for fraudulent transactions, another does analytical modeling to drive real-time ad placements and a third checks for empty online shopping carts to trigger last-gasp promotional offers before customers leave the site without buying anything.
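To make the single-task bot idea concrete, here is a minimal sketch of the shopping cart case. Otto has not published how its bot works, so everything here is an assumption made for illustration: the `CartBot` class, the event kinds (`page_view`, `add_to_cart`, `purchase`) and the idle threshold are hypothetical. The bot tracks each session and flags those whose cart is still empty after a period of inactivity, once per session.

```python
from dataclasses import dataclass
from typing import Dict, List

IDLE_SECONDS = 600.0  # hypothetical threshold, not a figure from Otto


@dataclass
class CartState:
    items: int = 0
    last_seen: float = 0.0
    offered: bool = False


class CartBot:
    """Single-purpose bot: find idle sessions whose cart is still empty."""

    def __init__(self, idle_seconds: float = IDLE_SECONDS) -> None:
        self.idle_seconds = idle_seconds
        self.carts: Dict[str, CartState] = {}

    def on_event(self, session_id: str, kind: str, timestamp: float) -> None:
        # Update per-session state from one incoming stream event.
        cart = self.carts.setdefault(session_id, CartState())
        cart.last_seen = timestamp
        if kind == "add_to_cart":
            cart.items += 1
        elif kind == "purchase":
            # The customer bought something; stop tracking the session.
            del self.carts[session_id]

    def sessions_to_nudge(self, now: float) -> List[str]:
        # Return sessions that qualify for a promotional offer,
        # marking each so it is offered at most once.
        due = []
        for session_id, cart in self.carts.items():
            idle = now - cart.last_seen
            if cart.items == 0 and not cart.offered and idle >= self.idle_seconds:
                cart.offered = True
                due.append(session_id)
        return due
```

A fraud bot or an ad placement bot would follow the same shape with different state and rules, which is where the code reuse Steffner describes would come from.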

The company accomplished this with a Docker-based microservices architecture, rolled out in October 2015 after a more conventional big data platform launched two years earlier didn't fully meet its needs, according to Steffner.

The Docker containers are also a good fit for the bot concept, Steffner said. At the back end, Otto has installed a mix of open source stream processing engines, including Storm, Spark Streaming, Flink and Ignite. But Steffner said Ignite, an in-memory data fabric technology originally developed by GridGain Systems Inc., is handling the bulk of the real-time processing work in the current environment.

Includes reporting by executive editor Craig Stedman.
