Persistence pays off for software containers in big data

Software containers encapsulate complexity and ease deployment, two traits that are helping to elicit growing interest in using them as part of big data systems.

Software containers and big data architectures grew up separately and are both fairly new to enterprise computing. But recent product initiatives indicate that the two are starting to come together.

Greater integration of microservices and big data architecture was a goal this week as startup Portworx Inc. unwrapped a new version of its PX-Enterprise software for persistent container-based storage. The software adds "bring your own key" encryption for container data at rest. It joins a small but growing chorus of container enhancements for dealing with data persistence.

The benefit of bringing your own key, as Portworx would have it, is to reduce possible lock-in for the growing amount of data residing in containers on public clouds. If the cloud vendor holds the key, that can limit your ability to migrate.

The PX-Enterprise software works with Apache Hadoop, Spark, Cassandra, Amazon EMR and other systems, according to Eric Han, vice president of product management at Portworx in Los Altos, Calif.

Containers are executable packages of code, runtimes and system libraries. Packaging software this way breaks down the traditional, monolithic server architecture into a more service-oriented approach.

But their ephemeral, or stateless, nature -- typical Docker containers spin up, and then disperse -- is a bit at odds with data storage, which is stateful, or persistent. Portworx's key goal has been to match databases and emerging data frameworks with stateful containers and, in turn, to ease the job of DevOps-style developers by limiting the dependencies they have to be mindful of as they program, according to Han.
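The stateless-versus-stateful distinction can be illustrated with a minimal Python sketch. It is only a simulation under stated assumptions: it does not use Docker or any Portworx API, and the names `run_container` and `demo` are hypothetical. A container's own file system is modeled as a scratch directory that is deleted when the run ends, while a persistent volume is modeled as a directory that outlives each run.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple


def run_container(volume: Optional[Path] = None) -> int:
    """Simulate one container run that increments a counter file.

    Hypothetical model: the container's own file system is a scratch
    directory removed when the run ends. If a volume is supplied, the
    counter is written there instead and survives the run.
    """
    scratch = Path(tempfile.mkdtemp(prefix="container-"))
    try:
        data_dir = volume if volume is not None else scratch
        counter_file = data_dir / "counter.txt"
        count = int(counter_file.read_text()) if counter_file.exists() else 0
        count += 1
        counter_file.write_text(str(count))
        return count
    finally:
        # The container "disperses": its scratch space is thrown away.
        shutil.rmtree(scratch)


def demo() -> Tuple[List[int], List[int]]:
    """Run the simulated container three times without and with a volume."""
    volume = Path(tempfile.mkdtemp(prefix="volume-"))
    try:
        stateless = [run_container() for _ in range(3)]
        stateful = [run_container(volume) for _ in range(3)]
    finally:
        shutil.rmtree(volume)
    return stateless, stateful


if __name__ == "__main__":
    print(demo())
```

In this sketch, every run without a volume starts from scratch and reports a count of 1, while runs that share the volume count upward. That is the gap persistent container storage products aim to close.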

Microservices on containers

Portworx's move is part of a wider industry effort that includes startups like Mesosphere, which last month released its DC/OS 1.9 microservices platform with new data services for Cassandra from DataStax, Couchbase Server, Redis and others. Software like this plays to big data's need to simplify and speed implementations, according to Ed Hsu, vice president of product marketing at San Francisco-based Mesosphere.

In big data today, Hsu said, "there are multiple technologies, and bringing them together has to become easier for folks." As a result, he said, people are turning to microservices on containers and data services that provide persistence when building modern applications.

Mesosphere's release was preceded by word in January that Hadoop distribution provider MapR Technologies had released the MapR Persistent Application Client Container (PACC) as part of its new platform for Docker containers.

According to Jack Norris, senior vice president of data and applications at MapR, these prebuilt containers let system architects pull together system elements using a single method for both file system and database system integration.

"Moving away from siloed architectures is what they have to do, and this gives them a way," Norris said.

Containerization for your application

Gerard Paulke, CTO and big data architect at Australian analytics firm Quantium, a MapR customer, said the company began working with containers at the same time it was building a big data platform to better understand how customers behave and to predict their future behavior.

Paulke said containers help to simplify application development by isolating resources. With 500 analysts to support, that is important. That container architecture supplants the need for separate development, test and production environments, he said.

MapR's Persistent Application Client Container, or PACC, comes both as a POSIX-file-format-compatible client and as a

"We started building our platform just as containers were coming along. So we containerized our entire ecosystem," Paulke said. "That allows us to spin applications up and down easily."

Speaking at the time of the MapR release, Paulke said he was looking forward to testing out the PACC, because "at this point, persisting data, in other words, storing it after processing it -- in effect, saving state -- is hard in the containerized world."

You said a 'stateful'

Software containers fit well with emerging programming styles, according to John Myers, an analyst at Enterprise Management Associates Inc. In some cases, they outshine widely employed virtual machine methods.

"Containers let you focus on application logic. That is a key versus a virtual machine. The container takes care of a lot of the complexity," he said. Reduced deployment complexity speeds development, and leaves more room for flexible updates, Myers added.

In mobile applications, he noted, there is a particular need to support more and more data and analytics. That, in turn, makes support for stateful containers, rather than stateless ones, more important.

Myers added that he also sees a role for containers in easing the job of the data scientist. This is beginning to play out as "analyst notebooks" for analysis and machine learning begin to employ containers. Eventually, big data containers may bring more self-service capabilities to those who do big data analytics.
