
Hadoop jumps in Spark stream -- goes beyond batch processing

Batch processing came, went and returned. Now it may be leaving again, MapR's Jack Norris tells the Talking Data podcasters.

If you blinked, you may have missed it. Batch processing, a staple of the mainframe era, returned to the forefront as Hadoop MapReduce gained attention in the early part of this decade.

But Apache Spark and other streaming architectures are changing that, according to a top executive at one of the small handful of independent Hadoop distribution vendors that have decisively reshaped data analytics in recent years.

According to Jack Norris, formerly chief marketing officer and now senior vice president of data and applications at MapR, we will see more convergence of real-time and batch architectures as Apache Spark joins Hadoop, and as event streaming is paired with big data storage. Norris spoke about this and other pressing data topics in the latest edition of the Talking Data podcast.

Along with Hortonworks and Cloudera, MapR has helped forge Hadoop as a business, and Norris has been at the center of that activity. Of late, the rise of Spark Streaming and related technologies has brought about a shift in big data applications, which seems to be spurring a new round of changes in the Hadoop ecosystem.

Norris said Spark has become a particularly useful complement to original Hadoop components. He estimated that about half the users of MapR's Hadoop distribution are working with Spark "at various stages of production."

Spark created a flurry of excitement, he said, in part because MapReduce programming, which was synonymous with Hadoop computing in the early going, was difficult.

Jack Norris, MapR

"Spark brings relative ease of development. It brought new APIs that allowed you to program in Scala and Python, and it made it much easier to develop applications," Norris said. "It provided the constructs for streaming analytics as well."

Spark, he said, has made it easier to look at events as they arrive and to perform automatic aggregation and filtering -- thus turning raw data into useful information. He said some applications have been forced to work in batch processing mode because of overall system limitations, but that is changing. Listen to the discussion of Spark and related topics on this edition of the Talking Data podcast.
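The pattern Norris describes -- filtering events as they arrive and aggregating them incrementally, so raw data becomes useful information -- can be sketched, in spirit, with a few lines of plain Python. This is a framework-free illustration of the concept, not Spark's actual API; the event shape (sensor ID plus reading) and the threshold are hypothetical.

```python
from collections import defaultdict

def process_stream(events, threshold=0.0):
    """Toy sketch of streaming filter-and-aggregate.

    Each event is a (sensor_id, reading) pair. Readings at or below the
    threshold are filtered out on arrival; running totals are kept per
    sensor, and per-sensor averages are returned at the end.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for sensor_id, reading in events:
        if reading <= threshold:      # filter raw events as they arrive
            continue
        totals[sensor_id] += reading  # aggregate incrementally
        counts[sensor_id] += 1
    # distill raw events into useful information: per-sensor averages
    return {s: totals[s] / counts[s] for s in totals}

events = [("a", 2.0), ("b", -1.0), ("a", 4.0), ("b", 3.0)]
print(process_stream(events))  # {'a': 3.0, 'b': 3.0}
```

In a real Spark Streaming job, the same filter-then-aggregate logic would run continuously over micro-batches or an unbounded stream rather than a finite list.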

Next Steps

Find out how NoSQL and JSON fit into the big data picture today.

Read about means and methods that mix SQL with Hadoop.

Check out this proposal to bring better data management to Hadoop applications.
