March 2016

Apache Spark architecture declares independence in some big data apps

Sponsored by SearchDataManagement

It comes as no great surprise that the Apache Spark architecture has been horning in on the batch processing domain once controlled by Hadoop's MapReduce. But that's only part of the story. With data processing, streaming and machine learning capabilities on its résumé, the open source engine is learning to get along entirely without Hadoop in certain applications. In fact, one industry analyst cautiously sees a day when Spark could declare total independence, potentially bust up Hadoop cluster dominance and link separately with other Apache technologies.

In this three-part handbook, senior news writer Jack Vaughan examines the distinct advantages the Apache Spark architecture has over MapReduce. He also highlights how Spark's ability to process and analyze streaming data is helping detect fraudulent activities at a major banking and credit-card company. Next comes a detailed look at Spark 2.0's upcoming upgrades to analytics speed, machine learning libraries, SQL support and stream processing. To close, Vaughan and senior news writer Ed Burns look at combining Spark and NoSQL databases in operational analytics applications, which could help broaden the use of both technologies.

Table of Contents

  • Spark engine speeds big data jobs, ousts MapReduce
  • Streaming update to address growing torrent of big data
  • Spark combines with NoSQL to rev up operational analytics