In this edition of the Talking Data podcast, big data streaming analytics comes into focus. The technology formed a strong undercurrent at Strata + Hadoop World 2016.
Big data streaming analytics continues to gain attention. Today, much of that attention centers on the Apache Kafka messaging system and Spark Streaming, the stream processing module of the Apache Spark data processing engine. The combination isn't real-time processing in the strictest sense, but it appears to meet the performance needs of a host of important mobile and Web application types that put data into motion.
The data streaming phenomenon was apparent at Strata + Hadoop World 2016 in San Jose, Calif., where Kafka and Spark Streaming often appeared in tandem in presentations that showed the latest activity on the Hadoop front lines.
Joining TechTarget editors for this edition of the Talking Data podcast are Doug Cutting and Jay Kreps, key originators of Hadoop and Kafka, respectively. In interviews recorded at the Strata conference, both discuss the ascent of streaming in big data applications.
Cutting, who is now chief architect at Hadoop distribution vendor Cloudera Inc., says that, of necessity, the original Hadoop implementation focused on data at rest. But with version 2.0 of Hadoop, which The Apache Software Foundation initially released in late 2013, big data streaming analytics became an important style of programming, he adds.
The new forms of streaming software don't work at the sub-millisecond latencies achieved by some earlier products, such as complex event processing tools. The upstart frameworks, however, can handle large data sets, and do so at price points far below those of predecessor systems.
For his part, Kreps sees users building streaming platforms that collect data from different parts of the organization for processing.
"Streaming is a whole center of data for companies that adopt it," says Kreps, who left a role as a principal staff engineer at LinkedIn in 2014 to co-found Confluent Inc., where he's CEO. The startup offers a Kafka-based data streaming platform.
Because both Kafka and Spark Streaming arise from open source projects, they're capable of attracting an ever-growing cadre of skilled developers. Other open source streaming frameworks, such as Apache Storm and, most recently, Apache Flink, are available as well, auguring broader adoption of open source data streaming technology.