The Confluent Platform continues to expand on capabilities useful for Kafka-based data streaming, with additions that are part of a 5.0 release now available from Confluent Inc.
Confluent Platform was created by former LinkedIn data engineers who helped build the Kafka messaging framework, and its goal is to make real-time big data analytics accessible to a wider community.
Part of that effort takes the form of KSQL, which is meant to bring easier SQL-style queries to analytics on Kafka data. KSQL is a Kafka-savvy SQL query engine and language Confluent created in 2017 to open Kafka streaming data to analytics.
Version 5.0 of the Confluent Platform, commercially released on July 31, seeks to improve disaster recovery with more adept handling of application client failover, to enhance IoT abilities with MQTT proxy support, and to reduce the need to use Java for programming streaming analytics with a new GUI for writing KSQL code.
Data dips into mainstream
Confluent Platform 5.0's support for disaster recovery and other improvements is useful, said Doug Henschen, a principal analyst at Constellation Research. But the bigger value in the release, he said, is in KSQL's potential for "the mainstreaming of streaming analytics."
Besides the new GUI, this Confluent release upgrades the KSQL engine with support for user-defined functions, which are essential parts of many existing SQL workloads. Also, the release supports handling nested data in popular Avro and JSON formats.
"With these moves Confluent is meeting developer expectations and delivering sought-after capabilities in the context of next-generation streaming applications," Henschen said.
That's important because web, cloud and IoT applications are creating data at a prodigious rate, and companies are looking to analyze that data as part of real-time operations. The programming skills required to do that level of development remain rare, but, as big data ecosystem software like Apache Spark and Kafka find wider use, simpler libraries and interfaces are appearing to link data streaming and analytics more easily.
Kafka, take a log
At its base, Kafka is a log-oriented publish-and-subscribe messaging system created to handle the data created by burgeoning web and cloud activity at social media giant LinkedIn.
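To make the "log-oriented publish-and-subscribe" idea concrete, here is a minimal Python sketch. It is an illustration only, not Kafka's implementation or client API: the `TopicLog` class, its methods and the subscriber names are all hypothetical, and it models a single in-memory log with no partitions, persistence or networking. The point it shows is that publishers append to a log, and each subscriber keeps its own read position in that log.

```python
from collections import defaultdict


class TopicLog:
    """Hypothetical in-memory stand-in for one log; not a real Kafka class."""

    def __init__(self):
        self._records = []                # append-only list of messages
        self._offsets = defaultdict(int)  # next offset to read, per subscriber

    def publish(self, message):
        """Append a message and return the offset it was stored at."""
        self._records.append(message)
        return len(self._records) - 1

    def poll(self, subscriber, max_records=10):
        """Return unread messages for one subscriber and advance its offset."""
        start = self._offsets[subscriber]
        batch = self._records[start:start + max_records]
        self._offsets[subscriber] = start + len(batch)
        return batch


log = TopicLog()
for event in ["page_view", "click", "purchase"]:
    log.publish(event)

# Two independent subscribers read the same log at their own pace.
dashboard_first = log.poll("dashboard", max_records=2)
audit_all = log.poll("audit")
dashboard_rest = log.poll("dashboard")
```

Because reading does not remove anything from the log, the hypothetical "audit" subscriber still sees all three events after "dashboard" has consumed some of them.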
The core software has been open sourced as Apache Kafka. Key Kafka messaging framework originators, including Jay Kreps, Neha Narkhede and others, left LinkedIn in 2014 to found Confluent, with the stated intent to build on core Kafka messaging for further enterprise purposes.
Joanna Schloss, Confluent's director of product marketing, said Confluent Platform's support for nested data in Avro and JSON will enable greater use of business intelligence (BI) tools in Kafka data streaming. In addition, KSQL now supports more complex joins, allowing KSQL applications to enhance data in more varied ways.
She said opening KSQL activity to view via a GUI makes KSQL a full citizen in modern development teams in which programmers, as well as DevOps and operations staff, all take part in data streaming efforts.
"Among developers, DevOps and operations personnel there are persons interested in seeing how Kafka clusters are performing," she said. Now, with the KSQL GUI, "when something arrives they can use SQL [skills] to watch what happened." They don't need to find a Java developer to interrogate the system, she noted.
Making Kafka more accessible for applications
KSQL is among the streaming analytics capabilities of interest to Stephane Maarek, CEO at DataCumulus, a Paris-based firm focused on Java, Scala and Kafka training and consulting.
Maarek said KSQL has potential to encapsulate a lot of programming complexity, and, in turn, to lower the barrier to writing streaming applications. In this, Maarek said, Confluent is helping make Kafka more accessible "to a variety of use cases and data sources."
Moreover, because the open source community that supports Kafka "is strong, the real-time applications are really easy to create and operate," Maarek added.
Advances in the replication capabilities in Confluent Platform are "a leap forward for disaster recovery, which has to date been something of a pain point," he said.
Maarek also said he welcomed recent updates to Confluent Control Center, because they give developers and administrators more insights into the activity of Kafka cluster components, particularly schema registry and application consumption lags -- the difference between messaging reads and messaging writes. The updates also reduce the need for administrators to write commands, according to Maarek.
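Consumption lag, as described above, comes down to simple arithmetic: for each partition, subtract the position the consuming application has read up to from the position the latest message was written at. The Python sketch below shows that calculation under stated assumptions. The `consumer_lag` function and the offset figures are invented for illustration, and nothing here reflects how Confluent Control Center computes or displays the metric.

```python
def consumer_lag(end_offsets, committed_offsets):
    """Per-partition lag: latest written offset minus the consumer's read position."""
    return {
        partition: end - committed_offsets.get(partition, 0)
        for partition, end in end_offsets.items()
    }


# Made-up figures for a three-partition topic.
end_offsets = {0: 1500, 1: 1420, 2: 1610}        # where writes have reached
committed_offsets = {0: 1500, 1: 1390, 2: 1200}  # where reads have reached

lag = consumer_lag(end_offsets, committed_offsets)
total_lag = sum(lag.values())
```

In this made-up case the consumer is caught up on partition 0 and furthest behind on partition 2, which is the kind of imbalance a lag view is meant to surface.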
Data streaming field
The data streaming field remains young, and Confluent faces competition from established data analytics players like IBM, Teradata and SAS Institute, Hadoop distribution vendors like Cloudera, Hortonworks and MapR, and a variety of specialists such as MemSQL, SQLstream and Striim.
"There's huge interest in streaming applications and near-real-time analytics, but it's a green space," Henschen said. "There are lots of ways to do it and lots of vendor camps -- database, messaging-streaming platforms, next-gen data platforms and so on -- all vying for a piece of the action."
However, Kafka often is a common ingredient, Henschen noted. Such ubiquity helps put Confluent in a position "to extend the open source core with broader capabilities in a commercial offering," he said.