
Confluent's Kafka data-streaming framework gets 'SQL-ized'

SQL on Hadoop arrived -- so did SQL on Spark. Now, SQL on Kafka is emerging to provide a different way to look at Kafka data as it streams through the enterprise.

The day when armies of business analysts can query incoming data in real time may be drawing closer. Supporting such continuous interactive queries is a goal of KSQL, software put forward this week by the Kafka data-streaming software originators at Confluent Inc.

KSQL is a SQL engine that directly handles Apache Kafka data streams. As such, it can skip the other big data components that bring broadly supported SQL capabilities to Hadoop and Spark -- components that may require intermediate data stores and batch-oriented processing.

The software is intended to bridge the gap between real-time monitoring and real-time analytics, according to Neha Narkhede, co-founder and CTO at Confluent, based in Palo Alto, Calif. At the Kafka Summit in San Francisco, she said KSQL can continuously join streaming data, such as web user clicks, with relevant table-based data, such as account information.

She also said KSQL is intended to broaden the use of Kafka beyond Java and Python, opening up Kafka programming to developers familiar with SQL. The form of SQL Confluent is using here is a dialect, however -- one the company has developed to deal with the unique architecture of Kafka streaming. The software is appearing first as a developer preview, and it will be available under an Apache 2.0 license, according to the company.
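As an illustrative sketch only -- the statement forms follow the developer preview, but the topic, stream and column names here are hypothetical -- a continuous stream-table join in KSQL's dialect might look like this:

```sql
-- Declare a stream over an existing Kafka topic of click events
CREATE STREAM clickstream (userid VARCHAR, page VARCHAR)
  WITH (kafka_topic = 'clicks', value_format = 'JSON');

-- Declare a table over a compacted Kafka topic of account records
CREATE TABLE accounts (userid VARCHAR, plan VARCHAR)
  WITH (kafka_topic = 'accounts', value_format = 'JSON', key = 'userid');

-- Continuously enrich each click with the matching account row
SELECT c.userid, c.page, a.plan
FROM clickstream c
LEFT JOIN accounts a ON c.userid = a.userid;
```

Unlike a conventional SQL query, the SELECT above does not terminate after scanning a table; it keeps running, emitting a result row for each new click event that arrives on the topic.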

Kafka data on the move

Created at LinkedIn, Kafka began life as a publish-and-subscribe messaging system that focused on handling log files as system events. It became an Apache Software Foundation project, and it was expanded to support a fuller data-streaming architecture.

The open source version of Kafka is commonly used in Hadoop and Spark data pipelines today. That puts it at the center of much of the industry activity aimed at putting big data into motion.

"Overall, we are seeing Kafka growing across large enterprises, in startups and in job posts. Companies are looking for people with Kafka skill sets," said Fintan Ryan, an analyst at Portland, Maine-based RedMonk. "Underlying that is a drive in use of streaming data in general."

Much of the current streaming data landscape is centered on Kafka. But a grab bag of alternatives exists.

Alternatives are found in long-standing software, such as RabbitMQ and Tibco StreamBase; in later entries, such as Amazon Kinesis and Apache Spark Streaming; and in newly emerging frameworks, such as Apache Flink, Microsoft Azure Event Grid and others. Just this month, startup Streamlio emerged from stealth mode, describing its efforts to promote enterprise streaming based on Heron -- a stream processor that grew out of distributed systems work at another social media mainstay, Twitter.

The goal of the newly released KSQL is to bring Kafka streaming programming directly to SQL-capable developers. For example, it's meant to join click streams via continuous queries with table data.

Waiting for the SQL

For now, Confluent's KSQL is programmed via a command-line interface, Ryan noted. That means opportunity, he said, for other software vendors to build drag-and-drop interfaces that tap into Kafka via KSQL. In fact, at the Kafka Summit, analytics software provider Arcadia Data said it was working with Confluent to support a visual interface for interactive queries on Kafka topics, or Kafka message containers, via KSQL.
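As a rough sketch of that command-line workflow under the developer preview (the stream name is hypothetical, and output is omitted):

```
$ ksql-cli local
ksql> SHOW TOPICS;
ksql> SELECT userid, page FROM clickstream;
```

Queries typed at the prompt run continuously against live topics, which is the gap the drag-and-drop tooling Ryan describes would fill for nonprogrammers.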

Confluent's KSQL scheme faces competition from a handful of players that have already been working to connect Kafka with SQL. Some of those players were on hand at the Kafka Summit with product updates.

At the conference, Striim Inc. rolled out its 3.7.4 platform release, adding more monitoring metrics and Kafka diagnostic utilities, as well as new connectors to Amazon Web Services Redshift and Simple Storage Service, Google Cloud, Azure SQL Server, Azure Storage and Azure HDInsight. Also at the summit, SQLstream launched Blaze 5.2, supporting Apache SystemML for declarative programming of machine learning applications.

Kafka SQL links and other streaming activity should not obscure the fact that big data streaming architecture is still a young discipline. That is emphasized by recent word that Apache Kafka would formally reach version 1.0.0 in early October.

Software veterans recall there was a time when development managers would wait until release 2.0 before touching any software, and release 1.0 was a nonstarter. But it seems the speed of data streaming today is such that that type of caution is out the window, at least where organizations sense the potential for significant business advantage.

Next Steps

Learn about Kafka's shift to include data streaming

Learn about 'exactly-once' processing from Confluent CTO

Be there as Kafka maven Kreps discusses the way of the log


Join the conversation


What roles do you see SQL playing in streaming analytics in your organization?
Thanks for mentioning Striim in the article, Jack. SQL-based streaming analytics is not just an aspiration; it is already a reality.

We have had customers in production for years now using in-memory, SQL-based continuous queries to process and analyze real-time data for many different use cases. With customers across all industries, the use cases include streaming data integration, hybrid-cloud migrations, security analytics, fraud prevention, location-based applications, telco network monitoring, predictive maintenance, and much more.

Most of these do not require Kafka, but all require enterprise-grade features like scalability, reliability, and security, as well as a complete platform, not just high-speed messaging and a few APIs.

Take a look at my LinkedIn article summarizing a blog series I recently put together, explaining all the ways we make the most of Apache Kafka.

Or, watch our latest YouTube video to find out in 2 1/2 minutes.
