Like Apache Hadoop and Apache Spark, Apache Kafka has become a linchpin in many complex distributed analytical systems. Like those systems, Kafka is beginning to find as much use on the cloud as on-premises.
That drive to the cloud lies behind a Kafka cloud service launched this week by Confluent, a company started by some of Kafka's originators.
Known as Confluent Cloud, it is described as a managed streaming data service that reduces the need for users to closely oversee Kafka operations.
Kafka began life as a publish-and-subscribe messaging system for handling the torrential firehose of modern big data, but under the tutelage of Confluent and others, it has been enhanced with monitoring, retention and failover capabilities. Used along with Confluent's Kafka Streams, it is also now intended as a streaming data alternative to large cloud providers' proprietary data pipes.
It seems that more than a few systems laden with open source technologies still rely on cloud providers' proprietary data pipes once they run on the cloud. Among those pipes are Amazon Kinesis, Google Dataflow and Microsoft Azure Event Hubs.
Transferable, open source alternatives to cloud vendors' streaming systems could be welcome in some quarters, according to an analytics manager at a company providing e-commerce services.
"I am personally concerned with lock-in," said Kevin Feasel, who has used both Kafka and Kinesis, and who is manager of predictive analytics at ChannelAdvisor, which runs its e-commerce platform software on the Amazon cloud.
A cloud provider can offer good services, "but moving to a different cloud is difficult," said Feasel, who noted that while he has personally worked with open source Kafka, his firm has chosen to use Amazon Kinesis in production on the AWS cloud.
Kafka cloud targets lock-in anxiety
Users who don't want to be tied into a cloud provider's streaming data framework would be targets for Confluent, according to Neha Narkhede, co-founder and CTO at Confluent, and part of the team that originally forged the Kafka messaging system inside social media giant LinkedIn.
"As an industry we have moved away from proprietary systems. I think this is going to play out the same way in the public cloud as well," she said, speaking from the Kafka Summit in New York.
For Confluent, as for others, better support for deployment in the cloud is of growing interest. Narkhede said a recent company survey of more than 350 Apache Kafka users found that 52% already employ Kafka on the public cloud. But it is not always easy to use Kafka there.
Cloud providers may not bar customers from using Kafka on their clouds. But they don't necessarily make it as easy as their own managed services either.
Narkhede admitted that troubleshooting distributed Kafka systems and running them at scale can be hard, and said the company is working to ensure its Kafka cloud service will be quick to configure and maintain as cluster usage grows.
Better cloud support for software like Kafka is part of a wider set of trends, according to Fintan Ryan, an analyst at RedMonk. Those trends include greater use of microservices and cloud-native development methods, as well as a desire for faster implementations.
"People have become frustrated with traditional Hadoop infrastructure in terms of the time it takes to build and the maintenance that is required to run it," Ryan said. The question they ask, he continued, is "Can't we scale it out quickly on the cloud?"
Ryan said Confluent's addition of streaming APIs has been an important step toward achieving broader usefulness. He said he is seeing sustained interest in Kafka -- with "year-over-year growth doubling" -- in the developer user group and mailing list activity that RedMonk tracks.
As more and more data is produced on the cloud, tools like Kafka are likely to find an ever greater role there. That means versions of such big data tools that are quickly configurable on the cloud are likely to become more prevalent.
For now, the Confluent Kafka cloud service is available in an early access program. First availability is on the Amazon Web Services cloud, with Microsoft Azure and Google Cloud said to follow.