The Centers for Disease Control and Prevention (CDC) uses the open source Apache Kafka event streaming technology in its COVID-19 electronic laboratory reporting (CELR) program, which reports on COVID-19 data across the U.S.
Supporting the CDC in its data efforts is Northrop Grumman Corporation, which helped build and manage the CELR system.
Event streaming COVID-19 data
In a user session at the Confluent-sponsored Kafka Summit, held virtually Aug. 24-25, Rishi Tarar, enterprise architect at Northrop Grumman, explained how the aerospace and defense giant uses Kafka to stream data from healthcare and testing facilities across the U.S. to the CDC, providing accurate insight into the state of the COVID-19 pandemic.
With rapidly changing circumstances and data, Kafka event streaming technology plays a critical role in keeping data moving, Tarar said.
The CDC system orchestrates data pipelines and then merges all the data into a single schema in real time. It runs on a multivendor technology stack that includes Confluent, Kafka and multiple AWS cloud services, including Amazon EKS for Kubernetes and S3 for storage. The platform also uses Elasticsearch and Kibana for data search and visualization.
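The merge step can be illustrated with a minimal, hypothetical sketch: records arrive from different jurisdictions in different shapes and are normalized into one common schema before downstream analysis. The field names and mappings below are illustrative assumptions, not the CDC's actual schema.

```python
# Hypothetical sketch: normalize differently shaped lab reports
# from multiple sources into one common schema, as a streaming
# pipeline consumer might do for each incoming event.

COMMON_FIELDS = ("jurisdiction", "test_date", "result")

def normalize(record: dict, field_map: dict) -> dict:
    """Rename source-specific fields to the common schema."""
    return {common: record[src] for common, src in field_map.items()}

# Two sources report the same kind of event with different field names.
state_a = {"state": "AZ", "collected": "2020-08-01", "outcome": "positive"}
state_b = {"juris": "OH", "dt": "2020-08-01", "res": "negative"}

merged = [
    normalize(state_a, {"jurisdiction": "state",
                        "test_date": "collected",
                        "result": "outcome"}),
    normalize(state_b, {"jurisdiction": "juris",
                        "test_date": "dt",
                        "result": "res"}),
]

# Every normalized record now exposes the same fields.
assert all(tuple(r) == COMMON_FIELDS for r in merged)
```

In a real deployment, each `normalize` call would run inside a Kafka consumer as events stream in, rather than over an in-memory list.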
"Our team worked really hard to be able to deliver factual knowledge of every test event that happens, anywhere in the United States within any jurisdiction," Tarar said during the Aug. 24 session.
Kafka, originally developed at LinkedIn, enables data to be streamed in a distributed way to the applications and databases that consume it.
Apache Kafka is more than just event streaming; it enables a new data-driven application architecture, according to speakers at the virtual conference.
A key theme was how the technology is being used at large scale to solve complex data management challenges.
Kafka event streaming at Walmart
Among other organizations that use Kafka is Walmart, which employs Kafka for various applications, including fraud detection.
In a user session Aug. 24, Navinder Pal Singh Brar, senior software engineer at Walmart, outlined how Walmart is using Kafka and what open source contributions the company has made to make it work better for everyone.
Walmart runs its fraud detection system on every online transaction. The system relies on Kafka event streaming to get data needed to make decisions.
Brar noted that Walmart had availability and latency targets it needed to hit and ended up making multiple contributions to the open source Kafka project. Improvements to Kafka are tracked in the open source project as Kafka Improvement Proposals (KIPs).
Among the improvements contributed by Walmart is KIP-535, which enables an application to conditionally choose to get data from a replica rather than an original source, based on latency.
Most of the time, replicas are nearly caught up with the active source, though they can still lag behind, Brar said. Walmart's challenge was to retrieve the information needed for a fraud detection decision as fast as possible, and sometimes a replica can serve that data with lower latency than the active source.
"So you're basically trading consistency for availability," Brar said. "In our fraud detection application, availability is more important since customer experience will be adversely affected if we block a transaction for a long time."
Kafka event streaming and the modern application stack
In a keynote on Tuesday, Confluent CEO and co-founder Jay Kreps detailed his views on the emergence of Apache Kafka event streaming as a fundamental part of the modern computing stack.
Kreps noted that in recent years there has been a change in the way applications and services are built. A common approach in the past was a large central database that stored data, which applications then queried for information. Modern applications no longer get data from a single source, but rather interact with multiple sources of data to deliver a service.
"Kafka event streams and stream processing is meant to model a world where data management isn't just about storage," Kreps said. "It's about storage and the flow of data, it's about things happening and reacting to them."