MemSQL 5.5 adds native data ingestion for Kafka, eases pipelining

MemSQL is among a handful of new SQL-oriented, in-memory databases. Its latest product update aims to simplify data pipeline creation and improve performance in high-speed ingestion applications.

Each quarter, the editors at SearchDataManagement recognize a data management technology for innovation and market impact. The product selected this quarter is the newly released MemSQL 5.5 from MemSQL Inc.

Product: MemSQL 5.5

Release Date: Sept. 26, 2016

What it does

MemSQL, from vendor MemSQL Inc., is a distributed in-memory database management system that also includes a disk-based column store, and it can support both transactional and analytics workloads. It distributes data across the cluster through sharding. MemSQL compiles SQL queries into machine code using a compiler built on the LLVM (Low Level Virtual Machine) framework. The software includes a Streamliner module that connects MemSQL to Apache Spark for streaming data. With version 5.5, the system adds pipelines for native ingestion of data from external sources, including Apache Kafka.
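
As a rough illustration of how sharding spreads rows across a cluster, the Python sketch below hashes a shard-key value to pick one of a fixed number of partitions. It is a generic hash-partitioning example, not MemSQL's actual hashing scheme; the partition count, the `shard_for` and `distribute` helpers and the sample rows are all hypothetical.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 8  # hypothetical partition count for the illustration


def shard_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a shard-key value to a partition using a stable hash."""
    # Use a digest rather than Python's built-in hash(), which is
    # randomized per process for strings and so not stable across runs.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def distribute(rows, key_column):
    """Group rows by the partition their shard key hashes to."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[shard_for(str(row[key_column]))].append(row)
    return partitions


rows = [
    {"customer_id": "c-100", "amount": 12.5},
    {"customer_id": "c-200", "amount": 3.0},
    {"customer_id": "c-100", "amount": 7.25},
]
parts = distribute(rows, "customer_id")
for pid, part_rows in sorted(parts.items()):
    print(pid, part_rows)
```

Because every row with the same shard key lands on the same partition, a query that filters on that key can be answered by a single partition, while a query on some other column has to fan out to all of them.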

Why it matters

Fast ingestion of large volumes of disparate data is one of the more pressing issues for data architects today, and it affects how they approach streaming, storage and analytics. Many solutions turn to wholly new data models that eschew SQL, a mainstay of enterprise IT for both transactions and analytics. MemSQL is one of a handful of SQL-friendly products that address the issue. MemSQL 5.5 includes an updated approach to data streaming that focuses on native ingestion from the increasingly popular Apache Kafka messaging system, with exactly-once semantics: each message is loaded into the database once, with no losses or duplicates, even if part of the pipeline fails and has to retry.

What users say

Making sure fast-arriving data doesn't take too long to process is increasingly important for billing, analytics, near-real-time reporting and other applications at Akamai Technologies, according to Mike DePrizio, a senior architect with the content delivery network services provider in Cambridge, Mass.

DePrizio said he and his colleagues have tested MemSQL's ingestion performance in a proof-of-concept project. Now, they have MemSQL software in production that serves as a data-persistence layer to handle newly arriving data.

"We have started building a data services API platform where users can come and get information on core data objects. We found that the API was not going to be performant if it had to go to five or six places to draw data," he said. "So, we created a persistence layer with MemSQL."

"This is a 'quick persistence layer' that can support subsecond retrieval of data. In that, MemSQL's in-memory architecture helps," he said.

While MemSQL's SQL support makes parts of the system familiar, DePrizio said some adjustments were needed. Specifically, Akamai developers had to think about data retrieval in new ways to get the benefits of the sharding the MemSQL software employs.

What's new

  • MemSQL Pipelines supports new SQL syntax for efficiently creating pipelines for real-time data, providing data ingestion from Kafka streams with exactly-once semantics.
  • The software is tuned to handle highly concurrent queries, automatically queues queries during workload surges and better supports data movement for distributed joins.
  • An X-ray view facility provides live tracking of queries while they're running.
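
Automatic queuing for workload surges is, in general terms, a form of admission control: only a fixed number of queries run at once, and the rest wait in line instead of overloading the system. The Python sketch below illustrates that general idea with an `asyncio.Semaphore`; the concurrency limit, the `AdmissionController` class and `fake_query` are hypothetical, and MemSQL's own queuing policy may decide what to hold back quite differently.

```python
import asyncio

MAX_CONCURRENT = 3  # hypothetical limit on queries allowed to run at once


class AdmissionController:
    """Lets at most `limit` queries run at once; the rest wait their turn."""

    def __init__(self, limit: int):
        self._slots = asyncio.Semaphore(limit)
        self.running = 0
        self.peak = 0  # highest number of queries seen running together

    async def run(self, query_fn, *args):
        async with self._slots:  # waits here while all slots are busy
            self.running += 1
            self.peak = max(self.peak, self.running)
            try:
                return await query_fn(*args)
            finally:
                self.running -= 1


async def fake_query(i: int) -> int:
    """Placeholder for executing a query; it just sleeps briefly."""
    await asyncio.sleep(0.01)
    return i * i


async def main():
    ctrl = AdmissionController(MAX_CONCURRENT)
    # Simulate a surge of 10 queries arriving at the same moment.
    results = await asyncio.gather(*(ctrl.run(fake_query, i) for i in range(10)))
    return results, ctrl.peak


if __name__ == "__main__":
    results, peak = asyncio.run(main())
    print(results, "peak concurrency:", peak)
```

Queued queries are not rejected; they start as earlier ones finish, so all ten results come back while no more than three queries run at the same time.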


MemSQL licenses its software based on the cluster RAM capacity. Customer installations range from gigabytes to terabytes of memory.
