

ScyllaDB set to improve NoSQL database performance

ScyllaDB has helped large organizations such as Comcast, which cut its footprint from nearly 1,000 database nodes to 78 ScyllaDB nodes and now has more room to grow.

Database performance is one of the reasons some large organizations choose the open source ScyllaDB database platform. The startup database vendor introduced new features to accelerate performance and optimize the open source database platform.

While ScyllaDB develops its own technology, one of its primary use cases is as a drop-in replacement for the open source Apache Cassandra NoSQL database, which is used in large-scale data deployments.

Since its inception in 2015, ScyllaDB has promised better performance at scale while remaining compatible with Cassandra. Improved performance matters most when organizations want to scale out their data without adding more server or cloud resources. ScyllaDB is also riding the wave of growing open source database use.

The vendor revealed the new features Nov. 5 at its Scylla Summit 2019 conference in San Francisco.

One of ScyllaDB's users is Comcast, which has used the NoSQL database to replace existing Cassandra deployments, with some dramatic efficiency gains.

Philip Zimich, senior director of development and engineering at Comcast, said his group went from having about 1,000 Cassandra servers to only 78 ScyllaDB servers, while improving overall availability and performance.

"We evaluated a lot of databases over the last few years," Zimich said. "What we found is that ScyllaDB is the best fit for our real-time operations."


Zimich's team at Comcast is responsible for the X2 DVR scheduler for recording media content, supporting 15 million accounts across the Comcast X1 network.

The listing of everything each user has recorded is stored in the cloud, along with all of its associated metadata. Zimich's team maintains that data and serves it to users as needed, for example when a user wants to watch a recording. In terms of scale, the platform processes 2.4 billion transactions per day, so every incremental performance gain matters to Comcast and its users.

"The faster I can reduce the process time, the more snappy the UI feels to the end user," he said.

ScyllaDB incremental compaction

Among the new features ScyllaDB unveiled at Scylla Summit 2019 is Incremental Compaction Strategy, which reduces storage requirements. The capability could be useful to Comcast in the future, but Zimich said he doesn't need it in the short term.

With the Cassandra deployment, the nodes were at full capacity, and the media company would have needed to add many more nodes in the coming year to support its growing user base, Zimich said.

"With this migration, we have headroom for years," he said.

Interface for Scylla Cloud

New ScyllaDB features

In addition to the compaction feature, ScyllaDB introduced new Lightweight Transactions (LWT) capabilities. LWT can help support privacy and security, but its use cases are much broader than that, said Avi Kivity, ScyllaDB CTO and co-founder.

"LWT essentially ensures that data gets recorded simultaneously on all nodes of a Scylla cluster," Kivity said. "This ensures that any application querying the database receives the latest copy of the data, no matter which node it hits."

Without LWT, or something equivalent, different nodes in a cluster may temporarily disagree about the value of a particular record until updates eventually propagate through the system. Kivity noted that LWT provides a stronger guarantee that applications querying the database will always receive the latest information.
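To make that consistency gap concrete, here is a minimal toy model in Python, assuming a single record replicated on three in-memory "nodes." It is not how ScyllaDB implements LWT: Cassandra-compatible LWT are conditional CQL updates (such as `INSERT ... IF NOT EXISTS` or `UPDATE ... IF`) coordinated by a Paxos-based consensus round, while the `ToyCluster` class and its method names here are hypothetical. The sketch only contrasts a write that reaches the other replicas later with a conditional write that is checked against the current value and applied to every replica before it returns.

```python
class ToyCluster:
    """Hypothetical in-memory model of one record replicated on several nodes."""

    def __init__(self, n_replicas):
        self.replicas = [None] * n_replicas
        self.in_flight = []  # (replica index, value) updates not yet delivered

    def eventual_write(self, value, coordinator=0):
        # The write lands on one replica now; the others receive it later.
        self.replicas[coordinator] = value
        for i in range(len(self.replicas)):
            if i != coordinator:
                self.in_flight.append((i, value))

    def propagate(self):
        # Deliver every in-flight update in the order it was issued.
        for i, value in self.in_flight:
            self.replicas[i] = value
        self.in_flight.clear()

    def read(self, replica):
        return self.replicas[replica]

    def conditional_write(self, expected, new_value):
        # Toy stand-in for an LWT-style compare-and-set. Settling in-flight
        # updates first loosely represents the consensus round (toy
        # simplification); the update is applied only if the current value
        # matches, and then to every replica before the call returns.
        self.propagate()
        if self.replicas[0] != expected:
            return False
        self.replicas = [new_value] * len(self.replicas)
        return True


cluster = ToyCluster(3)
cluster.eventual_write("v1")
print(cluster.read(0), cluster.read(2))  # v1 None (replica 2 is stale)
cluster.propagate()
print(cluster.conditional_write(expected="v1", new_value="v2"))  # True
print([cluster.read(i) for i in range(3)])  # ['v2', 'v2', 'v2']
print(cluster.conditional_write(expected="v1", new_value="v3"))  # False
```

The last call fails because the condition is evaluated against the latest value, which is the property that lets concurrent clients avoid overwriting each other's updates.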

Another new feature coming to ScyllaDB is Change Data Capture (CDC), which makes it easier for applications to write changes to the database and for streaming data technologies such as Kafka to access those changes. CDC records changes within the database itself, in a standard table readable with Cassandra Query Language (CQL), which consumers such as Kafka can subscribe to using standard CQL queries.

"Without CDC, an application would have to write an update twice: once to Kafka and once to the database," Kivity said.
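The single-write pattern Kivity describes can be sketched as a toy model in Python. The `TableWithCDC` and `Consumer` classes are hypothetical and in-memory; in ScyllaDB the change log is a CQL table that consumers query, not a Python list. The sketch only shows that the application performs one write, and a downstream reader (standing in for something like a Kafka connector) picks up the recorded changes afterward while tracking how far it has read.

```python
class TableWithCDC:
    """Hypothetical in-memory table that also records every change in a log."""

    def __init__(self):
        self.rows = {}
        self.change_log = []  # append-only list of (sequence number, key, value)

    def write(self, key, value):
        # One write from the application: update the row and record the change.
        self.rows[key] = value
        self.change_log.append((len(self.change_log), key, value))

    def changes_since(self, position):
        # What a consumer would fetch: every change recorded after `position`.
        return self.change_log[position:]


class Consumer:
    """Hypothetical stand-in for a downstream reader such as a Kafka connector."""

    def __init__(self, table):
        self.table = table
        self.position = 0  # number of log entries already consumed

    def poll(self):
        batch = self.table.changes_since(self.position)
        self.position += len(batch)
        return batch


table = TableWithCDC()
consumer = Consumer(table)
table.write("user:1", "recording A")
table.write("user:2", "recording B")
print(consumer.poll())  # [(0, 'user:1', 'recording A'), (1, 'user:2', 'recording B')]
table.write("user:1", "recording C")
print(consumer.poll())  # [(2, 'user:1', 'recording C')]
```

Because the consumer keeps its own position in the log, it can fall behind and catch up later without the application having to write the same update to a second system.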

ScyllaDB provides multiple versions of its software, including open source, enterprise (on-premises), and cloud editions.

Incremental Compaction will be available only in Scylla Enterprise and Scylla Cloud, while LWT and CDC will be released first in Scylla Open Source and later in Scylla Enterprise.

"Lightweight Transactions, Incremental Compaction and CDC are already committed to the main branch of our codebase, and once they're ready, we'll provide more detail on the specific release versions they'll appear in," Kivity said.


Join the conversation

How is your organization using scale-out NoSQL databases such as Cassandra or ScyllaDB?
As a long-time Oracle developer, I decided to see if ScyllaDB can be used as a data store for our project https://mortgagecalculator.ca.

While the structure of the data had to be changed, we were able to achieve good results. One difficulty was that we needed to search properties by both latitude and longitude ranges, and there was no way to write a query such as:

SELECT ... FROM listings WHERE lat >= X1 AND lat <= X2 AND lon >= Y1 AND lon <= Y2

So the solution was to create a separate table for this specific scenario as follows:

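CREATE TABLE listings_by_lat (province TEXT, lat DOUBLE, listing_id UUID, PRIMARY KEY (province, lat, listing_id)) WITH CLUSTERING ORDER BY (lat ASC, listing_id ASC);

AND

CREATE TABLE listings_by_lon (province TEXT, lon DOUBLE, listing_id UUID, PRIMARY KEY (province, lon, listing_id)) WITH CLUSTERING ORDER BY (lon ASC, listing_id ASC);

province is the partition key, and lat and lon are clustering columns (listing_id is included as the last clustering column so that two listings at the same coordinate don't overwrite each other), which enables queries like this: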

SELECT listing_id FROM listings_by_lat WHERE province='ON' AND lat >= X1 AND lat <= X2;

AND

SELECT listing_id FROM listings_by_lon WHERE province='ON' AND lon >= Y1 AND lon <= Y2;

Both queries can run concurrently (as separate async Tasks); we then intersect the two resulting hash sets of listing IDs to get the listings inside the bounding box.
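The client-side step described above can be sketched in Python under a few assumptions: two sorted in-memory lists stand in for the `listings_by_lat` and `listings_by_lon` partitions of one province, a thread pool stands in for the async Tasks, and the helper names and sample coordinates are hypothetical. A real application would issue the two CQL `SELECT` statements through a driver rather than scanning lists.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor


def build_index(rows):
    """Sort (coordinate, listing_id) pairs, mimicking a clustering order."""
    return sorted(rows)


def range_scan(index, low, high):
    """Return IDs whose coordinate lies in [low, high], like a CQL range query."""
    start = bisect.bisect_left(index, (low,))  # first entry with coordinate >= low
    ids = set()
    for coordinate, listing_id in index[start:]:
        if coordinate > high:
            break
        ids.add(listing_id)
    return ids


def listings_in_box(lat_index, lon_index, lat_range, lon_range):
    # Run both range queries concurrently, then intersect the two ID sets.
    with ThreadPoolExecutor(max_workers=2) as pool:
        by_lat = pool.submit(range_scan, lat_index, *lat_range)
        by_lon = pool.submit(range_scan, lon_index, *lon_range)
        return by_lat.result() & by_lon.result()


# Hypothetical listings in one province: id -> (lat, lon)
listings = {
    "a": (43.65, -79.38),
    "b": (43.70, -79.40),
    "c": (45.42, -75.69),
    "d": (43.66, -80.50),
}
lat_index = build_index((lat, lid) for lid, (lat, lon) in listings.items())
lon_index = build_index((lon, lid) for lid, (lat, lon) in listings.items())
print(sorted(listings_in_box(lat_index, lon_index, (43.6, 43.8), (-79.5, -79.3))))
# ['a', 'b']
```

One trade-off of this design is that each query returns every listing in a full latitude or longitude band across the province, so the client may fetch many IDs that the intersection then discards; tighter ranges keep that overhead small.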

Here's a working example:

That's an awesome example, Alex! If you do have any questions on how to get the most out of Scylla, feel free to ask in our Slack channel (http://slack.scylladb.com/).

The lat/long issue is not a trivial one.

Also, if you would like to write your example up for a blog, let me know. (I'm the content manager at ScyllaDB.com.)
Hi Peter,

How can I get in touch with you?

I would love to do guest posts about a couple of workarounds that I had to develop for a commercial project in ScyllaDB/Cassandra, specifically:

2. SQL-like LIKE functionality;
3. SQL-like JOIN operations on the tables;

I couldn't find any available information/suggestions about these subjects.

Please let me know.

Thanks,
Alex

info@mortgagecalculator.ca
