DataStax Cassandra moves ahead with Azure deal and graph store

The DataStax Cassandra engine, officially called DataStax Enterprise, is now Spark-certified. The certification is one of several recent moves for a NoSQL database that may be on an upswing, a trend further evidenced by a new deal with Microsoft.

Activity in NoSQL continued, as DataStax Cassandra enhancements were on display at the Cassandra Summit last week on the company's home turf of Santa Clara, Calif. The company disclosed details of Cassandra-based DataStax Enterprise (DSE) 4.8, which is now certified for use with Spark 1.4 and includes support for a scale-out graph database that stems from the company's purchase of Aurelius Ltd. earlier this year.

DataStax also said it had forged a deal with Microsoft to speed deployment of DSE on Azure cloud environments. The agreement was announced at the summit by Scott Guthrie, executive vice president of the cloud and enterprise group at Microsoft. He said DataStax's software has proved able to work in a highly scalable fashion on the Azure cloud.

The Microsoft deal gains further attention for DataStax Cassandra amid a gale-like rush of NoSQL database introductions. The open source Apache Cassandra database -- which first arose at Facebook, but is supported commercially by DataStax -- is something of a cross between a wide-column store and a key-value store.

Even before the Microsoft deal, Cassandra could claim a distinctive spot in the NoSQL world. The Cassandra database is one of just three NoSQL databases to appear in the top 10 of DB-Engines' ranking of database management systems, coming in at eighth in the September 2015 listing. The other two NoSQL databases in DB-Engines' top 10 are the MongoDB document store (fourth) and the Redis key-value store (10th). DB-Engines arrives at its rankings based on website mentions, Google Trends performance, LinkedIn skills profiles and other measures.

Cassandra predicts the weather

Increased use of Apache Cassandra accompanied a software reboot that saw The Weather Company -- parent company of the Weather Channel, based in Atlanta -- rely more heavily on Amazon Web Services deployment, according to Robert Strickland, director of software engineering. Use of Cassandra has grown as Strickland and other Weather Channel engineers have uncovered more use cases, he said. A principal focus is an API connecting users of Google, Apple, Yahoo and other services.

Fast data handling is key. Handling weather update requests for such leading services translates into "something shy of 1 trillion requests per month," Strickland said.

The Weather Company has worked with several NoSQL databases -- and before that, SQL databases -- in a number of Web applications, but Strickland was ready to declare Cassandra "our go-to data store."

Strickland suggested wide use of Cassandra stemmed in part from its dual role as a key-value store and a columnar store. Atop that is a useful abstraction layer that is amenable to veteran SQL developers.

"What is different about Cassandra is that it provides schema on top of the stores that abstracts the underlying storage mechanism for you," he said. He also favored Cassandra's Spark integration, which, he said, "is better than any other NoSQL system."
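The idea Strickland describes, a schema layered over a key-value and wide-column layout, can be pictured with a small toy model. The sketch below is plain Python and makes no claim about how Cassandra is actually implemented; the class `ToyTable`, the table `readings` and its field names are all hypothetical, chosen only for illustration. Each partition key maps to a set of entries ordered by a clustering key, while callers work only with named fields.

```python
class ToyTable:
    """Toy model (not Cassandra itself): a schema over a partitioned key-value layout."""

    def __init__(self, partition_key, clustering_key, columns):
        self.partition_key = partition_key
        self.clustering_key = clustering_key
        self.columns = list(columns)
        # Underlying storage: partition key -> {clustering value -> column values}
        self._partitions = {}

    def insert(self, **row):
        # The schema rejects rows whose fields do not match the declared ones.
        expected = {self.partition_key, self.clustering_key, *self.columns}
        if set(row) != expected:
            raise ValueError(f"row must have exactly these fields: {sorted(expected)}")
        pk = row[self.partition_key]
        ck = row[self.clustering_key]
        self._partitions.setdefault(pk, {})[ck] = {c: row[c] for c in self.columns}

    def select(self, pk):
        """Return the rows of one partition, ordered by clustering key."""
        partition = self._partitions.get(pk, {})
        return [
            {self.partition_key: pk, self.clustering_key: ck, **values}
            for ck, values in sorted(partition.items())
        ]


# Hypothetical usage: hourly temperature readings, partitioned by station.
readings = ToyTable("station", "hour", ["temp_c"])
readings.insert(station="ATL", hour=2, temp_c=18.5)
readings.insert(station="ATL", hour=1, temp_c=19.0)
rows = readings.select("ATL")
```

In this model, the caller never touches the nested dictionaries directly, which is roughly the kind of abstraction a developer with a SQL background would find familiar.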

Data engine update in the works

Speed and scalability of data processing were a big part of the Microsoft deal, according to Tony Kavanagh, chief marketing officer at DataStax. "Azure is all about global, hyperscale data access," he said. "That and being enterprise friendly, which we think is our sweet spot."

At the Cassandra Summit, DataStax also pointed to progress in a significant data engine rewrite. Cassandra 3.0 was recently released by the Apache Cassandra Project, and it includes data storage and data consistency enhancements, as well as support for "materialized views" that aid developers in creating denormalized tables for better queries. Cassandra 3.0 is presently offered as a technical preview, with general availability expected later this year.

Robin Schumacher, vice president of products at DataStax, said the storage framework underlying the upcoming release will save disk space by changing the way metadata storage is handled in Cassandra tables. He said developer productivity will improve, too, as materialized views allow automated, server-side denormalization for fast lookups of data.
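To make "automated, server-side denormalization" concrete, here is a minimal sketch in plain Python. It is an illustration of the general idea only, not Cassandra's implementation or API; the class `TableWithView` and the `id` and `city` fields are hypothetical. The store keeps a second copy of each row, keyed by a different field, and updates that copy on every write so that application code does not have to.

```python
class TableWithView:
    """Toy model: a base table plus one view the store keeps in sync itself."""

    def __init__(self, base_key, view_key):
        self.base_key = base_key
        self.view_key = view_key
        self.base = {}   # base key value -> row
        self.view = {}   # view key value -> {base key value -> row}

    def upsert(self, row):
        key = row[self.base_key]
        old = self.base.get(key)
        if old is not None:
            # Drop the stale view entry before writing the new one.
            bucket = self.view.get(old[self.view_key])
            if bucket is not None:
                bucket.pop(key, None)
                if not bucket:
                    del self.view[old[self.view_key]]
        self.base[key] = dict(row)
        self.view.setdefault(row[self.view_key], {})[key] = dict(row)

    def lookup_by_view_key(self, value):
        """Fast lookup on the denormalized copy, with no scan of the base table."""
        return list(self.view.get(value, {}).values())


# Hypothetical usage: users stored by id, also queryable by city.
users = TableWithView(base_key="id", view_key="city")
users.upsert({"id": 1, "city": "Atlanta"})
users.upsert({"id": 2, "city": "Atlanta"})
users.upsert({"id": 1, "city": "Santa Clara"})  # user 1 moves; the view follows
in_atlanta = users.lookup_by_view_key("Atlanta")
in_santa_clara = users.lookup_by_view_key("Santa Clara")
```

Without such a feature, developers typically maintain the second table by hand and must write to both on every change, which is the productivity cost Schumacher refers to.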

Database industry analyst Curt Monash gave good grades to DataStax -- along with MongoDB -- for using resources to address product limitations common to new, immature database management systems. He said MongoDB's rewrite was helped by its WiredTiger acquisition.

In an email, he said Cassandra's core store reworking could be on par with MongoDB's efforts, although it is too early to say definitively. Growing ranks of Cassandra adherents will closely watch version 3.0's move to general availability.
