Each quarter, the editors at SearchDataManagement recognize a data management technology for innovation and market impact. The product selected this quarter is the DataStax Enterprise 4.8 database management system from DataStax.
Product: DataStax Enterprise 4.8
Release Date: Sept. 23, 2015
What it does
DataStax Enterprise (DSE) is a commercial version of the Cassandra NoSQL database that supports geographically distributed handling of large volumes of fast-arriving data. The open source Cassandra database, which became an Apache Incubator Project in 2009, combines aspects of column-family (or table-style) databases and key-value stores, uses a token-ring style architecture, can be configured in distributed clusters, and supports flexible schema development.
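To make the token-ring idea concrete, here is a minimal Python sketch of how a ring can map a key to the nodes that store it. It is an illustration under stated assumptions, not Cassandra's implementation: the `TokenRing` class, the node names and the use of MD5 as the hash are all hypothetical stand-ins, and real clusters use their own partitioner and replication strategies.

```python
import bisect
import hashlib


class TokenRing:
    """Toy token ring: each node sits at a token and owns the keys that hash up to it."""

    def __init__(self, nodes):
        # Place every node on the ring at a token derived from its name.
        self._ring = sorted((self._token(name), name) for name in nodes)
        self._tokens = [token for token, _ in self._ring]

    @staticmethod
    def _token(value):
        # MD5 is an illustrative stand-in hash, not Cassandra's partitioner.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def replicas(self, key, replication_factor=1):
        # Find the first node whose token is >= the key's token,
        # then walk clockwise (wrapping around) to collect further replicas.
        start = bisect.bisect_left(self._tokens, self._token(key))
        count = min(replication_factor, len(self._ring))
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(count)]


# Hypothetical three-node cluster; the same key always lands on the same nodes.
ring = TokenRing(["node-a", "node-b", "node-c"])
owners = ring.replicas("user:42", replication_factor=2)
```

Because a key's position depends only on its hash, any node can work out where data lives without consulting a central coordinator, which is the property that makes the ring layout suit distributed clusters.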
The DataStax operational database management system integrates with the Solr search engine, the Hadoop file system, and the MapReduce and in-memory Spark compute engines. DSE supports remote cluster management via an associated DataStax OpsCenter offering, and includes services that provide diagnostic information on cluster performance.
DSE version 4.8 introduces enhancements that enable more immediate indexing of data for search. This version is production-certified for Spark 1.4. An upcoming release of DSE will support Apache Cassandra 3.0 and its associated storage engine improvements. DataStax -- which is based in Santa Clara, Calif. -- has also disclosed work on DSE Graph, which the company said it plans to integrate into the DSE platform. DSE Graph includes elements from the open source Titan Distributed Graph Database. DataStax acquired Aurelius, the originator of Titan, earlier this year.
Why it matters
DSE has become one of the very public faces of the Cassandra database, which was originally created at Facebook by developers who took inspiration from the pioneering Amazon Dynamo and Google Bigtable NoSQL database architectures. The software was designed for wide geographic deployment over multiple data centers, which has become an increasingly important goal for users as cloud, mobile and Web applications tax traditional relational operational databases in terms of scalability and volume.
But performance alone may not lead to wide enterprise use. For Cassandra -- and other NoSQL databases -- to find a place in the enterprise, the software must be readily manageable. Such manageability has been an objective of DataStax OpsCenter cluster manager software, available in basic- and advanced-functionality versions by subscription.
This software provides a visual interface for point-and-click cluster provisioning, while also offering automatic failover and secure administration. Additionally, the operational database's Solr integration is useful for applications that can employ search technology, in lieu of relational query methods, for ultrafast data access. Enhancements to DSE 4.8 fine-tune these search operations by speeding indexing.
What users say
The rollout of individual DSE releases may be less important than DataStax's overall attention to trouble-free updates, said Rich Sutton, vice president of engineering and co-founder of social media compliance and security software developer Nexgate in Sunnyvale, Calif., now a division of security-as-a-service provider Proofpoint Inc.
Sutton spoke to a question often asked nowadays: Why use commercial software instead of the open source version? For his part, he marks DataStax's subscription support as very helpful for users just beginning to work with fairly new open source software.
"There have been many radical changes to Cassandra since 2012. Paying for support with DataStax allows us to have continuity and a guided path," he said. "That's important, because we weren't going to hire either a dedicated Cassandra expert or a DBA."
Sutton turned to Cassandra because of his company's need to ensure security by "searching through billions of content posts, tweets and photos," and the database's ability to handle the load. He said Nexgate is presently using DSE 4.5, running on the Amazon Web Services cloud.
DataStax's integration of Cassandra with Solr was important in his decision, according to Sutton, since the Nexgate system would be searching across vast amounts of content. "We decided to use [DSE] instead of standing up our own integration," he said.
"I'm an engineer and I know when I see a new animal that is constantly changing," he said, referring to NoSQL software and its general lack of maturity, and DataStax's commitment to help users through the process. "It's new territory and we need quick access for tuning and performance. To get the high performance, you have to understand what NoSQL is good at."
The taming of Cassandra also comes from management tools. Sutton said DataStax OpsCenter's graphical management for Cassandra clusters "provides a ton of visibility into current performance trends," as well as a configuration advisor that is useful to engineering teams.
Looking forward, Sutton is interested in the work DataStax is doing with the Titan graph database. Such databases have found use in security and other applications, but, in Sutton's estimation, they have required a lot of custom work and often have encountered performance issues. Since one of DataStax's defining themes is fast operations, "integration with Titan is very exciting," he said.
*DSE version 4.8 supports user-defined types, which handle more varied data and simplify overall development.
*Production certification for use with Apache Spark 1.4 eases updates; Spark job server support allows better monitoring of Spark operations.
*Support is available for Docker container deployment of DSE.
DSE is offered on a subscription basis that includes technical support. Customers can purchase subscriptions on a per-node or per-core basis. DataStax doesn't disclose specific pricing on its commercial database platform.