DataStax's formal introduction this week of a Cassandra NoSQL database engine coupled with a graph DB looks to further the software's use in applications that straddle transaction processing and real-time analytics.
The company's new entry takes the form of DataStax Enterprise (DSE) 5.0. It couples an updated Cassandra column-family store with a version of the open source Titan graph database that DataStax describes as a full rewrite.
The release also adds selective replication, intended to better support the hub-and-spoke topologies that are appearing in internet of things and retail applications. Also new is multi-instance server automation, which benefits data center administrators who favor larger individual machines over larger distributed clusters.
For Gene Stevens, co-founder and CTO at ProtectWise Inc., a network security services provider based in Denver, DSE's Cassandra-based software is a step toward a new generation of big data security analytics.
Stevens said ProtectWise is using DSE and its Spark connector in a security system that "ingests 'north of' 20 billion records per day, and can handle many millions of records per second at peak." The system is also used by data scientists who study signs of historical activity that precede security breaches.
The best way to look at big data going forward is to picture a future with an unending stream of data, he said.
"What we are doing is moving away from a batch-oriented analytical approach to a stream processing, time-series-oriented approach. That is made possible with a product like Cassandra," he said.
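As a rough illustration of the stream-oriented, time-series approach Stevens describes, the sketch below counts events per fixed time window as records arrive, rather than loading a complete data set first. It is plain Python, not Cassandra, DSE or Spark code; the function name `windowed_counts`, the `(timestamp, key)` record shape and the assumption that records arrive in timestamp order are all hypothetical choices made for this example.

```python
from collections import Counter


def windowed_counts(events, window_seconds):
    """Yield (window_start, Counter) each time the stream enters a new window.

    Assumes (hypothetically) that events is an iterable of (timestamp, key)
    pairs arriving in timestamp order, as in an append-only time series.
    """
    current_start = None
    counts = Counter()
    for timestamp, key in events:
        # Align the timestamp to the start of its fixed-size window.
        start = timestamp - (timestamp % window_seconds)
        if current_start is None:
            current_start = start
        elif start != current_start:
            # The stream moved into a new window: emit the finished one.
            yield current_start, counts
            current_start, counts = start, Counter()
        counts[key] += 1
    if current_start is not None:
        # Emit the last, possibly partial, window when the input ends.
        yield current_start, counts


if __name__ == "__main__":
    sample = [(0, "login"), (5, "scan"), (9, "login"), (10, "login"), (25, "scan")]
    for window_start, window_counts in windowed_counts(sample, 10):
        print(window_start, dict(window_counts))
```

Because the function is a generator, each window's result is available as soon as the window closes, which is the practical difference from a batch job that waits for the full input.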
In early adopter program
To date, DataStax's version of Cassandra has carved out a distinctive space among a slew of NoSQL databases. High-volume, high-speed web applications have been a sweet spot for a system that has evolved to combine elements of a key-value store, document orientation and a column format for distributed data processing, especially in the cloud.
The company was also early among NoSQL players to build a connector to the Apache Spark analytical engine. That linkup is becoming increasingly common among all types of NoSQL databases.
While it is another step away from the roots of the original open source Cassandra, connecting the distributed Titan graph software closely to a Cassandra store could further broaden DSE's usefulness. Areas where the Titan-Cassandra combo may find applications include access control, network analysis and risk analysis. Each is an area where real-time analytical applications closely align with Cassandra's operations-side processing.
Stevens said ProtectWise's use of DSE does not extend yet to the graph database. But an appraisal is underway. "We are looking at this with interest. The graph DB is highly applicable in network security, what with the need for anomaly detection of millions of points," he said. "We are in an early adopter program."
Graph DBs: Beginning to be understood
Graph DBs have found a place in fraud detection and recommendation engines, said Gartner analyst Nick Heudecker, because these are instances where companies now want to make decisions in something very close to real time. "Graphs are one way to do that," he said.
Oddly, perhaps, fast handling of relationship data can often be accomplished more easily in graph databases than relational databases, despite the latter's relational name.
"Relational databases are about referential integrity -- graph databases are about relationship," Heudecker said.
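To make the distinction concrete, here is a minimal sketch of the kind of relationship query that suits a graph model: finding accounts connected to a given account through shared devices, a pattern common in fraud detection. It uses plain Python with an in-memory adjacency map, not DSE Graph, Titan or Gremlin; the data, the `acct:` and `device:` naming and the function names are hypothetical.

```python
from collections import deque

# Hypothetical toy data: accounts linked to the devices they have logged in from.
EDGES = [
    ("acct:alice", "device:1"),
    ("acct:bob", "device:1"),
    ("acct:bob", "device:2"),
    ("acct:carol", "device:2"),
    ("acct:dave", "device:3"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def within_hops(adjacency, start, max_hops):
    """Breadth-first search: map each node reachable within max_hops to its hop count."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # Do not expand past the hop limit.
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return seen


def linked_accounts(adjacency, start, max_hops):
    """Return the other accounts reachable from start within max_hops, sorted."""
    reached = within_hops(adjacency, start, max_hops)
    return sorted(n for n in reached if n.startswith("acct:") and n != start)


if __name__ == "__main__":
    graph = build_adjacency(EDGES)
    print(linked_accounts(graph, "acct:alice", 2))
    print(linked_accounts(graph, "acct:alice", 4))
```

Each extra hop here is one more step of the same traversal. In a relational schema, the equivalent query typically needs an additional self-join per hop, which is one reason relationship-heavy queries are often easier to express against a graph.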
But, beyond a basic set of use cases, the road for graphs is still uphill, according to Heudecker.
"Graphs are becoming more commonplace, but many of the possible use cases are still just beginning to be understood," he said.
Handling graph models on the same platform as a key-value store and adding JSON document support to a relational platform have at least one outcome in common. "Placing different models of data processing together within a single footprint simplifies management," he said.
The DataStax release is the culmination of an effort that gained momentum early last year when DataStax acquired Aurelius LLC. That company's leaders were key figures in the development of the open source Titan graph database.
"Titan was one of the first distributed graph databases, and it addressed the question of how you split a graph up to run in a distributed mode. It also was file-agnostic. But that presented issues in terms of always coding to the least common denominator," said Martin Van Ryswyk, vice president of engineering at DataStax, which is based in Santa Clara, Calif.
The present DSE graph engine was inspired by Titan, but the new design saw significant changes, many of which are about improving how Titan works with Cassandra, according to Van Ryswyk.
"It's a complete rewrite. It's inspired by Titan, but it is tied to our closed source system," he said.
For open source initiatives of the graph data variety, Van Ryswyk pointed to Gremlin as an area DataStax would look to support. This is a language for graph database development. "It is valuable to have a common graph language across many systems, including competitors," Van Ryswyk said.
Van Ryswyk suggested DataStax's integrations of Cassandra with Titan and Cassandra with Solr, the open source search platform, will ease users' development and administration burdens. For DataStax customer Gene Stevens at ProtectWise, that is an important added value.
"We started with open source Cassandra. But having Cassandra and Solr as separate systems, and keeping them in sync, is real hard. DSE tightly binds them," he said, adding that the DataStax implementation takes care of many low-level integration and programming chores encountered with raw Cassandra.
DataStax's bet is that tight binding of a graph database to Cassandra will find favor with customers who, like Stevens, are facing unending streams of data to process and analyze.
DataStax is far from alone in the rush to implement graphs. Other graph-oriented activity of late includes Neo Technology's redesigned Neo4j 3.0, which aims at higher scalability; TIBCO Software Inc.'s opening up its TIBCO Graph Database entry in a community review edition; and IBM preparing to make generally available a graph database as a service on Bluemix. Notable, too, is the rise of graph databases in vendors' offerings aimed at data lakes and master data management.