
Connectedness is king, as Neo4j graph database ports to Spark

The Neo4j graph database emphasizes easy relationship mapping for diverse data points. Now, its related Cypher query language is hooking into Apache Spark.

Graph database provider Neo4j is taking steps to grow a data platform around its similarly named database, adding Cypher graph language support for the Apache Spark analytics engine.

That, along with additional analytics, visualization, data import and transformation capabilities for the Neo4j graph database, was discussed at the company's GraphConnect conference in New York, where enterprise users described graph database implementations.

Graph databases hold advantages in an emerging style of data connectedness -- that is, they can readily map and remap relationships between data points in ways that may outpace relational databases.

The graph approach gained momentum earlier this year, as Microsoft added graph database support to its Azure Cosmos DB multimodel database. Various graph database capabilities are also offered by Cambridge Semantics, DataStax, Franz Inc., IBM, MarkLogic, Oracle and others.

Scripps channeling data


Graphs can be useful in managing digital assets, according to Brant Boehmann, senior software engineer at Scripps Networks Interactive. Boehmann, together with Scripps software development manager Chris Goodacre, described experiences with the Neo4j graph database at GraphConnect.

Scripps' digital assets include content from HGTV, Food Network, Travel Channel and other cable properties. Asset attributes can range from cooking-segment recipes to digital-rights royalties for series and beyond.

Boehmann and Goodacre have built a new system for managing assets that are increasingly repackaged and repurposed across new media types, some of which need to work on demand.

"Our need is to keep track of a large number of relationships -- about 45 million-plus," Boehmann said in an interview. He said Neo4j performance has scaled efficiently over three years, as the number of data points in the database has expanded.

Moreover, the Neo4j graph database supports a high level of abstraction. Boehmann called its graph models easy to describe and "whiteboard-friendly."

"You can think in terms of a series and its seasons. And you can, in effect, draw lines to make connections," he said. That is in comparison to relational approaches where data models are based on tables and are connected via relational join operations.
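To make the contrast concrete, here is a minimal Python sketch that uses made-up node IDs and titles. It is not Scripps data and not the Neo4j API. The graph version follows named relationships from a series to its seasons and then to its episodes. The relational version reaches the same answer by matching foreign keys across two tables, which is the work a SQL join performs.

```python
from collections import defaultdict

# Hypothetical property graph: each node carries a label and properties.
nodes = {
    "s1": {"label": "Series", "title": "Example Series"},
    "se1": {"label": "Season", "number": 1},
    "se2": {"label": "Season", "number": 2},
    "e1": {"label": "Episode", "title": "Pilot"},
    "e2": {"label": "Episode", "title": "Finale"},
}

# Outgoing edges keyed by (source node, relationship type).
edges = defaultdict(list)

def connect(src, rel, dst):
    """'Draw a line' between two nodes using a named relationship."""
    edges[(src, rel)].append(dst)

connect("s1", "HAS_SEASON", "se1")
connect("s1", "HAS_SEASON", "se2")
connect("se1", "HAS_EPISODE", "e1")
connect("se2", "HAS_EPISODE", "e2")

def episodes_of(series_id):
    """Follow HAS_SEASON, then HAS_EPISODE links directly from node to node."""
    return [
        nodes[ep]["title"]
        for season in edges[(series_id, "HAS_SEASON")]
        for ep in edges[(season, "HAS_EPISODE")]
    ]

# Relational equivalent: rows in separate tables linked by foreign keys.
seasons = [("se1", "s1"), ("se2", "s1")]  # (season_id, series_id)
episodes = [("e1", "se1", "Pilot"), ("e2", "se2", "Finale")]  # (episode_id, season_id, title)

def episodes_of_relational(series_id):
    # Roughly: SELECT e.title FROM seasons s JOIN episodes e
    #          ON e.season_id = s.season_id WHERE s.series_id = ?
    return [
        title
        for (season_id, ser_id) in seasons if ser_id == series_id
        for (_, ep_season_id, title) in episodes if ep_season_id == season_id
    ]

print(episodes_of("s1"))  # ['Pilot', 'Finale']
```

Both functions return the same titles here; the difference is that the graph version walks stored connections, while the relational version rebuilds those connections at query time by comparing key columns.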

Up from LDAP

According to Goodacre, the Scripps graph-based asset manager grew out of earlier work on a Lightweight Directory Access Protocol (LDAP) repository. He agreed that this lineage could place the system in the category of master data management.

Recommendation engines, social media applications and customer relationship models are among other areas where graph databases have gained acceptance. While graph engines still broadly lag established relational databases in use, adoption appears to be rising.


As described in a recent Forrester Research report on graphs, 51% of global data and analytics technology decision-makers have implemented, are implementing, are upgrading or are expanding graph databases in their organizations.

A major benefit is the graph's focus on connected data, which helps organizations ask more complex questions without having to do more complex programming, according to Noel Yuhanna, report author and Forrester analyst.

An important differentiator that eases the programming burden is the inherently graph-based nature of these databases, in comparison to relational databases that organize the world according to table structures, Yuhanna wrote.

Chart: Graph databases expand in organizations. Graph databases, although a relatively new technology, are finding favor among technology decision-makers, according to Forrester.

Connect the data

A continuous string of updates has made Neo4j more of a platform than just a database, according to Philip Rathle, vice president of products at Neo4j. Improvements have included a desktop developer console, graph analytics and data integration tools, he said.

Rathle said continued enhancements reflect the fact that users have different needs at different points in the data lifecycle. He cited new Neo4j support for Apache Spark in this regard.

He said the company has recently created a graph mapping layer that integrates the Neo4j graph database with the Spark Catalyst SQL optimizer. Users will be able to traverse large data volumes on Spark as graphs, he said. Key to the effort is connecting Cypher, a declarative property graph query language, with Spark.
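The general idea of viewing tabular data as a graph can be sketched in a few lines of Python. This is not the Cypher for Apache Spark API and uses neither Spark nor Catalyst; the row layout (a node table with `id` and `name`, a relationship table with `src` and `dst`) and the helper names `build_graph` and `friends_of_friends` are hypothetical. The traversal hand-codes what a declarative Cypher pattern expresses in a single line.

```python
from collections import defaultdict

# Hypothetical input rows, as they might sit in tables or DataFrames.
person_rows = [
    {"id": 1, "name": "Alice"},
    {"id": 2, "name": "Bob"},
    {"id": 3, "name": "Carol"},
]
knows_rows = [  # one row per relationship: source id -> target id
    {"src": 1, "dst": 2},
    {"src": 2, "dst": 3},
]

def build_graph(node_rows, rel_rows):
    """Map a node table and a relationship table onto an adjacency view."""
    names = {row["id"]: row["name"] for row in node_rows}
    out = defaultdict(list)
    for row in rel_rows:
        out[row["src"]].append(row["dst"])
    return names, out

def friends_of_friends(names, out, start_name):
    """Two-hop traversal over the mapped graph. In Cypher, roughly:
    MATCH (a:Person {name: $start})-[:KNOWS]->()-[:KNOWS]->(c) RETURN c.name
    """
    start_ids = [node_id for node_id, name in names.items() if name == start_name]
    return sorted(
        names[c]
        for a in start_ids
        for b in out[a]
        for c in out[b]
    )

names, out = build_graph(person_rows, knows_rows)
print(friends_of_friends(names, out, "Alice"))  # ['Carol']
```

In a declarative setting, the user writes only the pattern and the engine's optimizer decides how to execute the traversal; the hand-written loops above stand in for that planning step.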

According to Rathle, Neo4j is donating an early version of a Cypher for Apache Spark toolkit to the openCypher project. The toolkit is now available as an alpha release under an Apache 2.0 license, he said.

The openCypher project began in 2015, when Neo4j sought to open up the language, partly at the urging of users such as those at Scripps, where Cypher's status as a proprietary language had caused unease.

"We thought the database was a best fit, but were concerned that the language was closed and proprietary," Goodacre said. He said he and others had conversations with Neo4j on the topic, and the company has "taken steps in the right direction" with openCypher.

Goodacre's colleague Boehmann said the graph query language played an important part in reducing overall query programming complexity for graph implementations, compared with relational database alternatives.
