

Graph data model cements tight relationships between data elements

Graph databases can help define and discover relationships between entities -- and offer increased flexibility and better usability than relational database management systems.

Graph databases are a form of NoSQL software getting increased attention because of their ability to structurally map out relationships between different data elements. They differ from conventional relational database management systems in a number of ways -- one of the most interesting being the elevation of the data relationship to first-class status.

In a relational database, relationships between entities -- such as a customer and the products that customer has purchased -- are represented using foreign keys that point to corresponding data in different tables. In addition, the bulk of the data attributes are associated with the entities themselves. By comparison, in a graph database, entities (referred to as nodes or vertices) are tied to one another via connections with labels describing their relationship. As part of a graph data model, both entities and connections (called links or edges) can be assigned properties and corresponding values.

The value of this type of database structure shouldn't be underestimated. Consider a database that's used to keep track of where customers live and when they lived at each location. With a graph data model, we can represent Customer and Location as two core entities and use a "LIVED_AT" label to maintain direct relationships that link people to specific locations.

For example, we can set up a relationship that links an entity for a person named John Smith to one for an address at 123 Main St. in Yorktown, Va. The graph model can then be used to capture a variety of information about the relationships between those entities. John Smith might have lived at more than one location -- if so, it's simple to add more connections using the same relationship template. We can also add properties to a connection to document when he lived at a particular place. This pair of properties on the connection establishes the period during which he was a resident at the Yorktown address: {StartDate: 04/01/2013, EndDate: 02/28/2015}.
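The John Smith example can be sketched in a few lines of Python. This is only an in-memory illustration of the property graph idea, not the API of any particular graph database: the Node and Edge classes are hypothetical, and the second address is invented to show the relationship template being reused.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Node:
    """An entity (vertex) with a label and free-form properties."""
    label: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A labeled connection between two nodes, carrying its own properties."""
    source: Node
    label: str
    target: Node
    properties: dict = field(default_factory=dict)


john = Node("Customer", {"name": "John Smith"})
yorktown = Node("Location", {"street": "123 Main St.", "city": "Yorktown", "state": "Va."})

# The residency period lives on the connection, not on either entity.
lived_at = Edge(john, "LIVED_AT", yorktown,
                {"StartDate": date(2013, 4, 1), "EndDate": date(2015, 2, 28)})

# A second residence reuses the same template (address and dates are made up).
richmond = Node("Location", {"street": "9 Oak Ct.", "city": "Richmond", "state": "Va."})
edges = [
    lived_at,
    Edge(john, "LIVED_AT", richmond,
         {"StartDate": date(2015, 3, 1), "EndDate": date(2017, 6, 30)}),
]
```

Keeping the dates on the edge means the same person and the same address can take part in any number of residencies without either node changing.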

A graph model is less rigid than a relational database

It's possible to capture similar data in a relational database by creating a separate relationship table that uses foreign keys to point to the tables containing the data for the customer and location entities. But there are two qualitative benefits to be gained by employing a graph data model: increased flexibility and better usability.
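For comparison, the relational approach described above might look like the following sketch, which uses Python's built-in sqlite3 module. The table and column names are assumptions chosen for illustration; the point is that the relationship becomes a third table whose foreign keys point at the customer and location tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Two entity tables plus a relationship table tied to them by foreign keys.
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE location (id INTEGER PRIMARY KEY, street TEXT, city TEXT, state TEXT);
    CREATE TABLE lived_at (
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        location_id INTEGER NOT NULL REFERENCES location(id),
        start_date  TEXT,
        end_date    TEXT
    );
""")

conn.execute("INSERT INTO customer VALUES (1, 'John Smith')")
conn.execute("INSERT INTO location VALUES (1, '123 Main St.', 'Yorktown', 'Va.')")
conn.execute("INSERT INTO lived_at VALUES (1, 1, '2013-04-01', '2015-02-28')")

# Reading the relationship back requires joining all three tables.
rows = conn.execute("""
    SELECT c.name, l.street, r.start_date, r.end_date
    FROM lived_at AS r
    JOIN customer AS c ON c.id = r.customer_id
    JOIN location AS l ON l.id = r.location_id
""").fetchall()
```

The data is the same as in the graph version, but the relationship is only implied by matching key values across tables rather than stored as an object in its own right.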

Flexibility first: As with other NoSQL technologies, you don't need to build a fixed data model for a graph database. New connections between entities in a database can be added through direct interactions with the graph data store to specify the desired relationships -- e.g., John Smith painted the house at 123 Main St. in Yorktown.


As a result, new relationship types and properties can be added without having to restructure the entire database. With a relational database, on the other hand, you need to design the data model upfront so that the desired relationships can be expressed through foreign keys, and adding a new kind of relationship typically means changing the schema.
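A minimal sketch of that flexibility, assuming edges are held as plain Python tuples of (source, label, target, properties); the PAINTED label is a hypothetical name for the house-painting relationship mentioned above.

```python
# No schema is declared up front: each edge is just
# (source, label, target, properties).
edges = [
    ("John Smith", "LIVED_AT", "123 Main St., Yorktown, Va.",
     {"StartDate": "2013-04-01", "EndDate": "2015-02-28"}),
]

# A brand-new relationship type is simply another edge with a different label.
edges.append(("John Smith", "PAINTED", "123 Main St., Yorktown, Va.", {}))

labels = sorted({label for _, label, _, _ in edges})
print(labels)  # ['LIVED_AT', 'PAINTED']
```

In a relational design, the equivalent change would usually involve creating another relationship table before any such row could be stored.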

From the usability standpoint, because connections are first-class objects in a graph model, their properties can be easily included in database queries. In the customer database example, it's straightforward to look for everyone in the database who has lived at that Yorktown location and to look more narrowly for the people who lived there in, say, 2012.
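The two queries just described could be sketched as follows, again over a simple in-memory list of edges rather than a real graph query language. Mary Jones and the Richmond address are invented sample data, and the residents function is a hypothetical helper.

```python
from datetime import date

YORKTOWN = "123 Main St., Yorktown, Va."

# (person, label, place, properties); all rows except John Smith's are made up.
edges = [
    ("John Smith", "LIVED_AT", YORKTOWN,
     {"StartDate": date(2013, 4, 1), "EndDate": date(2015, 2, 28)}),
    ("Mary Jones", "LIVED_AT", YORKTOWN,
     {"StartDate": date(2010, 6, 1), "EndDate": date(2013, 3, 15)}),
    ("Mary Jones", "LIVED_AT", "9 Oak Ct., Richmond, Va.",
     {"StartDate": date(2013, 3, 16), "EndDate": date(2016, 1, 31)}),
]


def residents(edges, location, year=None):
    """Return people linked to `location` by LIVED_AT, optionally during `year`."""
    found = []
    for person, label, place, props in edges:
        if label != "LIVED_AT" or place != location:
            continue
        if year is not None:
            # Skip residencies that do not overlap the calendar year at all.
            starts_after = props["StartDate"] > date(year, 12, 31)
            ends_before = props["EndDate"] < date(year, 1, 1)
            if starts_after or ends_before:
                continue
        found.append(person)
    return found


everyone = residents(edges, YORKTOWN)          # all past and present residents
in_2012 = residents(edges, YORKTOWN, 2012)     # narrowed by edge properties
```

Because the dates are properties of the connection, narrowing the search to a single year is just one more filter on the same edges.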

Data modeling still a necessary step

However, the flexibility of the graph approach doesn't mean no data modeling is needed. The first step is to understand the types of business questions to be asked and how they can be framed so that they map onto the graph structure. Then you need to distinguish between the properties that are inherently related to the database entities and the ones that are associated with the relationships between those entities. Graph data modeling is an iterative process, and each attempt at devising a model and running some queries may trigger alterations in the ways that entities are linked and relationships are defined.

For example, in querying airline reservation data, the types of questions that data analysts ask might involve things such as differentiating between business and vacation travel, optimizing aircraft allocation by flight route and revising ticket prices based on demand. A graph database could include Traveler and Location entities, with a link between them to show when individual travelers book flights to particular locations. That might provide useful information about vacations versus business trips -- but it might not work for the other two query topics.

A second iteration might add Route as a new entity that's connected to two locations as the starting point and destination. A connection with a "BOOKED_TRIP" label could then link a person to a particular route, with embedded properties including the date, flight number and ticket cost. Subsequent iterations could produce further adjustments in the graph data model until the analysts are satisfied that their queries can be addressed. Even then, additional modifications are possible if new business analytics needs emerge.
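As a rough sketch of that second iteration, the snippet below models Route nodes and BOOKED_TRIP connections as plain Python structures and runs one demand-oriented query over them. The route IDs, cities, flight numbers, fares and the demand_by_route helper are all hypothetical sample values, not taken from any real system.

```python
from collections import defaultdict

# Route nodes, each connected to an origin and a destination Location.
routes = {
    "R1": {"origin": "Norfolk", "destination": "Boston"},
    "R2": {"origin": "Norfolk", "destination": "Orlando"},
}

# BOOKED_TRIP edges: (traveler, route_id, properties on the connection).
booked_trips = [
    ("John Smith", "R1", {"date": "2015-03-02", "flight": "XY101", "cost": 220.0}),
    ("John Smith", "R2", {"date": "2015-07-11", "flight": "XY205", "cost": 180.0}),
    ("Mary Jones", "R1", {"date": "2015-03-02", "flight": "XY101", "cost": 260.0}),
]


def demand_by_route(trips):
    """Count bookings and average the ticket cost for each route."""
    totals = defaultdict(lambda: {"bookings": 0, "revenue": 0.0})
    for _traveler, route_id, props in trips:
        totals[route_id]["bookings"] += 1
        totals[route_id]["revenue"] += props["cost"]
    return {
        route_id: {"bookings": t["bookings"],
                   "avg_cost": t["revenue"] / t["bookings"]}
        for route_id, t in totals.items()
    }


demand = demand_by_route(booked_trips)
```

With Route as its own entity, questions about aircraft allocation and pricing can be answered per route, which the earlier Traveler-to-Location model could not support directly.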

Graph databases provide an easy-to-comprehend semantic framework for capturing, manipulating and analyzing data -- one that's flexible enough to enable on-the-fly refinements to speed the development of analytics applications. In uses that are a good fit -- such as recommendation engines, fraud detection systems and social networking applications -- the graph approach offers a viable alternative to using mainstream relational databases.
