
Following a notion for SQL, architect creates NoSQL data modeling notation

Ted Hills, an enterprise architect at LexisNexis and the author of a recent book on NoSQL data modeling, discusses the ins and outs of database schema design in an era of big data disruption.

Emerging NoSQL data styles bring rapid development for innovative applications. But NoSQL also creates challenges. NoSQL systems can go into production quickly, sometimes without any upfront schema creation at all. However, analyzing NoSQL system output can be difficult when basic data modeling practices are eschewed. This was on Ted Hills' mind when he set out to create a modeling notation that spanned SQL and NoSQL styles, enabling better NoSQL data modeling.

As an enterprise architect working on development and governance of enterprise data models at LexisNexis, Hills has seen efforts to incorporate SQL and NoSQL structured and unstructured data assets up close. In his new book, NoSQL and SQL Data Modeling, Hills describes Concept and Object Modeling Notation, or COMN (pronounced "common"), which shows how to bring NoSQL under the bigger modeling tent. We first heard him speak about COMN at Enterprise Data World 2016 and caught up with him more recently to discuss these issues.

How did you come to see the need for a new notation for data modeling?

Ted Hills: I have used mostly entity relationship [ER] modeling in my work, as well as the Unified Modeling Language [UML] for software design. I was trying to model a metadata model and found it is really tough to do in ER, in that a fundamental assumption of ER is that all data is eventually going to be placed in tables. But that is not what the target is necessarily going to be if the target is a NoSQL database.

I found there were things I just could not bend or stretch the ER notation to represent. At a physical level, the two things that ER can't represent are arrays and nested types. At the logical level, it is more subtle.

Then I tried [Unified Modeling Language] and that didn't work. I tried fact-based modeling and that didn't work out either.


You see, all three of those modeling notations had some problems -- mistakes made at the fundamental level. And I decided I would try to address those by developing my own notation, COMN. Now, understand, COMN is open to the other methods in that you can take ER, UML or fact-based models and express them in COMN. That is because it covers all the same concepts, but it covers them more universally.

Doesn't it seem, sometimes, that the rush to NoSQL obscures the need for NoSQL data modeling?

Hills: Well, we've seen the whole NoSQL world getting carried away with the fact that they did not have to tell the DBMS [database management system] what the schema of the data was before they started storing it -- which is wonderful. You don't have to spend weeks or months developing your data model before you can store your first byte. You can just store your data and figure out the model later.

But we also saw a lot of folks -- especially those working more exclusively in software development environments -- who had never worked on data projects per se and were making the typical mistake. They would just throw the data in, without any thought to its schema, and then find later that their queries didn't perform, that they were missing data, that they didn't have keys for identifying subsets of the data that were important. The results could be tragic.

Data projects that start with data models usually fare better than those that don't. That observation remains true even in the NoSQL world. The big difference with NoSQL is that, since the database doesn't force you to do a model, it's more likely developers will just throw data in and then struggle to make things work later on.

Still, you've remarked that NoSQL expands the toolbox for application builders in some ways, and that relational DBMSes have some limitations.

Hills: Sure, if your only physical implementation platform is an RDBMS, there are limitations. One is that you are forced to store any repeating data in its own table and you are forced to give it a key whether the key has logical significance or not. But the fact is that the relational model of data is poorly understood -- in reality, it's about how to think about data, not about how you store data. Some things about the relational model are just not taught and are not well-known.
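As an illustration of the repeating-data limitation Hills describes, the following minimal Python sketch contrasts a nested, document-style record with the rows a relational layout would require. It is not taken from Hills' book or from COMN. The order data, the field names and the `to_relational_rows` helper are all hypothetical, chosen only to show where a key with no logical significance creeps in.

```python
# Hypothetical order record as a document store might hold it:
# the repeating line items are nested directly inside the order as an array.
order_document = {
    "order_id": "A-100",
    "customer": "Acme",
    "line_items": [
        {"sku": "widget", "qty": 2},
        {"sku": "gadget", "qty": 1},
    ],
}


def to_relational_rows(doc):
    """Flatten the nested document into one parent row and several child rows.

    In a table-based layout, the repeating line items must live in their own
    table, so each row needs a key of its own plus a reference to its order.
    """
    order_row = {"order_id": doc["order_id"], "customer": doc["customer"]}
    item_rows = []
    for position, item in enumerate(doc["line_items"], start=1):
        item_rows.append({
            # Assumed surrogate key: it exists only because the table needs
            # one, not because the business cares about it.
            "line_item_id": position,
            # Foreign key back to the parent order.
            "order_id": doc["order_id"],
            "sku": item["sku"],
            "qty": item["qty"],
        })
    return order_row, item_rows


order_row, item_rows = to_relational_rows(order_document)
```

In the nested form, the line items need no identifier of their own. In the flattened form, `line_item_id` appears purely as a storage requirement, which is the kind of key that may belong only in a physical model.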

Give us an idea of what we can expect to learn when we read NoSQL and SQL Data Modeling. Why do we need a new notation?

Hills: Each of the existing notations has built-in assumptions that make it really hard to think about fundamental data issues. I felt that we had to start from scratch. One of the things the new model makes evident is that decisions about keys are sometimes logical and sometimes physical. I, and others, have discovered that, when you have to make a decision about a key, it is really interesting to see if that decision needs to be expressed in the logical model or should only be expressed in the physical model. In the ER notation, it is often needed simply because that model doesn't allow you to have nested types or arrays. But sometimes you need that key for logical reasons -- and then it belongs in the logical model. But no other notation can allow you to have the key in one or the other [levels]. That wasn't possible before COMN.
