Typically built for speed and designed for specific purposes, NoSQL databases forgo the rigid database schemas of SQL-based relational software. As a result, they can be deployed quickly and adapted on the fly; in addition, they can handle data sets with diverse structures and fields, and they run well in distributed modes that often vex mainstream relational databases.
But as they flourish and become part of larger enterprise processes, the issue of NoSQL data modeling may create disorder in some organizations, according to IT professionals and consultants at the Enterprise Data World 2015 conference in Washington, D.C. For example, their support for flexible schemas -- or, as some NoSQL vendors posit, schema-less implementations -- means NoSQL technologies can be put into use without high-level models documenting how they fit into a larger IT picture. Because of that, common data modeling practices likely will have to come in for some rethinking.
Working with NoSQL systems requires "a fundamental mindset change" on the part of data architects who are versed in the ways of relational databases, according to Donovan Hsieh, a senior enterprise data architect at eBay Inc. The San Jose, Calif., company has accumulated more than a few varieties of NoSQL products -- Hsieh listed MongoDB, Couchbase and Cassandra among others growing in an eBay data vineyard that also includes relational database varieties such as Oracle, MySQL and Teradata.
The role of the application developer is much more pronounced in the NoSQL design process, Hsieh said in a presentation at the EDW conference. He added that developers creating NoSQL-based applications frequently skip the traditional step of building conceptual and logical data models upfront and focus solely on low-level physical data models incorporated directly into the application logic. That can help boost application performance, scalability and flexibility -- but Hsieh said it can also lead to situations in which development agility trumps database manageability.
Things are further complicated by the fact that NoSQL is an umbrella term covering distinct product categories, with four primary ones: key-value stores, document databases, wide column stores and graph databases. Physical data models vary widely between the different categories and between different databases within each category, Hsieh said.
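To make the contrast between the four categories concrete, here is a minimal sketch of one customer and one order shaped four different ways. It is an illustration only: plain Python structures stand in for real databases, the customer, order and field names are invented for the example, and no vendor's actual storage format or API is implied.

```python
import json

# Key-value store: an opaque value looked up by a single key.
key_value = {"customer:42": json.dumps({"name": "Ada", "city": "Chicago"})}

# Document database: one nested, self-describing document.
document = {
    "_id": 42,
    "name": "Ada",
    "address": {"city": "Chicago"},
    "orders": [{"order_id": 7, "total": 19.99}],
}

# Wide column store: a row key mapping to a sparse set of named columns.
wide_column = {
    42: {
        "profile:name": "Ada",
        "profile:city": "Chicago",
        "order:7:total": 19.99,
    }
}

# Graph database: nodes plus labeled edges between them.
graph = {
    "nodes": {"customer:42": {"name": "Ada"}, "order:7": {"total": 19.99}},
    "edges": [("customer:42", "PLACED", "order:7")],
}


def order_totals_for(customer_node):
    """Follow PLACED edges from a customer node and collect order totals."""
    return [
        graph["nodes"][dst]["total"]
        for src, label, dst in graph["edges"]
        if src == customer_node and label == "PLACED"
    ]
```

The same business facts end up with a different physical shape in each case, which is why a model written for one category rarely carries over to another unchanged.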
Modeling musts: Rigor, business input
In pursuit of development rigor to help channel modeling efforts, software design methodologies that are database-agnostic can be useful, he said, citing the Unified Modeling Language and domain-driven design practices among the available means to bring additional processes to bear in NoSQL environments.
Hsieh said application developers who take on responsibility for data model design also need to interact closely with people on the business side to ensure that NoSQL systems will be able to answer the queries they're looking to run. In turn, business managers and other end users have to provide practical requirements that can be used to set realistic query performance goals and plan peak throughput and storage needs. The development and data management teams at eBay have fine-tuned that kind of approach to NoSQL data modeling over the past few years, he noted.
Another important step, in Hsieh's estimation, comes after the development stage. "At the end of the cycle, we ask the developers to retroactively create a data model," he said, referring to the conceptual and logical stages of the modeling process. Without that, other programmers and IT staffers looking to understand how data is modeled in a NoSQL system have to dig into the application code, where the intent of the original developers isn't always readily apparent.
Sometimes data models are the province of individual developers, who may take their design secrets along with them upon leaving the building to go work elsewhere. In such cases, the model is effectively lost. "A relational database can be reverse-engineered. But there's no fixed data model in something like a document database," said Dan Sullivan, an independent consultant who also spoke at EDW.
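Sullivan's point can be illustrated with a rough sketch of what recovering a model from a document collection might involve: sampling documents and tallying which fields and value types actually appear. The function name `infer_field_inventory` and the sample documents are hypothetical, and the result is only an approximate inventory of what was observed, not a true data model.

```python
from collections import defaultdict


def infer_field_inventory(documents):
    """Map each dotted field path to the sorted type names seen for it."""
    inventory = defaultdict(set)

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            # List items share one path, marked with a trailing "[]".
            for item in value:
                walk(item, path + "[]")
        else:
            inventory[path].add(type(value).__name__)

    for doc in documents:
        walk(doc, "")
    return {path: sorted(types) for path, types in inventory.items()}


# Hypothetical sample: two documents from the same collection that disagree.
sample_docs = [
    {"name": "Ada", "zip": 60601, "tags": ["vip"]},
    {"name": "Bob", "zip": "60602", "address": {"city": "Chicago"}},
]
inventory = infer_field_inventory(sample_docs)
```

In this sample, `zip` shows up as both an integer and a string, and `address` exists in only one document. An inventory like this reports what is stored, but not what the original developer intended, which is the gap a documented model is meant to fill.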
Building models should be a team sport
The old adage applies: safety in numbers. "As a rule, you need to be sure more than one person is looking at things as you are developing," said Pramod Sadalage, a principal consultant at ThoughtWorks Inc. in Chicago, during another presentation. "Silos don't generate knowledge."
Sadalage thinks the DevOps concept -- an emerging extension of Agile development practices that gives application developers more involvement in and responsibility for software deployment -- can help improve NoSQL usability in organizations. He suggested adopting some basic tenets of the Agile and DevOps approaches, such as implementing a version control system to keep all data elements consistent and pairing up developers and database administrators to work together.
"The DBA has a lot of institutional knowledge about where the data is, how it arrived and why it is the way it is," Sadalage said. Sharing that knowledge with developers building NoSQL applications can help ensure, for example, that information about how data is used by downstream systems is taken into account in designing the database architecture.
The quest for more order in NoSQL data modeling will continue, said Joe Caserta, president of data management consultancy Caserta Concepts LLC in New York. And it has to, he added: "For NoSQL databases to get embraced by the enterprise, we need to put discipline around how we actually use them."
Like Hsieh, Caserta sees knowing in detail what questions business users will want to ask of data as the best way to inform a process in which architects and developers must decide issues such as whether to build narrow or wide database tables. "When you put data into a NoSQL database, you should have a usage pattern in mind," he said. "That will determine how you want to model things, regardless of the data."
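As a minimal sketch of the narrow-versus-wide decision Caserta describes, the example below stores the same readings in both layouts and answers one assumed question: "give me everything one sensor recorded on one day." The sensor data, the question and the function names are all invented for illustration; plain Python dictionaries stand in for database tables.

```python
# Hypothetical readings: (sensor, day, time of day, value).
readings = [
    ("sensor-1", "2015-04-01", "09:00", 20.5),
    ("sensor-1", "2015-04-01", "10:00", 21.0),
    ("sensor-2", "2015-04-01", "09:00", 18.2),
]

# Narrow layout: one row per reading, keyed by (sensor, day, time).
narrow = {(s, d, t): v for s, d, t, v in readings}

# Wide layout: one row per (sensor, day), with one column per time of day.
wide = {}
for s, d, t, v in readings:
    wide.setdefault((s, d), {})[t] = v


def day_from_narrow(sensor, day):
    """Answer the question by scanning every row and filtering."""
    return {t: v for (s, d, t), v in narrow.items() if s == sensor and d == day}


def day_from_wide(sensor, day):
    """Answer the same question with a single row lookup."""
    return dict(wide.get((sensor, day), {}))
```

Both layouts hold identical data, yet the wide one answers this particular question with one lookup while the narrow one has to filter every row. A different question, such as fetching a single reading by its full key, would favor the narrow layout instead.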