NoSQL databases are designed to address processing issues created by expanding data volumes and diversity, particularly in big data applications. But there's no lack of either volume or diversity in the ranks of NoSQL technologies, leaving IT and data managers with lots of alternatives to sort through when evaluating technology options.
"There are so many NoSQL databases today -- I think we're challenged by two or three on a daily basis," quipped Michael Simone, global head of CitiData platform engineering at Citigroup Inc., during a presentation at the 2014 MongoDB World conference in New York. In reality, Citi currently has limited itself to using the MongoDB database as a NoSQL alternative to relational software in a small number of applications, Simone said. But his joke pointed to the need for organizations considering NoSQL products to focus on finding the one that can best solve their application problems.
That starts with understanding the different types of NoSQL databases, which are broken down into four primary categories: document databases, key-value stores, wide column stores and graph databases. They all share some common traits -- most notably, support for more flexible and dynamic database designs than are feasible in SQL-based relational databases. But each NoSQL database type is suited to particular uses, according to Gartner Inc. analyst Nick Heudecker. In figuring out which way to go, he said, "you should ask yourself what kind of data you're working with and how your applications are going to use that data."
Document databases support structural mix
For example, document databases are often used in content management systems and to collect and process data from high-volume Web and mobile applications for uses such as application monitoring. Befitting their name, these databases store data elements in document-like structures, which can be simple, sometimes to the point of being schema-less. MongoDB, CouchDB, Couchbase Server and MarkLogic are prominent examples of document databases.
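To make the document idea concrete, here is a minimal Python sketch in which plain dictionaries stand in for documents. It is an illustration only, not the API of MongoDB or any other product named above; the collection, the field names and the find helper are all hypothetical.

```python
# Illustrative only: plain dicts stand in for documents in one collection.
# The field names and the find() helper are hypothetical.
articles = [
    {"_id": 1, "title": "Intro to NoSQL", "tags": ["database", "nosql"]},
    # A second document with a different shape: nested author, no tags.
    {"_id": 2, "title": "App monitoring", "author": {"name": "Pat"}, "views": 120},
]


def find(collection, **criteria):
    """Return documents whose top-level fields equal every criterion."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


matches = find(articles, views=120)
```

The two documents carry different fields yet sit in the same collection, which is the kind of structural mix a fixed relational table would not accept without a schema change.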
Simone said Citi's use of MongoDB originated with application developers who were looking for a way to deal with data replication problems in an online financial application with a variety of data structures. The application was initially deployed on a relational database, but processing the data with that platform was slow and prone to errors. "It became clear that we couldn't keep up with all the data formats coming from the data scientists," he said.
MongoDB's support for dynamic schemas turned out to be a good fit for the rapidly evolving application, according to Simone. "We found that we could model everything that came at us," he said. The modeling work also could be done much faster than with the relational approach: The developers built a pre-production model on MongoDB in just four months.
Key-value databases keep it simple
Key-value databases, such as Aerospike, Redis and Riak, are the simplest form of NoSQL software; they pair unique keys with their associated values in data elements, with a goal of enabling ultrafast application performance against relatively simple data sets. "Key-value stores are incredibly lightweight," said Joe Caserta, president of consulting and technical services provider Caserta Concepts LLC in New York. "We can do lookups in seconds."
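The key-value model is small enough to sketch in a few lines of Python. The class below is a hypothetical in-memory stand-in, assuming nothing about how Aerospike, Redis or Riak are actually implemented; it only shows the basic contract of pairing a unique key with a value.

```python
class KeyValueStore:
    """Hypothetical in-memory stand-in for a key-value store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Writing to an existing key replaces its value.
        self._data[key] = value

    def get(self, key, default=None):
        # Lookups go straight to the key; there is no query language.
        return self._data.get(key, default)

    def delete(self, key):
        # Deleting a missing key is a no-op.
        self._data.pop(key, None)


store = KeyValueStore()
store.put("session:42", {"user": "alice"})
session = store.get("session:42")
```

Because every operation addresses a single key, the store never has to inspect the value, which is part of what keeps this design lightweight.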
Flywheel Software Inc. uses Riak, developed by Basho Technologies Inc., to run a mobile app that lets users hail taxis by tapping on their smartphones. Cuyler Jones, speaking while he was still chief architect at the Redwood City, California, company, said the database can scale to meet its peak processing needs. Just as important are Riak's high availability and its support for consistent data access times, added Jones, who now works at another startup.
Wide column stores take broad approach
Wide column stores keep data in tables that can have very large numbers of columns, offering the opportunity for high levels of performance and scalability in processing large data sets. Favored uses include Internet search and other large-scale Web applications as well as petabyte-level analytics apps; Accumulo, Cassandra and HBase are among the databases in the wide-column category.
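As a rough illustration of the wide-column idea, the Python sketch below keeps each row as its own sparse set of named columns. It is a simplified model under stated assumptions, not how Accumulo, Cassandra or HBase store data; the row keys, column names and helper functions are hypothetical.

```python
# Illustrative only: each row key maps to its own sparse set of columns.
# Row keys, column names and helpers are hypothetical.
table = {}


def put(row_key, column, value):
    """Set one column on a row, creating the row if needed."""
    table.setdefault(row_key, {})[column] = value


def get(row_key, column, default=None):
    """Read one column; absent rows or columns yield the default."""
    return table.get(row_key, {}).get(column, default)


put("user:1001", "profile:name", "Ada")
put("user:1001", "visits:2014-06-01", 3)
# A different row can use entirely different columns.
put("user:1002", "visits:2014-06-02", 7)

name = get("user:1001", "profile:name")
```

Rows only pay for the columns they actually use, so a table can grow to a very large number of distinct columns without every row carrying all of them.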
The column-based approach was a good match for a DNA matching application launched in 2012 by Ancestry.com, according to Jeremy Pollack, a development manager at the online provider of family history data. The Provo, Utah, company uses HBase along with Hadoop to run DNA calculations that help customers trace their ethnic backgrounds and geographic origins and look for unknown relatives.
Getting the desired performance from the database required considerable tuning and tweaking, said Pollack, who described HBase programming as a "wonky" process. "There are a million buttons you can dial or tune," he said. "You have to be willing to get your hands dirty." But the NoSQL technology enables Ancestry to rapidly compare 700,000 data points in new and stored DNA samples to look for matching characteristics.
Graph databases track data relationships
Graph databases, including InfiniteGraph and Neo4j, store related data elements in graph-like structures that exploit their associative qualities to power applications such as recommendation engines and social networks. For example, graph technology can be used to map the relationships between different people as well as their interests, said Alex Trofymenko, head of technology at HealthUnlocked, a London-based company that operates a website supporting user forums on various medical topics.
Trofymenko and his team use Neo4j, from Neo Technology Inc., to do such mappings. "We can get a lot of information in a graph database," he said. "Say a user is very interested in diabetes or exercise -- you see it." That's valuable for a site that seeks to take millions of free-text searches, relate them to relevant health terms and build a data platform that helps users find information about possible treatment and assistance.
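The kind of mapping described here can be sketched in Python as a list of labeled edges between people and topics. This is a toy model, not Neo4j's data model or query language, and it does not reflect HealthUnlocked's actual data; the user names, relationship labels and helper functions are hypothetical.

```python
# Illustrative only: (source, relationship, target) triples form a tiny graph.
# Names, labels and helpers are hypothetical.
edges = [
    ("alice", "INTERESTED_IN", "diabetes"),
    ("alice", "INTERESTED_IN", "exercise"),
    ("bob", "INTERESTED_IN", "diabetes"),
    ("bob", "FOLLOWS", "alice"),
]


def neighbors(node, relation):
    """Nodes reachable from `node` over edges with the given label."""
    return sorted(dst for src, rel, dst in edges if src == node and rel == relation)


def interested_in(topic):
    """Users who have an INTERESTED_IN edge pointing at `topic`."""
    return sorted(
        src for src, rel, dst in edges if rel == "INTERESTED_IN" and dst == topic
    )


alice_topics = neighbors("alice", "INTERESTED_IN")
diabetes_users = interested_in("diabetes")
```

Following edges in either direction answers both "what is this user interested in?" and "who else shares this interest?", the sort of question behind recommendation features.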
With the various options that the emergence of NoSQL technologies has added, the database selection process is very different from what it was just a few years ago, when, in Caserta's words, "You asked, 'Should I go with Microsoft, Oracle or IBM?'" The wider array of choices can be a good thing for user organizations -- as long as they manage the process carefully and avoid going down the wrong database path.