MongoDB is an open source database management system (DBMS) that uses a document-oriented database model and supports various forms of data. It is one of numerous nonrelational database technologies that arose in the mid-2000s under the NoSQL banner for use in big data applications and other processing jobs involving data that doesn't fit well in a rigid relational model. Instead of using tables and rows as in relational databases, the MongoDB architecture is made up of collections and documents.
How it works
Documents are the basic unit of data in MongoDB, and each one must include a primary key, stored in the _id field, as a unique identifier. Collections contain sets of documents and function as the equivalent of relational database tables. A collection can hold documents containing any type of data, with one restriction: the data in a collection cannot be spread across different databases.
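To make the document and collection model concrete, here is a minimal in-memory sketch in Python. It is not the MongoDB driver API: the Collection class and its insert and get methods are hypothetical names chosen for illustration, and a random UUID stands in for the identifier a real deployment would generate.

```python
import uuid


class Collection:
    """Toy stand-in for a collection: a set of documents keyed by _id."""

    def __init__(self, name):
        self.name = name
        self._docs = {}  # maps _id -> document

    def insert(self, document):
        doc = dict(document)  # shallow copy so the caller's dict is untouched
        # Assign an identifier if the caller did not supply one (assumption:
        # a UUID is used here purely as a placeholder identifier).
        doc.setdefault("_id", uuid.uuid4().hex)
        if doc["_id"] in self._docs:
            raise KeyError(f"duplicate _id: {doc['_id']!r}")
        self._docs[doc["_id"]] = doc
        return doc["_id"]

    def get(self, _id):
        return self._docs.get(_id)


users = Collection("users")
users.insert({"_id": 1, "name": "Ada", "languages": ["Python", "C"]})
# Documents in the same collection do not need to share the same fields.
users.insert({"_id": 2, "name": "Lin", "address": {"city": "Oslo"}})
generated_id = users.insert({"name": "Sam"})
```

The point of the sketch is that the unique identifier, not a fixed column layout, is the only thing every document in the collection has in common.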
The BSON document storage and data interchange format used in MongoDB provides a binary representation of JSON-like documents. Automatic sharding is another key feature that enables data in a MongoDB collection to be distributed across multiple systems for horizontal scalability as data volumes and throughput requirements increase.
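As an illustration of what "a binary representation of JSON-like documents" means, the following sketch hand-encodes a tiny subset of BSON (string and 32-bit integer values only) using Python's standard library. The encode_bson function is a hypothetical helper written for this example; real applications would rely on the encoder that ships with a MongoDB driver.

```python
import struct


def encode_bson(document):
    """Encode a flat dict of str / int32 values as BSON bytes (tiny subset)."""
    body = b""
    for key, value in document.items():
        name = key.encode("utf-8") + b"\x00"  # element name as a C string
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # Type 0x02: int32 byte length (including trailing NUL), then bytes.
            body += b"\x02" + name + struct.pack("<i", len(data)) + data
        elif isinstance(value, int) and not isinstance(value, bool):
            # Type 0x10: 32-bit little-endian signed integer.
            body += b"\x10" + name + struct.pack("<i", value)
        else:
            raise TypeError(f"unsupported in this sketch: {type(value).__name__}")
    # Document = int32 total length (counting itself) + elements + final NUL.
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


encoded = encode_bson({"hello": "world"})
```

Each element carries a type byte and explicit lengths, which is what lets a reader skip over fields without parsing text.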
The NoSQL DBMS uses a single-master architecture for data consistency, with secondary databases that maintain copies of the primary database. Operations are automatically replicated to those secondary databases so that one of them can take over in an automatic failover.
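The primary and secondary arrangement can be pictured with a toy simulation. Everything here is a simplifying assumption for illustration: the Node and ReplicaSet classes are hypothetical, writes are copied to secondaries immediately, and failover simply promotes the first live node, whereas a real cluster replicates asynchronously and holds an election.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.alive = True


class ReplicaSet:
    """Toy model: one primary accepts writes; secondaries hold copies."""

    def __init__(self, names):
        self.nodes = [Node(n) for n in names]
        self.primary = self.nodes[0]

    def write(self, key, value):
        if not self.primary.alive:
            raise RuntimeError("no primary available; call failover() first")
        # Apply the write on the primary and copy it to every live secondary.
        for node in self.nodes:
            if node.alive:
                node.data[key] = value

    def failover(self):
        # Promote the first live node (stand-in for a real election).
        if self.primary.alive:
            return self.primary
        candidates = [n for n in self.nodes if n.alive]
        if not candidates:
            raise RuntimeError("no live nodes left to promote")
        self.primary = candidates[0]
        return self.primary


rs = ReplicaSet(["node-a", "node-b", "node-c"])
rs.write("x", 1)
rs.primary.alive = False  # simulate the primary going down
rs.failover()
rs.write("y", 2)
```

After the failover, the promoted node already holds the earlier write, which is why replication has to happen before the failure rather than after it.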
MongoDB pros and cons
Like other NoSQL databases, MongoDB doesn't require predefined schemas and it stores any type of data. This gives users the flexibility to create any number of fields in a document, making it easier to scale MongoDB databases compared to relational databases.
One of the advantages of using documents is that these objects map to native data types in a number of programming languages. Also, embedding related data inside a document reduces the need for database joins, which can reduce query costs.
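The effect of embedding on joins can be seen by modeling the same order two ways with plain Python data structures. The field names and the load_order_with_join helper are made up for this example.

```python
# Relational-style layout: orders and line items live in separate tables,
# so reading a full order means matching rows on order_id (a join).
orders_table = [{"order_id": 1, "customer": "Ada"}]
order_items_table = [
    {"order_id": 1, "sku": "A-100", "qty": 2},
    {"order_id": 1, "sku": "B-200", "qty": 1},
]


def load_order_with_join(order_id):
    order = next(o for o in orders_table if o["order_id"] == order_id)
    items = [
        {"sku": row["sku"], "qty": row["qty"]}
        for row in order_items_table
        if row["order_id"] == order_id
    ]
    return {**order, "items": items}


# Document-style layout: the line items are embedded in the order itself,
# so a single lookup returns everything and no join is needed.
order_document = {
    "order_id": 1,
    "customer": "Ada",
    "items": [{"sku": "A-100", "qty": 2}, {"sku": "B-200", "qty": 1}],
}
```

The document is also already shaped like the nested dictionaries and lists an application would work with, which is the mapping to native data types described above.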
A core feature of MongoDB is its horizontal scalability, which makes it a useful database for companies running big data applications. It achieves this through sharding, which distributes data across a cluster of machines. Newer versions of MongoDB also support the creation of zones of data based on a shard key.
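The two routing ideas mentioned here, spreading documents by shard key and pinning ranges of the key to zones, can be sketched as follows. This is a simplified illustration, not MongoDB's actual chunk-based placement: the shard names, the zone boundaries and the use of SHA-256 with a modulo are all assumptions made for the example.

```python
import hashlib
from bisect import bisect_right

SHARDS = ["shard0", "shard1", "shard2"]  # hypothetical shard names


def hashed_shard(shard_key_value, shards=SHARDS):
    """Spread documents evenly by hashing the shard key value."""
    digest = hashlib.sha256(str(shard_key_value).encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


# Zones: contiguous ranges of shard key values pinned to specific shards.
# Keys below "H" go to shard0, keys from "H" up to "Q" to shard1, the rest
# to shard2.
ZONE_BOUNDS = ["H", "Q"]
ZONE_SHARDS = ["shard0", "shard1", "shard2"]


def zoned_shard(shard_key_value):
    """Route by range so related key values stay on the same shard."""
    return ZONE_SHARDS[bisect_right(ZONE_BOUNDS, shard_key_value)]
```

Hashing favors even distribution, while range-based zones keep neighboring key values together, for example to hold one region's data on nearby hardware.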
MongoDB supports a number of storage engines and provides pluggable storage engine APIs that allow third parties to develop their own storage engines for MongoDB.
The DBMS also has built-in aggregation capabilities, which allow users to run MapReduce code directly on the database, rather than running MapReduce on Hadoop. MongoDB also includes its own file system called GridFS, akin to the Hadoop Distributed File System (HDFS), primarily for storing files larger than BSON's size limit of 16 MB per document. These similarities allow MongoDB to be used instead of Hadoop, though the database software does integrate with Hadoop, Spark and other data processing frameworks.
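The reason a file larger than 16 MB needs special handling, and the way chunking gets around it, can be sketched in a few lines. The helper functions below are hypothetical and run entirely in memory. The files_id, n and data field names and the 255 KB chunk size follow GridFS conventions, but this is not a working GridFS client.

```python
BSON_DOCUMENT_LIMIT = 16 * 1024 * 1024  # maximum size of a single document
CHUNK_SIZE = 255 * 1024                 # GridFS default chunk size


def split_into_chunks(file_id, payload, chunk_size=CHUNK_SIZE):
    """Split a byte string into chunk documents that each fit in one document."""
    return [
        {"files_id": file_id, "n": n, "data": payload[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(payload), chunk_size))
    ]


def reassemble(chunks):
    """Rebuild the original bytes by concatenating chunks in sequence order."""
    ordered = sorted(chunks, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)


payload = bytes(17 * 1024 * 1024)  # a 17 MB file, too large for one document
chunks = split_into_chunks("report.bin", payload)
```

Because every chunk is far below the per-document limit, the size of the original file no longer matters to the storage layer.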
Though the benefits are many, there are some downsides to MongoDB. With its automatic failover strategy, a user sets up just one master node in a MongoDB cluster. If the master fails, a secondary node is automatically promoted to become the new master. This switch promises continuity, but it isn't instantaneous -- it can take up to a minute. By comparison, the Cassandra NoSQL database supports multiple master nodes so that if one master goes down, another is standing by for a highly available database infrastructure.
MongoDB's single master node also limits how fast data can be written to the database. Data writes must be recorded on the master and writing new information to the database is limited by the capacity of that master node.
Another potential issue is that MongoDB doesn't provide full referential integrity through the use of foreign-key constraints, which could affect data consistency. In addition, user authentication isn't enabled by default in MongoDB databases, a nod to the technology's popularity with developers. However, malicious hackers have targeted large numbers of unsecured MongoDB systems in ransom attacks, which led to the addition of a default setting that blocks networked connections to databases if they haven't been configured by a database administrator.
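Without foreign-key constraints, checking that references point at existing documents becomes the application's job. A minimal sketch of such a check, using plain Python data and hypothetical field names, might look like this:

```python
# Customers keyed by _id, and orders that refer to a customer by id.
customer_docs = {1: {"_id": 1, "name": "Ada"}}
order_docs = [
    {"_id": 10, "customer_id": 1},
    {"_id": 11, "customer_id": 2},  # refers to a customer that does not exist
]


def find_dangling_orders(orders, customers):
    """Return the ids of orders whose customer_id matches no customer."""
    return [o["_id"] for o in orders if o["customer_id"] not in customers]


dangling = find_dangling_orders(order_docs, customer_docs)
```

A relational database with a foreign-key constraint would have rejected the second order at insert time; here the inconsistency is only found if the application looks for it.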
MongoDB is available in community and commercial versions through vendor MongoDB Inc. MongoDB Community Edition is the open source release, while MongoDB Enterprise Server brings added security features, an in-memory storage engine, administration and authentication features, and monitoring capabilities through Ops Manager.
A graphical user interface (GUI) called MongoDB Compass gives users a way to work with document structure, conduct queries, index data and more. The MongoDB Connector for BI allows users to connect the NoSQL database to their business intelligence tools to visualize data and create reports using SQL queries.
Following in the footsteps of other NoSQL database providers, MongoDB Inc. launched a cloud database as a service called MongoDB Atlas in 2016. Atlas runs on AWS, Microsoft Azure and Google Cloud Platform. More recently, MongoDB released a platform called Stitch for application development on MongoDB Atlas, with plans to extend it to on-premises databases.
The company also added support for multi-document ACID transactions as part of MongoDB 4.0 in 2018. Complying with the ACID properties -- atomicity, consistency, isolation and durability -- across multiple documents expands the types of transactional workloads that MongoDB can handle with guaranteed accuracy and reliability.
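The atomicity part of a multi-document transaction, where either every change is applied or none is, can be illustrated with an in-memory sketch. The transfer function and the account documents are hypothetical, and staging changes on a copy is just one simple way to show all-or-nothing behavior. It does not reflect how MongoDB implements transactions internally.

```python
import copy


def transfer(accounts, source, target, amount):
    """Move money between two documents; apply both updates or neither."""
    working = copy.deepcopy(accounts)  # stage the changes on a private copy
    working[source]["balance"] -= amount
    working[target]["balance"] += amount
    if working[source]["balance"] < 0:
        # Abort: the staged copy is discarded and the originals are untouched.
        raise ValueError("insufficient funds; no document was changed")
    # Commit: replace the visible state with the staged state in one step.
    accounts.clear()
    accounts.update(working)


accounts = {
    "alice": {"_id": "alice", "balance": 100},
    "bob": {"_id": "bob", "balance": 20},
}
transfer(accounts, "alice", "bob", 30)
```

If the second update failed after the first had been applied directly, money would vanish; staging both and committing together is what rules that out.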
MongoDB was created by Dwight Merriman and Eliot Horowitz, who had encountered development and scalability issues with traditional relational database approaches while building web applications at DoubleClick, an online advertising company that is now owned by Google Inc. The name of the database was derived from the word "humongous" to represent the idea of supporting large amounts of data.
Merriman and Horowitz helped form 10gen Inc. in 2007 to commercialize MongoDB and related software. The company was renamed MongoDB Inc. in 2013 and went public in October 2017 under the ticker symbol MDB.
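The DBMS was released as open source software in 2009 under the terms of Version 3.0 of the Free Software Foundation's GNU Affero General Public License. Since October 2018, new releases of the community server have been published under the Server Side Public License (SSPL) created by MongoDB Inc., in addition to the commercial licenses the company offers.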
At the time of this writing, among other users, the insurance company MetLife is using MongoDB for customer service applications, the website Craigslist is using it for archiving data, the CERN physics lab is using it for data aggregation and discovery, and The New York Times is using MongoDB to support a form-building application for photo submissions.
Should you invest in a NoSQL DBMS?