
MongoDB founder on dynamic schemas and caching as a crutch

The train to NoSQL rode the rails of agility and scalability, according to Dwight Merriman, MongoDB co-founder.

In just a few years, MongoDB NoSQL software has assumed a prominent role in big data. One reason is that it was built from the ground up for horizontal scale-out and parallelism, according to Dwight Merriman, company chairman and co-founder. He saw the need for such parallelism during an earlier stint at online ad giant DoubleClick, which he also co-founded. Here, he speaks about MongoDB's roots and where it is going.

When MongoDB came about in 2007, Web apps were in full swing and Agile software development was on the rise. That meant more dynamic data schemas. Is that right?

Dwight Merriman: Well, if you look at the way we write code today, we are not talking about waterfall lifecycle management anymore -- we are doing agile development. We are talking about lots of iterations, lots of really small releases. We have a release each day; then, we change it. The product manager says, 'No, that is not exactly what I wanted,' and we change it yet again.


This notion of iteration has interesting implications for the database and data layer. If you had a new schema migration every day, that would be painful. But if what is being stored is fluid, that fits really well with this notion of iteration. That has dovetailed nicely for us, because MongoDB's schemas are dynamic.
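The dynamic-schema idea Merriman describes can be sketched in plain JavaScript. Documents in the same MongoDB collection need not share a fixed set of fields, so a new release can start writing an extra field without any migration step. The in-memory array below is a hypothetical stand-in for a real collection, used only to show the document shapes.

```javascript
// Two releases of the same app write to the same "users" collection.
// Release 1 stores name and email; release 2 adds a nested
// "preferences" field -- no ALTER TABLE, no schema migration.
const users = [];

// Document written by release 1.
users.push({ _id: 1, name: "Ada", email: "ada@example.com" });

// Document written by release 2, with the extra nested field.
users.push({
  _id: 2,
  name: "Grace",
  email: "grace@example.com",
  preferences: { theme: "dark", newsletter: true },
});

// Older documents simply lack the new field, so reading code
// falls back to a default instead of failing.
for (const u of users) {
  const theme = u.preferences ? u.preferences.theme : "default";
  console.log(`${u.name}: ${theme}`);
}
```

The cost of this flexibility is that handling missing fields moves from the schema into application code, which is the trade-off behind the daily-iteration workflow described above.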

Was it the case that the established databases didn't scale effectively on the Web? Was scalability the big design criterion in the database's creation?

Merriman: I think of MongoDB as an operational database. A common case is that somebody is writing an application and it is the backing store behind it. It's like OLTP [online transaction processing] with a lower-case 'T'. By that I mean we don't have big transactions.

In MongoDB, you do not have complex transactional semantics. But you can do atomic transactions within the scope of a single document. There are some strong consistency notions in MongoDB. And that very much was intentional.
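The single-document atomicity Merriman mentions means that all update operators in one MongoDB update apply to the document together or not at all. The `applyUpdate` helper below is a hypothetical in-memory sketch of that semantics, supporting just the real `$inc` and `$set` operators; it is not the driver API and does not talk to a server.

```javascript
// Hypothetical sketch of MongoDB's single-document update semantics:
// every operator in the update spec is applied as one unit.
function applyUpdate(doc, update) {
  const next = { ...doc };
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    next[field] = (next[field] || 0) + amount;
  }
  for (const [field, value] of Object.entries(update.$set || {})) {
    next[field] = value;
  }
  return next;
}

// Debit an account and stamp its status in one step -- both fields
// change together, which is what "atomic within a single document"
// buys you without multi-document transactions.
const account = { _id: 1, balance: 100, status: "open" };
const updated = applyUpdate(account, {
  $inc: { balance: -30 },
  $set: { status: "debited" },
});
console.log(updated.balance, updated.status); // 70 "debited"
```

Anything that must change together is therefore modeled inside one document, which is also what makes the horizontal scaling discussed next tractable: no cross-machine transaction coordination is needed.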

But we did it the way we did because we wanted very much to scale horizontally with smaller machines -- not to scale vertically with bigger and bigger machines.

When you invented this together with your colleagues, caching was coming into wider use for the Web. Did you try to bring anything in along those lines in your database creation?

Merriman: Well, about the time we started, there was difficulty scaling out. We were seeing the same problems showing up over and over again. Computer architectures were changing. Clock speeds weren't going up. The way we scale these days is different -- it's more through parallelism.

You know, caching is sometimes very valid, but it was becoming a crutch. The database was too slow, and for us that was a sign. People had 30 cache servers, each with so much RAM -- why not 30 database servers, each with that much RAM?

Still, highly clustered software is not easy to configure. People want to be like Google, but that can be difficult as the machine pool grows.

Merriman: A big priority in our R&D now is work around operations -- making things easier for DevOps, DBAs, sys admins -- that sphere of IT. You need automation because there are just so many machines. So we are writing the MongoDB Service Suite, which has monitoring capabilities, backup software and automation software for deployment.

Jack Vaughan is SearchDataManagement's news and site editor. Email him at [email protected], and follow us on Twitter: @sDataManagement.

Next Steps

Discover why NoSQL vs. relational is not a winner-take-all game

Read the news from MongoDB World 2014

Dig Deeper on Database management system (DBMS) software and technology