In-memory databases best suited for 'relatively small' data sets

How does data set size determine whether an in-memory database is a good fit for applications, especially analytics uses? Mark Whitehorn explains.

What kinds of applications are a good fit for in-memory databases?

Generally speaking, in-memory databases are poorly suited to transactional applications because, as the name suggests, the data is held in memory. If we ran a standard transactional database completely in memory and the system crashed or power was lost, then we would lose transactions. This harkens back to the durability element of the ACID (atomicity, consistency, isolation and durability) properties in transactional systems.

This is beginning to change as we see the emergence of newer ACID-compliant in-memory technologies, such as the In-Memory OLTP option in SQL Server 2014 (formerly code-named Hekaton), SAP's HANA system and Oracle's upcoming add-on for its 12c database. However, analytical applications are still the main forte of in-memory databases because they typically run on a copy of transactional data that has already been safely stored on disk, so the raw data itself is safe in the event of a crash or power failure.

More specifically, in-memory analytics is best suited to applications that involve the analysis of relatively small sets of data because memory is much more expensive than disk.

But it is also worth bearing in mind that "relatively small" indicates a volume of data that can affordably be placed in memory, which can often be a surprising amount of data. For example, I did some calculations for a software vendor recently that demonstrate how "relatively small" can actually be relatively large. The following is an excerpt from my analysis:

To try to provide a sense of perspective, let's assume you want to analyze a single table with 20 columns. Half of the columns are text (limited to, say, 50 characters but with an average length of seven characters), five are dates, three integers and two real numbers.

A 1.5-million-row table saved as a CSV file is 236 MB of data -- about a quarter of a gigabyte. If we double the number of columns to 40 and rows to 3 million, that's about 1 GB. Double those two again, to 80 columns and 6 million rows, and it is still only 4 GB. Most laptops now have 8 GB of RAM, so even without compression, this relatively large number of complex rows can be analyzed comfortably with in-memory databases.

Servers are often fitted with more RAM, and 32 GB of memory can accommodate around 50 million rows of 80-column data. And it gets better because in-memory engines often compress the data -- how much, of course, depends on the data itself.

Dig Deeper on Database management system (DBMS) software and technology