MongoDB as a Pure In-Memory Database

MongoDB can be used as a pure in-memory database, meaning that the data is not stored on disk at all. This can be very useful for applications such as:
  • A write-heavy cache in front of a slower RDBMS system
  • Embedded systems
  • PCI-compliant systems where no data should be persisted
  • Unit testing where the database should be light and easily cleaned
That would be really neat indeed if it were possible: one could leverage the advanced querying and indexing capabilities of MongoDB without ever hitting the disk. As you probably know, disk I/O (especially random I/O) is the system bottleneck in 99% of cases, and if you are writing data you cannot avoid hitting the disk.
One nice design choice of MongoDB is that it uses memory-mapped files to handle access to data files on disk. 
This means that MongoDB does not know the difference between RAM and disk, it just accesses bytes at offsets in giant arrays representing files and the OS takes care of the rest! It is this design decision that allows MongoDB to run in RAM with no modification.
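To make the idea concrete, here is a minimal sketch in Python (not MongoDB code, just an illustration of the same primitive): the process reads and writes plain bytes at offsets in a mapping, and the OS decides whether those pages live in RAM, in the page cache, or on disk.

```python
import mmap
import os
import tempfile

# Create a data file (path is just for this illustration).
path = os.path.join(tempfile.mkdtemp(), "datafile")

# Pre-size the file, much like mongod preallocates its data files.
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)

with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 0)   # map the whole file into memory
    mm[0:5] = b"hello"              # a "write" is just a memory store
    mm.flush()                      # ask the OS to sync dirty pages
    mm.close()

# A normal read sees the bytes written through the mapping.
with open(path, "rb") as f:
    print(f.read(5))                # b'hello'
```

The mapped process never issues an explicit `write()` for the data pages; the kernel's virtual memory system carries them to the backing store, which is exactly what lets a tmpfs backing store stay RAM-only.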
How it is done
This is all achieved by using a special type of filesystem called tmpfs. Linux makes it appear as a regular FS, but it is entirely located in RAM (unless it is larger than RAM, in which case it can swap, which can be useful!). I have 32GB of RAM on this server; let's create a 16GB tmpfs:
 
# mkdir /ramdata
# mount -t tmpfs -o size=16000M tmpfs /ramdata/
# df
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/xvde1       5905712 4973924    871792  86% /
none            15344936       0  15344936   0% /dev/shm
tmpfs           16384000       0  16384000   0% /ramdata

Now let's start MongoDB with the appropriate settings. smallfiles and noprealloc should be used to reduce the amount of RAM wasted, and will not affect performance since it's all RAM-based. nojournal should be used since it does not make sense to have a journal in this context!

dbpath = /ramdata
nojournal = true
smallfiles = true
noprealloc = true

After starting MongoDB, you will find that it works just fine and the files are as expected in the FS:
# mongo
MongoDB shell version: 2.3.2
connecting to: test
> db.test.insert({a:1})
> db.test.find()
{ "_id" : ObjectId("51802115eafa5d80b5d2c145"), "a" : 1 }
 
# ls -l /ramdata/
total 65684
-rw-------. 1 root root 16777216 Apr 30 15:52 local.0
-rw-------. 1 root root 16777216 Apr 30 15:52 local.ns
-rwxr-xr-x. 1 root root 5 Apr 30 15:52 mongod.lock
-rw-------. 1 root root 16777216 Apr 30 15:52 test.0
-rw-------. 1 root root 16777216 Apr 30 15:52 test.ns
drwxr-xr-x. 2 root root 40 Apr 30 15:52 _tmp

Now let’s add some data and make sure it behaves properly. We will create a 1KB document and add 4 million of them:
> str = ""
 
> aaa = "aaaaaaaaaa"
aaaaaaaaaa
> for (var i = 0; i < 100; ++i) { str += aaa; }
 
> for (var i = 0; i < 4000000; ++i) { db.foo.insert({a: Math.random(), s: str});}
> db.foo.stats()
{
    "ns" : "test.foo",
    "count" : 4000000,
    "size" : 4544000160,
    "avgObjSize" : 1136.00004,
    "storageSize" : 5030768544,
    "numExtents" : 26,
    "nindexes" : 1,
    "lastExtentSize" : 536600560,
    "paddingFactor" : 1,
    "systemFlags" : 1,
    "userFlags" : 0,
    "totalIndexSize" : 129794000,
    "indexSizes" : {
        "_id_" : 129794000
    },
    "ok" : 1
}

The average document size is 1136 bytes, and the data takes up about 5GB of storage. The index on _id takes about 130MB. Now we need to verify something very important: is the data duplicated in RAM, existing both within MongoDB and the filesystem? Remember that MongoDB does not buffer any data within its own process; instead, data is cached in the FS cache. Let's drop the FS cache and see what is in RAM:
# echo 3 > /proc/sys/vm/drop_caches
# free
             total       used       free     shared    buffers     cached
Mem:      30689876    6292780   24397096          0       1044    5817368
-/+ buffers/cache:     474368   30215508
Swap:            0          0          0
As you can see, there is 6.3GB of used RAM, of which 5.8GB shows up as FS cache ("cached"). Why is there still 5.8GB of FS cache even after all caches were dropped? The reason is that Linux is smart: tmpfs pages live in the page cache itself, so they cannot be dropped and are never duplicated.
That means your data exists with a single copy in RAM. Let’s access all documents and verify RAM usage is unchanged:
 
> db.foo.find().itcount()
4000000
 
# free
             total       used       free     shared    buffers     cached
Mem:      30689876    6327988   24361888          0       1324    5818012
-/+ buffers/cache:     508652   30181224
Swap:            0          0          0
# ls -l /ramdata/
total 5808780
-rw-------. 1 root root 16777216 Apr 30 15:52 local.0
-rw-------. 1 root root 16777216 Apr 30 15:52 local.ns
-rwxr-xr-x. 1 root root 5 Apr 30 15:52 mongod.lock
-rw-------. 1 root root 16777216 Apr 30 16:00 test.0
-rw-------. 1 root root 33554432 Apr 30 16:00 test.1
-rw-------. 1 root root 536608768 Apr 30 16:02 test.10
-rw-------. 1 root root 536608768 Apr 30 16:03 test.11
-rw-------. 1 root root 536608768 Apr 30 16:03 test.12
-rw-------. 1 root root 536608768 Apr 30 16:04 test.13
-rw-------. 1 root root 536608768 Apr 30 16:04 test.14
-rw-------. 1 root root 67108864 Apr 30 16:00 test.2
-rw-------. 1 root root 134217728 Apr 30 16:00 test.3
-rw-------. 1 root root 268435456 Apr 30 16:00 test.4
-rw-------. 1 root root 536608768 Apr 30 16:01 test.5
-rw-------. 1 root root 536608768 Apr 30 16:01 test.6
-rw-------. 1 root root 536608768 Apr 30 16:04 test.7
-rw-------. 1 root root 536608768 Apr 30 16:03 test.8
-rw-------. 1 root root 536608768 Apr 30 16:02 test.9
-rw-------. 1 root root 16777216 Apr 30 15:52 test.ns
drwxr-xr-x. 2 root root 40 Apr 30 16:04 _tmp
# df
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/xvde1       5905712 4973960    871756  86% /
none            15344936       0  15344936   0% /dev/shm
tmpfs           16384000 5808780  10575220  36% /ramdata
And that verifies it!
What about replication?
You probably want to use replication, since a server loses its RAM data upon reboot! With a standard replica set you get automatic failover and additional read capacity. If a server is rebooted, MongoDB will automatically rebuild its data by pulling it from another server in the same replica set (resync). This should be fast enough even with a lot of data and indices, since all operations are RAM-only.
It is important to remember that write operations are recorded in a special capped collection called the oplog, which resides in the local database and by default takes 5% of the volume. In my case the oplog would take 5% of 16GB, which is 800MB. When in doubt, it is safer to choose a fixed oplog size using the oplogSize option: if a secondary is down for longer than the oplog window covers, it will have to be fully resynced. To set it to 1GB, use:
oplogSize = 1000
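As a quick sanity check on the sizing arithmetic above (a trivial sketch in Python):

```python
# Default oplog size is roughly 5% of the volume; for the 16000 MB tmpfs:
volume_mb = 16000
oplog_mb = volume_mb * 5 // 100
print(oplog_mb)  # 800
```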

What about sharding?
Now that you have all the querying capabilities of MongoDB, what if you want to use it to implement a large service? You can freely use sharding to build a large, scalable in-memory store.
Still, the config servers (which contain the chunk distribution) should be disk-based, since their activity is small and rebuilding a cluster from scratch is no fun.
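For illustration, sharding the RAM-backed shards works exactly as on a disk-backed cluster; from a mongos, using the test collection from this walkthrough (this is a sketch, not output from my test):

```
> sh.enableSharding("test")
> sh.shardCollection("test.foo", { _id: 1 })
```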
What to watch for
RAM is a scarce resource, and in this case you definitely want the entire data set to fit in RAM. Even though tmpfs can resort to swapping, performance would drop dramatically. To make the best use of RAM you should consider:
  • enabling the usePowerOf2Sizes collection setting to limit storage fragmentation
  • running a compact command or resyncing the node periodically
  • using a schema design that is fairly normalized (avoiding large document growth)
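In the mongo shell, the first two suggestions might look like this (shown against the test.foo collection from this walkthrough; this is a sketch, not captured output):

```
> db.runCommand({ collMod: "foo", usePowerOf2Sizes: true })
> db.runCommand({ compact: "foo" })
```

Note that compact blocks operations on its database while it runs, so on a replica set it is usually run on secondaries one at a time.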
Conclusion
You can now use MongoDB and all its features as an in-memory, RAM-only store!
Its performance should be pretty impressive: in my test with a single thread/core I was achieving 20k writes per second, and it should scale linearly with the number of cores.
