[MongoDB]: Different Ways of Purging Data

There are three general ways to purge old data in MongoDB:

Single collection, delete old entries
Collection per day, drop old collections
Database per day, drop old databases

Option #1: single collection

pros
====
Easy to implement
Easy to run Map/Reduces

cons
====
Deletes are as expensive as inserts; they cause lots of IO and create the need to “defragment” or “compact” the DB.
At some point you end up handling double the “writes”, as you have to both insert a day’s worth of data and delete a day’s worth of data.
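As a rough sketch of option #1, the function below deletes everything older than a cutoff from a single collection. It is written in mongosh style. The helper name `purgeOlderThan`, the collection name `events` and the date field `createdAt` are assumptions made for illustration, not part of any original setup.

```javascript
// Hypothetical helper: delete documents older than `days` days.
// Assumes every document stores its insertion time in a Date field `createdAt`.
function purgeOlderThan(db, collName, days, now) {
  now = now || new Date();
  var cutoff = new Date(now.getTime() - days * 24 * 60 * 60 * 1000);
  // deleteMany() removes every matching document; this is the IO-heavy part.
  var result = db.getCollection(collName).deleteMany({ createdAt: { $lt: cutoff } });
  return result.deletedCount;
}

// Example call in mongosh (keeps the last 30 days):
// purgeOlderThan(db, "events", 30);
```

An index on the date field would normally be needed so that the delete does not scan the whole collection.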

Option #2: collection per day

pros
====
Removing data via collection.drop() is very fast.
Still Map/Reduce friendly as the output from each day can be merged or re-reduced against the summary data.

cons
====
You may still have some fragmentation problems.
You will need to rewrite queries. However, in my experience, if you have enough data that you’re purging, you rarely access that data directly. Instead, you tend to run Map/Reduces over it, so this may not change that many queries.
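A minimal sketch of option #2 in mongosh style follows. The naming scheme `<prefix>_YYYYMMDD` (UTC) and the helper names `dailyName` and `dropOldCollections` are assumptions for illustration; any scheme works as long as the names sort by date.

```javascript
// Hypothetical naming scheme: "<prefix>_YYYYMMDD" in UTC, one collection per day.
function dailyName(prefix, date) {
  return prefix + "_" + date.toISOString().slice(0, 10).replace(/-/g, "");
}

// Drop every daily collection older than `keepDays` days; returns the dropped names.
// Assumes `prefix` contains no regular-expression special characters.
function dropOldCollections(db, prefix, keepDays, now) {
  now = now || new Date();
  var cutoff = dailyName(prefix, new Date(now.getTime() - keepDays * 86400000));
  var pattern = new RegExp("^" + prefix + "_\\d{8}$");
  var dropped = [];
  db.getCollectionNames().forEach(function (name) {
    // Zero-padded dates sort lexicographically, so a string comparison is enough.
    if (pattern.test(name) && name < cutoff) {
      db.getCollection(name).drop();
      dropped.push(name);
    }
  });
  return dropped;
}

// Writers would insert into db.getCollection(dailyName("events", new Date())).
```

Collections that do not match the daily pattern (for example a summary collection holding Map/Reduce output) are left alone.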

Option #3: database per day

pros
====
Deletion is as fast as possible; the files are simply truncated.
Zero fragmentation problems, and it is easy to back up / restore / archive old data.

cons
====
Will make querying more challenging (expect to write some wrapper code).
Not as easy to write Map/Reduces, though take a look at the Aggregation Framework, as that may better satisfy your needs anyway.
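The wrapper code for option #3 might look roughly like the mongosh-style sketch below. The database naming scheme `<prefix>_YYYYMMDD` (UTC) and the helper names `dailyDbName`, `dbForDay`, `dropDay` and `findAcrossDays` are all hypothetical.

```javascript
// Hypothetical naming scheme: "<prefix>_YYYYMMDD" in UTC, one database per day.
function dailyDbName(prefix, date) {
  return prefix + "_" + date.toISOString().slice(0, 10).replace(/-/g, "");
}

// Wrapper: return the handle of the database that holds the given day's data.
function dbForDay(db, prefix, date) {
  return db.getSiblingDB(dailyDbName(prefix, date));
}

// Purge one day by dropping its whole database.
function dropDay(db, prefix, date) {
  return dbForDay(db, prefix, date).dropDatabase();
}

// Wrapper for reads: run the same query against several days and merge the results.
function findAcrossDays(db, prefix, collName, days, query) {
  var out = [];
  days.forEach(function (day) {
    dbForDay(db, prefix, day).getCollection(collName).find(query).forEach(function (doc) {
      out.push(doc);
    });
  });
  return out;
}
```

Merging results in application code like this is the extra querying cost mentioned above; sorting or limiting across days would have to be done in the wrapper as well.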

There is also option #4, but it is not a general solution.
I know of some people who did their “purging” simply by using Capped Collections (see “What are Capped Collections in MongoDB”).
There are definitely cases where this works, but it comes with a bunch of caveats, so you really need to know what you’re doing.
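For reference, creating a capped collection in mongosh style could be sketched as follows. The helper name `createCappedLog` and the collection name `recent_events` are made up for this example.

```javascript
// Create a capped collection: it has a fixed maximum size, and once that size
// is reached the oldest documents are overwritten in insertion order.
function createCappedLog(db, name, sizeBytes, maxDocs) {
  var options = { capped: true, size: sizeBytes }; // `size` is in bytes
  if (maxDocs) {
    options.max = maxDocs; // optional additional limit on the document count
  }
  return db.createCollection(name, options);
}

// Example call in mongosh (roughly 1 GB of recent data):
// createCappedLog(db, "recent_events", 1024 * 1024 * 1024);
```

Note that the "purging" here is driven by size rather than by age, which is one of the caveats: how many days of data are retained depends on how fast data arrives.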

And finally, you can also set a TTL on a collection (via a TTL index) in MongoDB 2.2 or higher. This will expire old data from the collection automatically.

See the expire data tutorial for details.
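A TTL index can be sketched in mongosh style like this. The helper name `expireAfter`, the collection name `events` and the field name `createdAt` are assumptions for illustration.

```javascript
// Create a TTL index: a background task removes documents once the Date stored
// in `field` is older than `seconds`. The field must hold a Date value.
// (Older shells used ensureIndex() instead of createIndex().)
function expireAfter(db, collName, field, seconds) {
  var keys = {};
  keys[field] = 1;
  return db.getCollection(collName).createIndex(keys, { expireAfterSeconds: seconds });
}

// Example call in mongosh (expire documents 30 days after `createdAt`):
// expireAfter(db, "events", "createdAt", 30 * 24 * 60 * 60);
```

Expiry is not instantaneous, since the deletes are performed periodically in the background, and they are still ordinary deletes in terms of IO.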

You can refer to Disk-size for how to check sizing in MongoDB.
