MongoDB 3.x: is WiredTiger with GridFS recommended?

If you’re using GridFS to store large documents such as images and videos, MongoDB automatically breaks the large files into many smaller “chunks” and reassembles them when requested. The implementation of GridFS maintains two collections: fs.files, which contains the metadata for the large files and their associated chunks, and fs.chunks, which contains the large data broken into 255 KB chunks (by default). With images and videos, compression will probably be beneficial for the fs.files collection, but the data contained in fs.chunks is probably already compressed, so it may make sense to disable compression for that collection.
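A minimal Python sketch of the two ideas above, using only the standard library. `split_into_chunks` mimics how a driver cuts one file into 255 KiB `fs.chunks` documents, and `uncompressed_collection_command` builds a document for MongoDB's `create` command with the WiredTiger option `block_compressor=none`. Both function names are hypothetical. The compressor is fixed when a collection is created, so a command like this would have to be issued before the first file is stored.

```python
from typing import Any, Dict, List

CHUNK_SIZE = 255 * 1024  # GridFS default chunk size: 255 KiB = 261120 bytes


def split_into_chunks(file_id: Any, data: bytes,
                      chunk_size: int = CHUNK_SIZE) -> List[Dict[str, Any]]:
    """Return fs.chunks-style documents: one {files_id, n, data} per chunk."""
    return [
        {"files_id": file_id, "n": i, "data": data[off:off + chunk_size]}
        for i, off in enumerate(range(0, len(data), chunk_size))
    ]


def uncompressed_collection_command(name: str) -> Dict[str, Any]:
    """Hypothetical helper: build a `create` command that disables WiredTiger
    block compression for one collection. The command name must come first."""
    return {
        "create": name,
        "storageEngine": {"wiredTiger": {"configString": "block_compressor=none"}},
    }


if __name__ == "__main__":
    video = bytes(600 * 1024)  # 600 KiB stand-in for a video file
    chunks = split_into_chunks("file-1", video)
    print(len(chunks), [len(c["data"]) for c in chunks])
    print(uncompressed_collection_command("fs.chunks"))
```

With PyMongo, the dictionary could be passed to `db.command(...)`; check the `create` command documentation for your server version before relying on the exact option layout.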

Running GridFS on MongoDB 3.0.8 with WiredTiger and compression is fully supported. However, whether to use compression, and which level of compression to employ, is a question that should be approached with caution.

In cases where GridFS is used to store large amounts of already-compressed data (for example, MPEG-compressed video), customers have encountered very poor results, since compressing data that is already compressed costs CPU while saving little or no space.
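A small standard-library experiment illustrates why already-compressed payloads gain nothing. As an assumption for the sketch, random bytes from `os.urandom` stand in for MPEG or JPEG data, which looks close to random to a general-purpose compressor; the log-like text is a hypothetical example of compressible data. `compression_ratio` is a hypothetical helper name.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Original size divided by zlib-compressed size (higher is better)."""
    return len(data) / len(zlib.compress(data, level))


if __name__ == "__main__":
    # Compressible: repetitive, log-like text.
    text = b"timestamp=2016-01-01 level=INFO msg=request served\n" * 20000
    # Stand-in for already-compressed media: random bytes of the same size.
    already_compressed = os.urandom(len(text))
    print(f"log text:     {compression_ratio(text):.1f}x")
    print(f"random bytes: {compression_ratio(already_compressed):.3f}x")
```

On random input the ratio hovers around 1.0 (zlib's framing can even make the output slightly larger), yet the CPU time to attempt the compression is still spent.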

Even in cases where compression is effective, it can be extremely CPU intensive: data must be compressed before it is written to disk and decompressed every time it is read back from disk. The CPU cost can be mitigated when frequently accessed data can be held in the database cache, because WiredTiger keeps cached data uncompressed, so compression work occurs much less often when disk accesses are limited. Many GridFS use cases, though, involve very large data sets, where a low “cache hit ratio” should probably be expected. In that case, you should expect almost every access to result in an I/O request, with every read requiring decompression and every write requiring compression.
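To make the cache-hit-ratio argument concrete, here is a toy simulation under loudly simplified assumptions: reads are uniformly random over the data set, and the cache is a strict LRU holding a fixed number of chunks. WiredTiger's real eviction is more sophisticated, so treat the numbers as illustrative only. Each miss stands for a disk read that must be decompressed. `simulate_cache` is a hypothetical name.

```python
import random
from collections import OrderedDict


def simulate_cache(num_chunks: int, cache_slots: int, reads: int, seed: int = 0) -> float:
    """Uniform random reads over num_chunks items through an LRU cache of
    cache_slots items. Returns the fraction of reads that hit the cache."""
    rng = random.Random(seed)
    cache: "OrderedDict[int, None]" = OrderedDict()
    hits = 0
    for _ in range(reads):
        key = rng.randrange(num_chunks)
        if key in cache:
            hits += 1
            cache.move_to_end(key)  # mark as most recently used
        else:
            # Miss: in WiredTiger terms, a disk read plus a decompression.
            cache[key] = None
            if len(cache) > cache_slots:
                cache.popitem(last=False)  # evict least recently used
    return hits / reads


if __name__ == "__main__":
    print(f"data fits in cache:  {simulate_cache(1_000, 1_000, 50_000):.2f}")
    print(f"data 10x the cache:  {simulate_cache(10_000, 1_000, 50_000):.2f}")
```

With uniform access, the steady-state hit ratio is roughly the cache size divided by the data-set size, so a data set ten times larger than the cache sends about nine reads in ten to disk and through the decompressor.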

Under certain conditions, even a fairly modest workload can strain the CPU resources of a very well-provisioned host. This heavy reliance on the CPU may also limit concurrent access to data, as access requests begin to queue up waiting for CPU time.
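A back-of-the-envelope model helps show where queueing begins. Assuming, as a deliberate simplification, that decompressing cache misses is the only CPU work and that each miss occupies one core for a fixed time, read throughput is capped at cores / (seconds per miss × miss ratio). The helper names, the 16-core figure, and the sample data are hypothetical; `measure_decompress_seconds` uses wall-clock time from `time.perf_counter` as a rough proxy for single-threaded CPU time.

```python
import time
import zlib


def measure_decompress_seconds(chunk: bytes, repeats: int = 20) -> float:
    """Average wall-clock seconds to zlib-decompress one chunk on this machine."""
    packed = zlib.compress(chunk)
    start = time.perf_counter()
    for _ in range(repeats):
        zlib.decompress(packed)
    return (time.perf_counter() - start) / repeats


def max_reads_per_second(cores: int, cpu_seconds_per_miss: float, hit_ratio: float) -> float:
    """Ceiling on reads/s if decompressing cache misses were the only CPU work."""
    miss_ratio = 1.0 - hit_ratio
    if miss_ratio <= 0:
        return float("inf")  # everything served from cache: no decompression
    return cores / (cpu_seconds_per_miss * miss_ratio)


if __name__ == "__main__":
    sample = b"sensor reading 42.0\n" * (255 * 1024 // 20)  # about one 255 KiB chunk
    per_miss = measure_decompress_seconds(sample)
    print(f"{per_miss * 1e3:.3f} ms per chunk")
    print(f"ceiling at 10% hit ratio on 16 cores: {max_reads_per_second(16, per_miss, 0.1):,.0f} reads/s")
```

Once the offered load approaches that ceiling, requests wait for a free core rather than for the disk, which is the queueing effect described above.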

While “zlib” compression often achieves better storage savings than “snappy” (WiredTiger's default compressor), it is also more demanding of CPU cycles, potentially aggravating the effect described above.
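Python's standard library has no snappy binding, so this sketch swaps in a related comparison rather than measuring snappy itself: zlib at a fast, light level (1) versus heavier levels (6 and 9) on the same data. It shows the same kind of trade-off between space saved and CPU spent; a third-party snappy binding could be plugged in the same way if installed. `compare_levels` and the sample records are hypothetical.

```python
import time
import zlib
from typing import List, Sequence, Tuple


def compare_levels(data: bytes, levels: Sequence[int] = (1, 6, 9)) -> List[Tuple[int, int, float]]:
    """Return (level, compressed_size, seconds) for each zlib level."""
    results = []
    for level in levels:
        start = time.perf_counter()
        packed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        assert zlib.decompress(packed) == data  # sanity check: lossless round trip
        results.append((level, len(packed), elapsed))
    return results


if __name__ == "__main__":
    sample = b"".join(b"user=%d action=view item=%d\n" % (i % 97, i % 1013)
                      for i in range(50_000))
    for level, size, secs in compare_levels(sample):
        print(f"level {level}: {len(sample) / size:5.1f}x in {secs * 1e3:.1f} ms")
```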

If the cost of storage is a major concern, and your data is reasonably compressible, then the use of WiredTiger with zlib compression may be warranted. It would be advisable, however, to construct a test system and carefully evaluate its behaviour under a realistic dataset and workload before deploying such a solution in production, to ensure that the combination of database and hardware is able to meet your performance expectations.
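Before building that test system, a quick offline estimate can tell whether your data is "reasonably compressible" at all. This sketch compresses a sample of your own files in 255 KiB pieces and reports the overall saving. It is an approximation under a stated assumption: WiredTiger compresses on-disk pages rather than GridFS chunks, so real numbers will differ, and this is no substitute for the realistic test recommended above. The `gridfs-sample` directory and `estimate_savings` name are hypothetical.

```python
import zlib
from pathlib import Path
from typing import Iterable, Tuple

CHUNK_SIZE = 255 * 1024  # GridFS default chunk size


def estimate_savings(samples: Iterable[bytes], level: int = 6) -> Tuple[int, int]:
    """Compress each sample chunk by chunk; return (raw_bytes, compressed_bytes)."""
    raw = packed = 0
    for data in samples:
        for off in range(0, len(data), CHUNK_SIZE):
            chunk = data[off:off + CHUNK_SIZE]
            raw += len(chunk)
            packed += len(zlib.compress(chunk, level))
    return raw, packed


if __name__ == "__main__":
    # Hypothetical directory holding a representative sample of your files.
    sample_dir = Path("gridfs-sample")
    files = (p.read_bytes() for p in sample_dir.glob("*") if p.is_file())
    raw, packed = estimate_savings(files)
    if raw:
        print(f"estimated saving: {100 * (1 - packed / raw):.1f}% of {raw:,} bytes")
```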
