MongoDB WiredTiger (WT) on the ext4 Filesystem and Its Performance Issues
MongoDB Inc. currently lacks a comprehensive explanation of the issues that can arise with WT when running on EXT4:
specifically, what the issue is, under what conditions it can manifest, and what the symptoms are.
They have similar pages explaining how NUMA or low ulimits can negatively impact a cluster, so it makes sense to have a page covering the EXT4 issue in more detail,
particularly since EXT4 is a widely used filesystem and the issues manifest only under certain conditions.
- YCSB 30M documents, 10 fields, ~1kB/document, total ~30GB
- 50/50 read/update workload
- 40 GB cache, 128 GB memory, 32 CPUs
- slow SSD disk (~80-100 MB/s)
- no journal (to simplify the situation)
- per mongostat, cache is at 100% utilization, 80% dirty pretty much throughout the test.
During each checkpoint, two calls to fdatasync are made. Because this scenario is I/O constrained, the fdatasync calls take a substantial amount of time, and throughput falls to exactly 0 for the duration of each call. This is seen in A-B, C-D, E-F, G-H, I-J, K-L below. In many, but not all, such cases WT bumps the “eviction server unable to reach goal” counter.
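To get a rough feel for how long a single flush can take on a given disk, a sketch along the following lines may help. It is not how mongod or WT measure anything; the function name `time_fdatasync`, the 16 MB default, and the scratch directory are assumptions made for illustration.

```python
import os
import tempfile
import time


def time_fdatasync(directory, size_mb=16):
    """Write size_mb of data to a scratch file in `directory` and
    return the seconds spent in the flush-to-disk call."""
    # os.fdatasync is not available on every platform; fall back to fsync.
    sync = getattr(os, "fdatasync", os.fsync)
    chunk = b"\0" * (1024 * 1024)
    with tempfile.NamedTemporaryFile(dir=directory) as f:
        for _ in range(size_mb):
            f.write(chunk)
        f.flush()  # push Python's buffer into the OS page cache
        start = time.perf_counter()
        sync(f.fileno())  # block until the kernel has written the data
        return time.perf_counter() - start


if __name__ == "__main__":
    # Hypothetical target: point this at the filesystem under test.
    elapsed = time_fdatasync(tempfile.gettempdir())
    print(f"flush took {elapsed:.3f}s")
```

On a slow, busy disk the reported time grows with the amount of dirty data waiting to be written, which is the same pressure a checkpoint is under.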
- mounting the EXT4 filesystem with a large commit interval (-o commit=largenumber) to effectively disable periodic journal commits. Note: this is safe for mongod because it ensures durability by doing fdatasyncs at appropriate times, but it is dangerous with regard to durability for typical programs that might happen to be using the same filesystem.
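To check which commit interval an ext4 mount is actually using, the mount options can be read back from /proc/mounts. The sketch below is illustrative only: the helper names are made up here, and it assumes the kernel lists commit= in /proc/mounts when the option was set explicitly.

```python
def commit_interval(mount_options):
    """Return the commit= value (seconds) from a comma-separated
    mount option string, or None if the option is not listed."""
    for option in mount_options.split(","):
        key, _, value = option.partition("=")
        if key.strip() == "commit" and value.strip().isdigit():
            return int(value)
    return None


def ext4_commit_intervals(mounts_path="/proc/mounts"):
    """Map each ext4 mount point to its explicitly set commit interval."""
    result = {}
    with open(mounts_path) as mounts:
        for line in mounts:
            fields = line.split()
            # Fields: device, mount point, filesystem type, options, ...
            if len(fields) >= 4 and fields[2] == "ext4":
                result[fields[1]] = commit_interval(fields[3])
    return result
```

A value of None means no commit= option was found for that mount, in which case ext4 falls back to its default of 5 seconds.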
However, below are a few questions related to the ext4 filesystem.
By default, ext4 filesystems used for MongoDB deployments have 5% of their space reserved for the root user.
The filesystems that hold MongoDB files are typically very large, so this reserves a lot of space on multi-TB filesystems.
Can this percentage be reduced? Do you have suggestions on a minimum percentage?
tune2fs -m NEW_RESERVED_PERCENTAGE DEVICE
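To put numbers on the question, the arithmetic can be sketched as follows. The 4 TiB filesystem size is a made-up example, and the real reserve is counted in whole filesystem blocks, so the figures are approximations.

```python
TIB = 1024 ** 4
GIB = 1024 ** 3


def reserved_bytes(filesystem_bytes, reserved_percent):
    """Approximate space set aside for root at a given percentage."""
    return filesystem_bytes * reserved_percent / 100


# Hypothetical 4 TiB data volume at the default and two reduced settings.
for percent in (5, 1, 0):
    reserved = reserved_bytes(4 * TIB, percent)
    print(f"{percent}% of 4 TiB = {reserved / GIB:.1f} GiB reserved")
```

At the default 5%, a 4 TiB filesystem holds back roughly 204.8 GiB; at 1% that drops to about 41 GiB.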
The 5% “reserve” set aside for root serves several purposes, including:
Helps to ensure that system services have some capacity on “system” filesystems like /, /tmp, or /var when ill-behaved users (try to) consume all available storage. This can help prevent failures of critical system services, or even system crashes.
Helps to protect against storage fragmentation which can occur when filesystems begin to near 100% capacity.
In a filesystem that is reserved exclusively for use as MongoDB storage, these considerations are somewhat less important:
Your --dbpath is not “system” storage, and the operating system will not crash if it becomes 100% full. (The database, however, will. More about that later.)
Newer filesystems like EXT4 are generally less susceptible to fragmentation than their predecessors. A filesystem that is used exclusively to store MongoDB data files will contain a very small number of very large files, and these files will not be created and deleted frequently. As a result, fragmentation should not be a concern. (You might also be able to save some storage by allocating fewer inodes in a situation like this.)
So, on a filesystem used exclusively for mongodb datafile storage, there is no compelling reason not to reduce the amount of storage reserved for “root” users if you choose to. There is no strict reason not to set it to zero, if that strikes your fancy.
There are also potential benefits to leaving your filesystem configuration at the defaults. For example:
The storage reserved for root can be a useful safety valve for your database.
When the filesystem containing the --dbpath directory is allowed to become full (or the database is otherwise unable to create new files), the database will usually crash, and sometimes you will find that you cannot restart the database without first making more storage available.
If you are not running mongod as root (and you should never run your mongod as root) then you can use this reserved pool of storage to recover from a failure when you need to.
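How much of that reserve is still untouched can be estimated from the filesystem statistics. This is a minimal sketch: the function name and the returned keys are invented for illustration, and it assumes the gap between the two free-block counters reflects the root reserve, as it normally does on ext4.

```python
import os


def root_reserve(path):
    """Report free space as seen by root and by unprivileged users
    on the filesystem that contains `path`."""
    st = os.statvfs(path)
    free_for_root = st.f_bfree * st.f_frsize    # all free blocks
    free_for_users = st.f_bavail * st.f_frsize  # free blocks non-root may use
    return {
        "free_for_root": free_for_root,
        "free_for_users": free_for_users,
        # Clamped at zero in case a filesystem reports the counters oddly.
        "reserve_remaining": max(free_for_root - free_for_users, 0),
    }


if __name__ == "__main__":
    # Hypothetical path: substitute the directory given to --dbpath.
    print(root_reserve("/"))
```

When free_for_users reaches zero while reserve_remaining is still positive, a non-root mongod has run out of space but root can still write to the filesystem.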
Building systems whose “normal” operation depends on non-default filesystem configurations can sometimes be a problem during backup and recovery.
People often record the names of filesystems, mount options, etc., to allow them to rebuild a new system from “bare metal”, but optional settings used at filesystem creation time are often omitted. When preparing a system for “bare metal” recovery, you might find that you don’t have enough available storage to complete the recovery even though the replacement hardware and filesystems appear to have been identically configured to the original.
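One way to avoid losing such settings is to store the output of tune2fs -l alongside the rest of the recovery notes and read the values back when rebuilding. The sketch below parses that listing; the helper names and the sample numbers are illustrative assumptions, not output from a real system.

```python
def parse_tune2fs_listing(text):
    """Turn 'Key: value' lines from a tune2fs -l listing into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that have no colon
            settings[key.strip()] = value.strip()
    return settings


def reserved_percent(settings):
    """Reserved blocks as a percentage of all blocks."""
    reserved = int(settings["Reserved block count"])
    total = int(settings["Block count"])
    return 100 * reserved / total


# Illustrative excerpt only; a real listing has many more fields.
SAMPLE = """\
Block count:              26214400
Reserved block count:     1310720
Block size:               4096
"""

if __name__ == "__main__":
    print(reserved_percent(parse_tune2fs_listing(SAMPLE)))
```

Comparing this percentage between the original and the rebuilt filesystem shows whether the reserve was recreated with the same setting.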
Overall, you definitely can reduce the storage reserved on EXT4 filesystems, at least in the case of a filesystem reserved exclusively for use as MongoDB data file storage.
You might, however, choose not to; at the very least, the default 5% storage reserve can be a useful “ace” to keep up your sleeve.