MongoDB WT with ext4 Filesystem and its performance issues.

March 18, 2016 · by dbversity · in Linux, MongoDB

MongoDB Inc. currently lacks a comprehensive explanation of the issues that can arise with WiredTiger (WT) when running on EXT4.

Specifically, it lacks details on what the issue is, under what conditions it can manifest, and what the symptoms are.

They have similar pages explaining how NUMA or low ulimits can negatively impact a cluster, so it makes sense to have a page covering the EXT4 issue in a bit more detail.

This is particularly so because EXT4 is a widely used filesystem and the issues manifest only under certain conditions.

Description

YCSB 30M documents, 10 fields, ~1kB/document, total ~30GB
50/50 read/update workload
40 GB cache, 128 GB memory, 32 CPUs
slow SSD disk (~80-100 MB/s)
no journal (to simplify the situation)
per mongostat, cache is at 100% utilization, 80% dirty pretty much throughout the test.

During each checkpoint, two calls to fdatasync are made. Because this scenario is I/O constrained, the fdatasyncs take a substantial amount of time, and throughput falls to exactly 0 for the duration of each fdatasync call. This is seen in A-B, C-D, E-F, G-H, I-J, K-L below. In many, but not all, such cases WT bumps the “eviction server unable to reach goal” counter.
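To get a rough feel for how long a single fdatasync takes on a given filesystem, a small timing sketch such as the one below can help. It is not how WT issues its checkpoint syncs; the function name, the 1 MiB default payload, and the "." directory are assumptions chosen only for illustration, and os.fdatasync is assumed to be available (it is on Linux).

```python
import os
import tempfile
import time


def time_fdatasync(directory, size_bytes=1 << 20):
    """Write size_bytes to a temp file in `directory` and time one fdatasync call."""
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, b"\0" * size_bytes)
        start = time.monotonic()
        os.fdatasync(fd)  # blocks until the file's data has been flushed to the device
        return time.monotonic() - start
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    # "." is a placeholder; point this at a directory on the filesystem under test.
    print(f"fdatasync took {time_fdatasync('.'):.4f} s")
```

On an idle disk the call usually returns quickly; the stalls described above appear when the device is already saturated with writes.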

Attached graphs: count=1000000.png, lz4.png, snappy.png, try-13.png, try-14.png, try-16.png, try-21.png, try-23.png, try-25.png, try-26.png, try-29.png, try-30.png, try-33.png, try-34.png, try-35.png, try-36.png

Comments

YCSB 50/50, 10M docs 10 fields, 20 GB cache, 20 threads

master indeed seems to fix the issue during the “transaction pinned” phase (SERVER-18315)
the fdatasync stall (this ticket) can additionally be fixed by either
- using ext4 with the -o commit=largenumber mount option, or
- using xfs

3.0.2

shows both SERVER-18315 (C-D) and this ticket (B-C, D-E)

MASTER, EXT4

fixes SERVER-18315 (C-D)
fdatasync stalls (this ticket, B-C, D-E) remain

MASTER, EXT4, BUT DISABLE EXT4 TIMED JOURNAL COMMITS (-o commit=… option)

fixes fdatasync stalls

MASTER, XFS

also fixes fdatasync stalls

Mounting the EXT4 filesystem with a large commit interval (-o commit=largenumber) effectively disables periodic journal commits. Note: this is safe for mongod because it ensures durability by doing fdatasyncs at appropriate times, but it is dangerous with regard to durability for typical programs that might happen to be using the same filesystem.
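To confirm whether a commit interval is actually in effect on a mounted filesystem, the mount options can be read back from /proc/mounts. The helper below is a minimal sketch: the function name and the /data mount point in the usage note are hypothetical, and it assumes the kernel lists commit= among the options when it has been set explicitly.

```python
def ext4_commit_interval(mounts_text, mount_point):
    """Return the commit= value in seconds for an ext4 mount, or None if not listed."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        _device, target, fstype, options = fields[:4]
        if target != mount_point or fstype != "ext4":
            continue
        for opt in options.split(","):
            if opt.startswith("commit="):
                return int(opt.split("=", 1)[1])
        return None  # ext4 mount found, but no explicit commit= option
    return None  # no matching ext4 mount
```

For example, passing the contents of /proc/mounts and a mount point such as "/data" returns the configured interval, or None when the default interval is in use.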

Below are a few further questions related to the ext4 filesystem.

Question :

In our MongoDB deployments we identified that ext4 filesystems have 5% of their space reserved for the root user.

The filesystems holding MongoDB files are typically very large, so this reserves a lot of space on multi-TB filesystems.

Can this percentage be reduced? Do you have suggestions on a minimum percentage?

tune2fs -m NEW_RESERVED_PERCENTAGE
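To put the question in numbers, the sketch below estimates how much space a given reserve percentage sets aside. The filesystem sizes and the 1% alternative are illustrative assumptions, not recommendations, and the real reserve is counted in filesystem blocks, so actual figures will differ slightly.

```python
TIB = 1024 ** 4
GIB = 1024 ** 3


def reserved_bytes(filesystem_bytes, reserved_percent=5.0):
    """Approximate number of bytes set aside for root at a given reserve percentage."""
    return int(filesystem_bytes * reserved_percent / 100)


if __name__ == "__main__":
    # Illustrative sizes only; substitute the size of the real filesystem.
    for size_tib in (1, 4, 16):
        for percent in (5, 1):
            gib = reserved_bytes(size_tib * TIB, percent) / GIB
            print(f"{size_tib:>2} TiB at {percent}% reserve: about {gib:.1f} GiB")
```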

Answer :

The 5% “reserve” set aside for root serves several purposes, including:

Helps to ensure that system services have some capacity on “system” filesystems like /, /tmp, or /var when ill-behaved users (try to) consume all available storage. This can help prevent failures of critical system services, or even system crashes.

Helps to protect against storage fragmentation which can occur when filesystems begin to near 100% capacity.

In a filesystem that is reserved exclusively for use as mongodb storage, these considerations are somewhat less important:

Your --dbpath is not “system” storage, and the operating system will not crash if it becomes 100% full. (The database, however, will. More about that later.)

Newer filesystems like EXT4 are generally less susceptible to fragmentation than their predecessors. A filesystem that is used exclusively to store mongodb data files will contain a very small number of very large files, and these files will not be created and deleted frequently. As a result, fragmentation should not be a concern. (You might also be able to save some storage by allocating fewer inodes in a situation like this.)

So, on a filesystem used exclusively for mongodb datafile storage, there is no compelling reason not to reduce the amount of storage reserved for “root” users if you choose to. There is no strict reason not to set it to zero, if that strikes your fancy.

There are also potential benefits to leaving your filesystem configuration at the defaults. For example:

The storage reserved for root can be a useful safety valve for your database.

When the filesystem containing the --dbpath directory is allowed to become full (or the database is otherwise unable to create new files), the database will usually crash, and sometimes you will find that you cannot restart the database without first making more storage available.

If you are not running mongod as root (and you should never run your mongod as root) then you can use this reserved pool of storage to recover from a failure when you need to.
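The size of this safety valve can be checked at any time: statvfs reports both the blocks that are free in total and the blocks available to unprivileged users, and the difference is the free space that only root can still use. The following is a minimal sketch; the function name is hypothetical and "/" stands in for the filesystem that holds the data files.

```python
import os


def root_reserve_bytes(path):
    """Return (bytes still free but usable only by root, bytes free for ordinary users)."""
    st = os.statvfs(path)
    free_for_root = st.f_bfree * st.f_frsize    # all free blocks, including the reserve
    free_for_users = st.f_bavail * st.f_frsize  # free blocks an unprivileged process may use
    return free_for_root - free_for_users, free_for_users


if __name__ == "__main__":
    # "/" is a placeholder; use the directory that holds the mongod data files.
    reserve, available = root_reserve_bytes("/")
    print(f"root-only reserve: {reserve / 1024 ** 3:.2f} GiB, "
          f"available to users: {available / 1024 ** 3:.2f} GiB")
```

Once ordinary users have exhausted their share, the second value drops to zero while the first shows what is left for recovery.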

Building systems whose “normal” operation depends on non-default filesystem configurations can sometimes be a problem during backup and recovery.

People often record the names of filesystems, mount options, etc., to allow them to rebuild a new system from “bare metal”, but optional settings used at filesystem creation time are often omitted. When preparing a system for “bare metal” recovery, you might find that you don’t have enough available storage to complete the recovery even though the replacement hardware and filesystems appear to have been identically configured to the original.
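One way to avoid losing track of such settings is to record them alongside the mount options. As a sketch, assuming the text produced by tune2fs -l has been saved and that it contains the "Block count" and "Reserved block count" fields, the effective reserve percentage can be recovered as follows (the function name is hypothetical):

```python
def reserved_percent_from_tune2fs(listing):
    """Compute the reserve percentage from saved `tune2fs -l` output."""
    fields = {}
    for line in listing.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # keep only "name: value" lines
            fields[key.strip()] = value.strip()
    total_blocks = int(fields["Block count"])
    reserved_blocks = int(fields["Reserved block count"])
    return 100.0 * reserved_blocks / total_blocks
```

Storing this value with the rest of the rebuild notes makes it possible to recreate the filesystem with the same reserve during a bare-metal recovery.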

Overall, you definitely can reduce the storage reserved on EXT4 filesystems, at least in the case of a filesystem reserved exclusively for use as mongodb data file storage.

You might, however, choose not to; at the very least, the default 5% storage reserve can be a useful “ace” to keep up your sleeve.
