MongoDB Diagnostic Tools/Utilities


MongoDB Inc. does not currently offer a single tool that covers every troubleshooting scenario.
However, it does provide a number of tools that help with different aspects of troubleshooting.

The best practice (as with many other monitoring and troubleshooting applications) is to keep a holistic, NOC-style view of overall system health and performance, which is what MMS provides.
This is supplemented by specialist tools for specific diagnostics, such as mdiag and mtools.

a) mongod/mongos logs
Log locations can differ from machine to machine. Once the logs have been gathered, mtools can be used to analyse them.
http://blog.mongodb.org/post/85123256973/introducing-mtools

b) server load info

The mdiag script gathers this kind of information. MongoDB Inc. recently made it publicly available.
https://github.com/mongodb/support-tools/raw/master/mdiag/mdiag.sh
You can also configure MMS to integrate with munin-node.

c) system configurations
mdiag also gathers this kind of information (see b above).

d) currentOps 
MongoDB Inc. has an internal Ruby script that some of its support engineers use, but it is not currently provided to customers.

e) Long-running queries, etc.
See mtools above. It can find, aggregate and visualize slow queries in the logs.

In addition to the above you'll also find some useful suggestions at:
http://docs.mongodb.org/manual/faq/diagnostics/
http://www.mongodb.com/presentations/diagnostics-and-debugging


The MongoDB diagnostic tools and utilities below are commonly used for troubleshooting MongoDB production issues.
 
Prerequisites for testing :-
 
Run bulk inserts to create a write load on the database.
 
[root @ hostname /opt/mongodb/bin]# ./mongo hostname:27010/admin -u adm -p pwd
MongoDB shell version: 2.4.11
connecting to: hostname:27010/admin
>
> use db
switched to db db
>
> for(var i = 1; i <= 10000000 ; i++){db.test_collection.insert({"_id" : i , "title" : "How do I create manual workload i.e., Bulk inserts to Collection ", " Iteration no:" : i }); sleep(1);}
 
 
MONGOTOP :-
 
mongotop provides a method to track the amount of time a MongoDB instance spends reading and writing data. mongotop provides statistics on a per-collection level.
By default, mongotop returns values every second.
 
Usage :
/opt/mongodb/bin]#./mongotop --host <hostname><:port> -u <username> -p <password> <sleeptime> --locks
 
Example :-
[root @ hostname /opt/mongodb/bin]# ./mongotop -h hostname:27010 -u adm -p pwd 300
connected to: hostname:27010
 
 ns total read write 2014-10-16T06:25:06
 db.test_collection 323ms 2ms 321ms
 MONGODB.system.indexes 0ms 0ms 0ms
 MONGODB.system.namespaces 0ms 0ms 0ms
 MONGODB.system.users 0ms 0ms 0ms
 admin.citi 0ms 0ms 0ms
 admin.newcol 0ms 0ms 0ms
 admin.system.indexes 0ms 0ms 0ms
 
 
 
Fields :-
 
mongotop returns time values in milliseconds (ms).
mongotop reports only active namespaces, or only active databases when run with the --locks option.
If a database or collection does not appear, it has received no recent activity.
 
mongotop.ns
Contains the database namespace, which combines the database name and collection.
 
mongotop.db
Contains the name of the database. The database named "." refers to the global lock, rather than a specific database.
This field does not appear unless you have invoked mongotop with the --locks option.
 
mongotop.total
Provides the total amount of time that this mongod spent operating on this namespace.
 
mongotop.read
Provides the amount of time that this mongod spent performing read operations on this namespace.
 
mongotop.write
Provides the amount of time that this mongod spent performing write operations on this namespace.
 
mongotop.<timestamp>
Provides a time stamp for the returned data.
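
To compare namespaces across intervals it can help to turn a mongotop row into numbers. The following is a minimal sketch, assuming the four-column row layout shown in the example output above (namespace, total, read, write); the function name parseMongotopRow and the writeShare calculation are illustrative and not part of mongotop.

```javascript
// Hypothetical helper: parse one mongotop data row of the form
// "<ns> <total>ms <read>ms <write>ms" into an object with numeric fields.
function parseMongotopRow(row) {
  var parts = row.trim().split(/\s+/);
  // Anything that is not a four-column data row (e.g. the header) is ignored.
  if (parts.length !== 4) { return null; }
  function ms(text) { return parseInt(text.replace(/ms$/, ""), 10); }
  return { ns: parts[0], total: ms(parts[1]), read: ms(parts[2]), write: ms(parts[3]) };
}

// Row copied from the example output above.
var topRow = parseMongotopRow(" db.test_collection 323ms 2ms 321ms");

// Share of the reported time spent on writes (guarding against a 0ms total).
var writeShare = topRow.total > 0 ? topRow.write / topRow.total : 0;
```

With the bulk-insert workload running, nearly all of the time reported for db.test_collection is write time, which is what writeShare expresses as a fraction.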


 
MONGOSTAT :-
 
The mongostat utility provides a quick overview of the status of a currently running mongod or mongos instance. 
mongostat is functionally similar to the UNIX/Linux utility vmstat, but reports on mongod and mongos instances.
 
Usage :
/opt/mongodb/bin]#./mongostat --host <hostname><:port> -u <username> -p <password> --rowcount <number of rows to display> <sleeptime in seconds>
 

[Lab root @ 12d4-dl585-04 /opt/mongodb/bin]# ./mongostat -h 12d4-dl585-04:27010 -u adm -p pwd 10
connected to: 12d4-dl585-04:27010

insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn time 
 858 *0 *0 *0 0 0|0 0 992m 2.26g 364m 0 .:0.2% 0 0|0 0|0 138k 346b 5 02:57:06 
 861 *0 *0 *0 0 0|0 0 992m 2.26g 366m 0 .:0.1% 0 0|0 0|0 138k 346b 5 02:57:16 
 860 *0 *0 *0 0 0|0 0 992m 2.26g 367m 0 .:0.2% 0 0|0 0|0 138k 346b 5 02:57:26 
 861 *0 *0 *0 0 0|0 0 992m 2.26g 368m 0 .:0.2% 0 0|0 0|0 138k 346b 5 02:57:36 
 861 *0 *0 *0 0 0|0 0 992m 2.26g 370m 0 .:0.1% 0 0|0 0|1 138k 346b 5 02:57:46 
 860 *0 *0 *0 0 0|0 0 992m 2.26g 371m 0 .:0.2% 0 0|0 0|0 138k 346b 5 02:57:56 
 860 *0 *0 *0 0 0|0 0 992m 2.26g 373m 0 .:0.2% 0 0|0 0|0 138k 346b 5 02:58:06 
^C
[root @ hostname /opt/mongodb/bin]# 
[root @ hostname /opt/mongodb/bin]# ./mongostat -h hostname:27010 -u adm -p pwd --discover
connected to: hostname:27010

 insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn time 
hostname:27010 862 6 *0 *0 0 1|0 0 992m 2.26g 376m 1 .:0.2% 0 0|0 0|0 138k 3k 6 02:58:26 

hostname:27010 861 *0 *0 *0 0 1|0 0 992m 2.26g 376m 0 .:0.0% 0 0|0 0|0 138k 3k 6 02:58:27 

hostname:27010 859 *0 *0 *0 0 1|0 0 992m 2.26g 376m 0 .:0.2% 0 0|0 0|0 138k 3k 6 02:58:28 

hostname:27010 860 *0 *0 *0 0 1|0 0 992m 2.26g 376m 1 .:0.1% 0 0|0 0|0 138k 3k 6 02:58:29 
^C
[root @ hostname /opt/mongodb/bin]# ./mongostat -h hostname:27010 -u adm -p pwd --rowcount 3 3
connected to: hostname:27010
insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn time 
 856 *0 *0 *0 0 0|0 0 992m 2.26g 382m 0 .:0.1% 0 0|0 0|0 137k 1k 5 02:59:12 
 856 *0 *0 *0 0 0|0 0 992m 2.26g 383m 0 .:0.2% 0 0|0 0|0 137k 1k 5 02:59:15 
 858 *0 *0 *0 0 0|0 0 992m 2.26g 383m 0 .:0.2% 0 0|0 0|0 138k 1k 5 02:59:18 
[root @ hostname /opt/mongodb/bin]#./mongostat -h hostname:27010 -u adm -p pwd -n 2 3
connected to: hostname:27010
insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn time 
 857 *0 *0 *0 0 0|0 0 992m 2.26g 413m 0 .:0.2% 0 0|0 0|0 138k 1k 5 03:02:51 
 857 *0 *0 *0 0 0|0 0 992m 2.26g 414m 0 .:0.1% 0 0|0 0|0 138k 1k 5 03:02:54 
[root @ hostname /opt/mongodb/bin]


OPTIONS :-

--noheaders
Disables the output of column or field names.

--rowcount <number>, -n
Controls the number of rows to output. Use in conjunction with the sleeptime argument to control the duration of a mongostat operation.

Unless --rowcount is specified, mongostat returns an unlimited number of rows (equivalent to a value of 0).

--http
Configures mongostat to collect data using the HTTP interface rather than a raw database connection.

--discover
Discovers and reports on statistics from all members of a replica set or sharded cluster. When connected to any member of a replica set, --discover reports on all non-hidden members of the replica set. When connected to a mongos, mongostat returns data from all shards in the cluster. If a replica set provides a shard in the sharded cluster, mongostat reports on the non-hidden members of that replica set.

The mongostat --host option is not required but potentially useful in this case.

Changed in version 2.6: When running with --discover, mongostat now respects --rowcount.

--all
Configures mongostat to return all optional fields.

<sleeptime>
The final argument is the length of time, in seconds, that mongostat waits in between calls. By default mongostat returns one call every second.
mongostat returns values that reflect the operations over a 1 second period. For values of <sleeptime> greater than 1, mongostat averages data to reflect average operations per second.


FIELDS

mongostat returns values that reflect the operations over a 1 second period. 
When mongostat <sleeptime> has a value greater than 1, mongostat averages the statistics to reflect average operations per second.
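
The averaging can be illustrated with a small calculation. This is a sketch only, assuming the per-second figure is derived from the difference between two cumulative counter samples taken one sleeptime apart; the function name opsPerSecond and the counter values are made up for illustration.

```javascript
// Hypothetical helper: average per-second rate between two samples of a
// cumulative counter taken sleeptimeSeconds apart.
function opsPerSecond(previousCount, currentCount, sleeptimeSeconds) {
  if (sleeptimeSeconds <= 0) { throw new Error("sleeptime must be positive"); }
  return (currentCount - previousCount) / sleeptimeSeconds;
}

// Illustrative numbers: the counter grew by 8600 inserts over a 10-second interval.
var insertRate = opsPerSecond(120000, 128600, 10);
```

Here insertRate is 860, in line with the insert column of the first example above, which was run with a sleeptime of 10.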

mongostat outputs the following fields:

insert
The number of objects inserted into the database per second. If the value is marked with an asterisk (e.g. *0), it refers to a replicated operation.

query
The number of query operations per second.

update
The number of update operations per second.

delete
The number of delete operations per second.

getmore
The number of get more (i.e. cursor batch) operations per second.

command
The number of commands per second. On slave and secondary systems, mongostat presents two values separated by a pipe character (e.g. |), in the form of local|replicated commands.

flushes
The number of fsync operations per second.

mapped
The total amount of data mapped in megabytes. This is the total data size at the time of the last mongostat call.

vsize
The amount of virtual memory in megabytes used by the process at the time of the last mongostat call.

non-mapped
The total amount of virtual memory excluding all mapped memory at the time of the last mongostat call.

res
The amount of resident memory in megabytes used by the process at the time of the last mongostat call.

faults
The number of page faults per second.

locked
The percent of time in a global write lock.

locked db

The percent of time in the per-database context-specific lock. mongostat will report the database that has spent the most time since the last mongostat call with a write lock.

This value represents the amount of time that the listed database spent in a locked state combined with the time that the mongod spent in the global lock. Because of this, and the sampling method, you may see some values greater than 100%.

idx miss
The percent of index access attempts that required a page fault to load a btree node. This is a sampled value.

qr
The length of the queue of clients waiting to read data from the MongoDB instance.

qw
The length of the queue of clients waiting to write data to the MongoDB instance.

ar
The number of active clients performing read operations.

aw
The number of active clients performing write operations.

netIn
The amount of network traffic, in bytes, received by the MongoDB instance.

This includes traffic from mongostat itself.

netOut
The amount of network traffic, in bytes, sent by the MongoDB instance.

This includes traffic from mongostat itself.

conn
The total number of open connections.

set
The name, if applicable, of the replica set.

repl
The replication status of the member.

Value   Replication Type
M       master
SEC     secondary
REC     recovering
UNK     unknown
SLV     slave
RTR     mongos process (“router”)
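
When mongostat output is captured to a file, a row can be mapped onto the field names above. The following is a rough sketch, assuming the header layout shown in the examples earlier in this section; the function name parseMongostatRow and the underscore labels locked_db and idx_miss_pct are choices made for this example, not mongostat names.

```javascript
// Header names that contain spaces in the output shown above, paired with
// the single-token labels used by this sketch.
var MULTI_WORD = [["locked db", "locked_db"], ["idx miss %", "idx_miss_pct"]];

// Hypothetical helper: pair each header column with the value in the same
// position of a data row. Returns null when the column counts differ.
function parseMongostatRow(header, row) {
  var normalized = header;
  for (var i = 0; i < MULTI_WORD.length; i++) {
    normalized = normalized.replace(MULTI_WORD[i][0], MULTI_WORD[i][1]);
  }
  var names = normalized.trim().split(/\s+/);
  var values = row.trim().split(/\s+/);
  if (names.length !== values.length) { return null; }
  var result = {};
  for (var j = 0; j < names.length; j++) { result[names[j]] = values[j]; }
  return result;
}

// Header and first data row copied from the example output above.
var statHeader = "insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn time";
var statRow = " 858 *0 *0 *0 0 0|0 0 992m 2.26g 364m 0 .:0.2% 0 0|0 0|0 138k 346b 5 02:57:06";
var stat = parseMongostatRow(statHeader, statRow);

// Paired columns such as qr|qw can be split further on the pipe character.
var queued = stat["qr|qw"].split("|");
```

Values are kept as strings because they carry unit suffixes (m, g, k, b) and markers such as the asterisk. Rows produced with --discover begin with an extra host column that has no header, so this sketch returns null for them.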

TOP :-

top is an administrative command that returns usage statistics for each collection.
For each collection, top reports the time used, in microseconds, and a count of operations for the following event types:

total
readLock
writeLock
queries
getmore
insert
update
remove
commands

Issue the top command against the admin database in the form: { top: 1 }

Example

At the mongo shell prompt, use top with the following invocation:

db.adminCommand("top")

Alternatively, after switching to the admin database (use admin), you can run top as follows:


> 
> db.runCommand({ top: 1})
{
 "totals" : {
 "note" : "all times in microseconds",
 "db" : {
 "total" : {
 "time" : 204,
 "count" : 66
 },
 "readLock" : {
 "time" : 194,
 "count" : 65
 },
 "writeLock" : {
 "time" : 10,
 "count" : 1
 },
 "queries" : {
 "time" : 0,
 "count" : 0
 },
 "getmore" : {
 "time" : 0,
 "count" : 0
 },
 "insert" : {
 "time" : 10,
 "count" : 1
 },
 "update" : {
 "time" : 0,
 "count" : 0
 },
 "remove" : {
 "time" : 0,
 "count" : 0
 },
 "commands" : {
 "time" : 194,
 "count" : 65
 }
 },
 "db.system.indexes" : {
 "total" : {
 "time" : 21166,
 "count" : 357
 },
 "readLock" : {
 "time" : 21166,
 "count" : 357
 },
 "writeLock" : {
 "time" : 0,
 "count" : 0
 },
 "queries" : {
 "time" : 21166,
 "count" : 357
 },
 "getmore" : {
 "time" : 0,
 "count" : 0
 },
 "insert" : {
 "time" : 0,
 "count" : 0
 },
 "update" : {
 "time" : 0,
 "count" : 0
 },
 "remove" : {
 "time" : 0,
 "count" : 0
 },
 "commands" : {
 "time" : 0,
 "count" : 0
 }
 },
 "db.system.namespaces" : {
 "total" : {
 "time" : 379,
 "count" : 6
 },
 "readLock" : {
 "time" : 379,
 "count" : 6
 },
 "writeLock" : {
 "time" : 0,
 "count" : 0
 },
 "queries" : {
 "time" : 378,
 "count" : 5
 },
 "getmore" : {
 "time" : 0,
 "count" : 0
 },
 "insert" : {
 "time" : 0,
 "count" : 0
 },
 "update" : {
 "time" : 0,
 "count" : 0
 },
 "remove" : {
 "time" : 0,
 "count" : 0
 },
 "commands" : {
 "time" : 1,
 "count" : 1
 }
 },
 "db.system.users" : {
 "total" : {
 "time" : 15,
 "count" : 1
 },
 "readLock" : {
 "time" : 0,
 "count" : 0
 },
 "writeLock" : {
 "time" : 15,
 "count" : 1
 },
 "queries" : {
 "time" : 0,
 "count" : 0
 },
 "getmore" : {
 "time" : 0,
 "count" : 0
 },
 "insert" : {
 "time" : 15,
 "count" : 1
 },
 "update" : {
 "time" : 0,
 "count" : 0
 },
 "remove" : {
 "time" : 0,
 "count" : 0
 },
 "commands" : {
 "time" : 0,
 "count" : 0
 }
 }
 },
 "ok" : 1
}
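
Because top reports both a time and a count for each event type, dividing one by the other gives a rough average cost per operation. The sketch below assumes the result shape shown in the output above; the function name averageMicros is hypothetical, and the sample object is a trimmed copy of that output containing only the queries figures.

```javascript
// Hypothetical helper: average microseconds per operation, per namespace,
// for one event type (e.g. "queries") in a top result.
function averageMicros(topResult, eventType) {
  var averages = {};
  var totals = topResult.totals;
  for (var ns in totals) {
    var entry = totals[ns];
    // Skip the "note" string that sits alongside the namespaces.
    if (typeof entry !== "object" || entry === null) { continue; }
    var stats = entry[eventType];
    // Skip namespaces with no recorded operations to avoid dividing by zero.
    if (!stats || stats.count === 0) { continue; }
    averages[ns] = stats.time / stats.count;
  }
  return averages;
}

// Trimmed copy of the output above, keeping only the "queries" figures.
var sampleTop = {
  totals: {
    note: "all times in microseconds",
    "db.system.indexes": { queries: { time: 21166, count: 357 } },
    "db.system.namespaces": { queries: { time: 378, count: 5 } },
    "db.system.users": { queries: { time: 0, count: 0 } }
  },
  ok: 1
};
var avgQuery = averageMicros(sampleTop, "queries");
```

For the sample, avgQuery holds about 59.3 microseconds per query for db.system.indexes and 75.6 for db.system.namespaces; db.system.users is omitted because it recorded no queries. In the mongo shell the same function should accept the live result, for example averageMicros(db.adminCommand("top"), "queries").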
>

