Working with MongoDB Indexes

Indexes are the single biggest tunable performance factor in MongoDB

Absent or suboptimal indexes are the most common avoidable MongoDB performance problem.
// Create an index if one does not exist
db.recipes.ensureIndex ( { main_ingredient : 1 } )

// The client remembers the index and raises no errors
db.recipes.ensureIndex( { main_ingredient : 1 } )

* 1 means ascending, -1 descending

// Multiple fields (compound key indexes)
db.recipes.ensureIndex({
main_ingredient: 1,
calories: -1
})

// Arrays of values (multikey indexes)
{
name: ‘Chicken Noodle Soup’,
ingredients : [ ‘chicken’, ‘noodles’ ]
}

db.recipes.ensureIndex( { ingredients : 1 } )
// Embedded documents
{
name : ‘Apple Pie’,
contributor : {
name : ‘Joe American’,
id : ‘joea123’
}
}

db.recipes.ensureIndex ( { ‘contributor.id’ : 1 } )
db.recipes.ensureIndex ( { ‘contributor’ : 1 } )
// List a collection’s indexes
db.recipes.getIndexes()
db.recipes.getIndexKeys()

// Drop a specific index
db.recipes.dropIndex ( { ingredients : 1 } )

// Drop all indexes and recreate them
db.recipes.reIndex( )

// Default (unique) index on _id

// List a collection’s indexes
db.recipes.getIndexes()
db.recipes.getIndexKeys()

// Drop a specific index
db.recipes.dropIndex ( { ingredients : 1 } )

// Drop all indexes and recreate them
db.recipes.reIndex( )

// Default (unique) index on _id

Options :

Uniqueness constraints (unique, dropDups)
Sparse Indexes
// Only one recipe can have a given value for name
db.recipes.ensureIndex( { name: 1 }, { unique: true } )

// Force index on collection with duplicate recipe names – drop the duplicates
db.recipes.ensureIndex(
{ name: 1 },
{ unique: true, dropDups: true }
)

* dropDups should be used with caution

// Only documents with field calories will be indexed
db.recipes.ensureIndex(
{ calories : -1 },
{ sparse : true }
)
// Allow multiple documents to not have calories field
db.recipes.ensureIndex (
{ name: 1 , calories : -1 },
{ unique : true, sparse : true }
)
* Missing fields are stored as null(s) in the index

Geospatial Indexes (2d Sphere)
Text Indexes
TTL Collections (expireAfterSeconds)
Hashed Indexes for sharding
// Add GeoJSON
{
name: ’MongoDB Palo Alto’, loc: { type : “Point”, coordinates: [ 37.449157 , -122.158574 ] }
}
// Index the coordinates
db.locations.ensureIndex( { loc : ‘2dsphere’ } )

// Query for locations ‘near’ a particular coordinate
db.locations.find ( {
loc: { $near:
{ $geometry:
{ type : “Point”,
coordinates: [ 37.4, -122.3 ] },
$maxDistance: 40 }
}
} )
db.recipes.insert( { _id : 4 , y : “add flour and mix” } );
db.recipes.ensureIndex( { y : “text” } );

Limitations :
Collections can not have > 64 indexes.
Index keys can not be > 1024 bytes (1K).
The name of an index, including the namespace, must be < 125 characters.
Queries can only use 1 index or 2 indexes
Indexes have storage requirements, and impact the performance of writes.
In memory sort (no-index) limited to 32mb of return data.

Optimize Your Queries

The “Explain” Plan (Pre-Index)

db.recipes.find( { calories:
{ $lt : 40 } }
).explain( )
{
“cursor” : “BasicCursor” ,
“n” : 42,
“nscannedObjects” : 12345,
“nscanned” : 12345,

“millis” : 356,

}
* Doesn’t use cached plans, re-evals and resets cache

cursor – the type of cursor used. BasicCursor means no index was used.
n – the number of documents that match the query
nscannedObjects – the number of documents that had to be scanned
nscanned – the number of items (index entries or documents) examined
millis – how long the query took
Ratio of n to nscanned should be as close to 1 as possible.
The “Explain” Plan (Post-Index)
db.recipes.find( { calories:
{ $lt : 40 } }
).explain( )
{
“cursor” : ”BtreeCursor calories_1″ ,
“n” : 42,
“nscannedObjects” : 42,
“nscanned” : 42,

“millis” : 3,

}
* Doesn’t use cached plans, re-evals and resets cache

cursor – the type of cursor used. BasicCursor means no index was used.
n – the number of documents that match the query
nscannedObjects – the number of documents that had to be scanned
nscanned – the number of items (index entries or documents) examined
millis – how long the query took
Ratio of n to nscanned should be as close to 1 as possible.
Profiling Slow Operations
db.setProfilingLevel( n , slowms=100ms )

n=0 profiler off
n=1 record operations longer than slowms
n=2 record all queries

db.system.profile.find()
* The profile collection is a capped collection, and fixed in size
Tha Query Optimizer

For each “type” of query, MongoDB periodically tries all useful indexes
Aborts the rest as soon as one plan wins
The winning plan is temporarily cached for each “type” of query
Manually Select Index to Use
// Tell the database what index to use
db.recipes.find( {
calories: { $lt : 1000 } }
).hint ( { _id : 1 } )

// Tell the database to NOT use an index
db.recipes.find(
{ calories: { $lt : 1000 } }
).hint( { $natural : 1 } )
Use Indexes to Sort Query Results
// Given the following index
db.collection.ensureIndex( { a : 1, b : 1 , c : 1, d : 1 } )

// The following query and sort operations can use the index
db.collection.find( ).sort( { a : 1 } )
db.collection.find( ).sort( { a : 1, b : 1 } )

db.collection.find( { a : 4 } ).sort( { a : 1, b : 1 } )

//This query will not use the index, but will use the index to sort
db.collection.find( { b : 5 } ).sort( { a : 1, b : 1 } )
Indexes that won’t work for sorting query results

// Given the following index
db.collection.ensureIndex( { a : 1, b : 1, c : 1, d : 1 } )

// These can not sort using the index
db.collection.find( ).sort( { b : 1 } )
db.collection.find( { b : 5 } ).sort( { b : 1 } )

Indexes that won’t work for sorting query results
// MongoDB can return data from just the index
db.recipes.ensureIndex({ main_ingredient : 1, name : 1 })

// Return only the ingredients field
db.recipes.find(
{ main_ingredient : ‘chicken’ },
{ _id: 0, name : 1 }
)

// indexOnly will be true in the explain plan
db.recipes.find(
{ main_ingredient : ‘chicken’ },
{ _id: 0, name: 1 }
).explain()
{
“indexOnly” : true,
}
Absent or suboptimal indexes are the most common avoidable MongoDB performance problem.
Trying to Use Multiple Indexes

// MongoDB can only use one index for a query
db.collection.ensureIndex( { a : 1 } )
db.collection.ensureIndex( { b : 1 } )

// Only one of the above indexes is used
db.collection.find( { a : 3, b : 4 } )
Compound Key Mistakes

// MongoDB can only use one index for a query
db.collection.ensureIndex({ a: 1 })
db.collection.ensureIndex({ b: 1 })

// Only one of the above indexes is used
db.collection.find({ a: 3, b: 4 })

// Use a compound index
db.collection.ensureIndex( { a : 1, b : 1 } )

Low Selectivity Indexes
db.collection.distinct(‘status’)
[ ‘new’, ‘processed’ ]

db.collection.ensureIndex( { status : 1 } )

// Low selectivity indexes provide little benefit
db.collection.find( { status : ‘new’ } )

// Better
db.collection.ensureIndex ({ status : 1, created_at : -1 } )
db.collection.find(
{ status : ‘new’ }
).sort( { created_at : -1 } )

Regular Expressions
db.users.ensureIndex( { username : 1 } )

// Left anchored regex queries can use the index
db.users.find( { username : /^joe smith/ } )

// But not generic regexes
db.users.find( {username : /smith/ } )

// Or case insensitive queries
db.users.find( { username : /Joe/i } )

Negation

// Indexes aren’t helpful with negations
db.things.ensureIndex( { x : 1 } )

// e.g. “not equal” queries
db.things.find( { x : { $ne : 3 } } )

// …or “not in” queries
db.things.find( { x : { $nin : [2, 3, 4 ] } } )

// …or the $not operator
db.people.find( { name: { $not : ‘John Doe’ } } )

Choosing the right indexes is one of the most important things you can do as a MongoDB developer. Take indexes into consideration when designing your application.

  • Ask Question