[MongoDB]: Index Data

Question : –
============================================

I have a quick question about covered (indexOnly) queries. The explanation says the query goes to the index collection and not to the actual collection,
but I don't see any entries in the system.indexes collection except these two, so I'm a little confused. Is there a separate index collection apart from this one?

In general, where is the index data stored in MongoDB?

Covered Index or indexOnly :-
==========================
If a command requests only values that are present in an index, the MongoDB engine fetches that information from the index collection itself and does not go to the collection where the entire document is stored.
Hence the command executes faster, and queries of this type are very efficient for building optimized web applications.

Note in the explain() result below that the number of objects scanned in the collection (nscannedObjects) is 0, and the number of index entries scanned (nscanned) is only 1.
You can also see that indexOnly is true.

> db.system.indexes.find()
{ "v" : 1, "key" : { "_id" : 1 }, "name" : "_id_", "ns" : "citi.citi_website" }
{ "v" : 1, "key" : { "post_id" : 1, "comment_id" : 1 }, "name" : "post_id_1_comment_id_1", "ns" : "citi.citi_website" }
>

> db.citi_website.find( { post_id : 250 } )
{ "_id" : ObjectId("5527b4a29689e4b8db19b70c"), "post_id" : 250, "comment_id" : 250, "likes" : 250 }
>
> db.citi_website.find( { post_id : 250 }, { post_id : 1, comment_id : 1, _id : 0 } )
{ "post_id" : 250, "comment_id" : 250 }
>
> db.citi_website.find( { post_id : 250 }, { post_id : 1, comment_id : 1, _id : 0 } ).explain()
{
    "cursor" : "BtreeCursor post_id_1_comment_id_1",
    "isMultiKey" : false,
    "n" : 1,
    "nscannedObjects" : 0,
    "nscanned" : 1,
    "nscannedObjectsAllPlans" : 0,
    "nscannedAllPlans" : 1,
    "scanAndOrder" : false,
    "indexOnly" : true,
    "nYields" : 0,
    "nChunkSkips" : 0,
    "millis" : 0,
    "indexBounds" : {
        "post_id" : [
            [ 250, 250 ]
        ],
        "comment_id" : [
            [ { "$minElement" : 1 }, { "$maxElement" : 1 } ]
        ]
    },
    "server" : "myhostname:27017",
    "filterSet" : false
}
>

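The condition that makes the query above index-only can be sketched in plain JavaScript. This is only an illustration: isCoveredQuery is a hypothetical helper, not a MongoDB API, and it ignores cases such as multikey indexes and embedded fields. It assumes a query is covered when every filter field and every returned field is part of the index, and _id is excluded from the projection.

```javascript
// Hypothetical helper (not part of MongoDB): decides whether a query could be
// answered from the index alone, under the simplified assumptions stated above.
function isCoveredQuery(indexKeys, filter, projection) {
  const indexed = new Set(Object.keys(indexKeys));
  // Every field used in the filter must be part of the index.
  const filterCovered = Object.keys(filter).every((f) => indexed.has(f));
  // _id is returned by default, so it must be excluded unless it is indexed.
  const idHandled = projection._id === 0 || indexed.has("_id");
  // Every field the projection includes must be part of the index.
  const included = Object.keys(projection).filter((f) => projection[f] === 1);
  const projectionCovered =
    included.length > 0 && included.every((f) => indexed.has(f));
  return filterCovered && idHandled && projectionCovered;
}

// Index and queries modelled on the citi_website session above.
const index = { post_id: 1, comment_id: 1 };
const covered = isCoveredQuery(
  index, { post_id: 250 }, { post_id: 1, comment_id: 1, _id: 0 });
// "likes" is not in the index, so the full document would have to be read.
const notCovered = isCoveredQuery(
  index, { post_id: 250 }, { post_id: 1, likes: 1, _id: 0 });
console.log(covered, notCovered);
```

The second call shows why asking for the likes field would make nscannedObjects non-zero: the value lives only in the document, not in the index.
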
Answer :-
========================================================================================================================

MongoDB Index Data :-

The actual index data is maintained separately and is not directly accessible; the documents in system.indexes only show which indexes exist for each collection.
The indexes themselves are kept within the data files, stored in linked extents throughout the data file(s).

Indexes

• Indexes are B-tree structures serialized to disk
• They are stored in the same files as the data, but use their own extents

Internal File Format

• Files on disk are broken into extents which contain the documents
• A collection has one or more extents
• Extents grow exponentially up to 2GB
• Namespace entries in the ns file point to the first extent for that collection

The DB Stats

> db.stats()
{
    "db" : "test",
    "collections" : 22,
    "objects" : 17000383,               ## number of documents
    "avgObjSize" : 44.33690276272011,
    "dataSize" : 753744328,             ## size of data
    "storageSize" : 1159569408,         ## size of all containing extents
    "numExtents" : 81,
    "indexes" : 85,
    "indexSize" : 624204896,            ## separate index storage size
    "fileSize" : 4176478208,            ## size of data files on disk
    "nsSizeMB" : 16,
    "ok" : 1
}
> db.large.stats()
{
    "ns" : "test.large",
    "count" : 5000000,                  ## number of documents
    "size" : 280000024,                 ## size of data
    "avgObjSize" : 56.0000048,
    "storageSize" : 409206784,          ## size of all containing extents
    "numExtents" : 18,
    "nindexes" : 1,
    "lastExtentSize" : 74846208,
    "paddingFactor" : 1,                ## amount of padding
    "systemFlags" : 0,
    "userFlags" : 0,
    "totalIndexSize" : 162228192,       ## separate index storage size
    "indexSizes" : {
        "_id_" : 162228192
    },
    "ok" : 1
}
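
To see how the fields in the db.stats() output relate to each other, the arithmetic can be worked through in a few lines of JavaScript. The numbers are copied from the output above; the variable names are only illustrative.

```javascript
// Figures copied from the db.stats() output above.
const dbStats = {
  objects: 17000383,
  dataSize: 753744328,
  storageSize: 1159569408,
  indexSize: 624204896,
};

// avgObjSize is simply dataSize divided by the number of documents.
const avgObjSize = dbStats.dataSize / dbStats.objects;

// Index storage is counted separately from document storage,
// so it can be compared directly against the data size.
const indexToDataRatio = dbStats.indexSize / dbStats.dataSize;

// Bytes allocated in extents beyond the documents themselves
// (padding, free space and not-yet-used extent space).
const extentOverheadBytes = dbStats.storageSize - dbStats.dataSize;

console.log(avgObjSize, indexToDataRatio.toFixed(2), extentOverheadBytes);
```

Here the index storage comes to roughly 0.83 of the data size, which is a reminder that indexes take real space in the data files even though they never show up as a collection of their own.
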

 
Memory Mapped Files

• All data files are memory mapped to Virtual Memory by the OS
• MongoDB just reads / writes to RAM in the filesystem cache
• OS takes care of the rest!
• Virtual process size = total files size + overhead (connections, heap)
• If journal is on, the virtual size will be roughly doubled
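
As a rough back-of-the-envelope check of the last two bullets, the virtual size can be estimated as below. This is only a sketch: estimateVirtualSize is a hypothetical helper, the 200MB overhead is an assumed figure chosen for illustration, and "roughly doubled" is modelled as exactly twice the mapped file size.

```javascript
// Hypothetical helper: estimates the virtual process size from the total
// data file size, following the rule of thumb in the bullets above.
function estimateVirtualSize(totalFileBytes, overheadBytes, journalEnabled) {
  // With journaling on, the files are assumed to be mapped twice.
  const mappedBytes = journalEnabled ? 2 * totalFileBytes : totalFileBytes;
  return mappedBytes + overheadBytes;
}

const fileSize = 4176478208;          // "fileSize" from the db.stats() output
const overhead = 200 * 1024 * 1024;   // assumed connections + heap overhead

const withoutJournal = estimateVirtualSize(fileSize, overhead, false);
const withJournal = estimateVirtualSize(fileSize, overhead, true);
console.log(withoutJournal, withJournal);
```
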

How Much Data is in RAM?

• Resident memory is the best indicator of how much data is in RAM
• Resident is: process overhead (connections, heap) + FS pages in RAM that were accessed
• This means that it resets to 0 upon restart, even though the data is still in RAM thanks to the FS cache
• Use the free command to check the FS cache size
• Can be affected by fragmentation and read-ahead

The Problem
Changes in memory mapped files are not applied in order, and different parts of the file can be from different points in time!

You want a consistent point-in-time snapshot when restarting after a crash.

Solution – Use a Journal
• Data gets written to a journal before making it to the data files
• Operations are written to a journal buffer in RAM, which is flushed every 100ms by default or once 100MB has accumulated
• Once the journal is written to disk, the data is safe
• Journal prevents corruption and allows durability
• Can be turned off, but don’t!
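
The flush rule in the bullets above can be illustrated with a toy model. This is a minimal sketch, not how mongod is implemented: JournalBuffer is a hypothetical class, the flush check runs on each write rather than on a background timer, and the clock is passed in by hand so the behaviour is deterministic.

```javascript
// Toy model of a journal buffer that is flushed when either a time interval
// has elapsed or a size threshold is reached. All names are hypothetical.
class JournalBuffer {
  constructor({ flushIntervalMs = 100, maxBufferedBytes = 100 * 1024 * 1024 } = {}) {
    this.flushIntervalMs = flushIntervalMs;
    this.maxBufferedBytes = maxBufferedBytes;
    this.pending = [];      // operations still only in RAM
    this.pendingBytes = 0;
    this.lastFlushMs = 0;
    this.flushed = [];      // stands in for the on-disk journal
  }

  // Buffer an operation, then flush if the interval or size limit is hit.
  write(op, sizeBytes, nowMs) {
    this.pending.push(op);
    this.pendingBytes += sizeBytes;
    const intervalElapsed = nowMs - this.lastFlushMs >= this.flushIntervalMs;
    const bufferFull = this.pendingBytes >= this.maxBufferedBytes;
    if (intervalElapsed || bufferFull) this.flush(nowMs);
  }

  // Move everything buffered so far to the "disk" journal.
  flush(nowMs) {
    this.flushed.push(...this.pending);
    this.pending = [];
    this.pendingBytes = 0;
    this.lastFlushMs = nowMs;
  }
}

// A small size limit is used here so the size-based flush is easy to trigger.
const journal = new JournalBuffer({ flushIntervalMs: 100, maxBufferedBytes: 1024 });
journal.write("insert A", 100, 10);    // buffered only
journal.write("insert B", 100, 50);    // buffered only
const atRisk = journal.pending.length; // operations a crash would lose now
journal.write("insert C", 100, 120);   // 100ms elapsed: all three are flushed
const durable = journal.flushed.length;
journal.write("insert D", 2000, 130);  // exceeds the size limit: flushed at once
console.log(atRisk, durable, journal.flushed.length);
```

Anything still in pending at the moment of a crash is what would be lost, which is where the 100ms window discussed next comes from.
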

Can I Lose Data on a Hard Crash?
• Maximum data loss is 100ms of writes (one journal flush interval). This can be reduced with --journalCommitInterval
• For durability (data is on disk when ack'ed) use the JOURNAL_SAFE write concern ("j" option).
• Note that replication can reduce the data loss further. Use the REPLICAS_SAFE write concern (“w” option).
• As write guarantees increase, latency increases. To maintain performance, use more connections!

Source & more details at : "A Journey through the MongoDB Internals" (pdf)
