[MongoDB]: Orphaned documents

Orphaned documents

In a sharded cluster, orphaned documents are those documents on a shard that also exist in chunks on other shards as a result of failed migrations or incomplete migration cleanup due to abnormal shutdown. Delete orphaned documents using cleanupOrphaned to reclaim disk space and reduce confusion.

cleanupOrphaned :-

Deletes from a shard the orphaned documents whose shard key values fall into a single range, or a single contiguous set of ranges, that does not belong to the shard. For example, if two contiguous ranges do not belong to the shard, cleanupOrphaned examines both ranges for orphaned documents.

cleanupOrphaned has the following syntax:

db.runCommand( {
    cleanupOrphaned: "<database>.<collection>",
    startingFromKey: <minimumShardKeyValue>,
    secondaryThrottle: <boolean>,
    writeConcern: <document>
} )

cleanupOrphaned has the following fields:

Field and description:

cleanupOrphaned
The namespace, i.e. both the database and the collection name, of the sharded collection for which to clean the orphaned data.

startingFromKey
Optional. The shard key value that determines the lower bound of the cleanup range. The default value is MinKey.

If the range that contains the specified startingFromKey value belongs to a chunk owned by the shard, cleanupOrphaned continues to examine the next ranges until it finds a range not owned by the shard. See Determine Range for details.

secondaryThrottle
Optional. If true, each delete operation must be replicated to another secondary before the cleanup operation proceeds further. If false, do not wait for replication. Defaults to false.

Independent of the secondaryThrottle setting, after the final delete, cleanupOrphaned waits for all deletes to replicate to a majority of replica set members before returning.

writeConcern
Optional. A document that expresses the write concern that the secondaryThrottle will use to wait for the secondaries when removing orphaned data.

Any specified writeConcern implies _secondaryThrottle.
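
As an illustration of how the optional fields combine, here is a minimal sketch of a helper that assembles the command document. The name buildCleanupCommand and its options argument are hypothetical and not part of MongoDB; only the four field names come from the list above. The command name is set first so that it stays the first key of the document.

```javascript
// Hypothetical helper: build a cleanupOrphaned command document,
// copying only the optional fields that the caller actually supplied.
function buildCleanupCommand(namespace, options) {
    var opts = options || {};
    var cmd = { cleanupOrphaned: namespace };   // command name goes first
    var optional = ["startingFromKey", "secondaryThrottle", "writeConcern"];
    for (var i = 0; i < optional.length; i++) {
        var name = optional[i];
        if (opts[name] !== undefined) {
            cmd[name] = opts[name];
        }
    }
    return cmd;
}

// Example values are illustrative. A writeConcern is given without
// secondaryThrottle, since a specified writeConcern implies throttling.
var cmd = buildCleanupCommand("test.info", {
    startingFromKey: { x: 10 },
    writeConcern: { w: 2 }
});
// In the mongo shell, connected to the shard primary: db.adminCommand(cmd)
```

Omitting the options argument yields a command with only the namespace, which starts the cleanup at MinKey.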

Behavior :

Run cleanupOrphaned in the admin database directly on the mongod instance that is the primary replica set member of the shard. Do not run cleanupOrphaned on a mongos instance.

You do not need to disable the balancer before running cleanupOrphaned.
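
Before issuing the command, it can be worth confirming that the shell is connected to a shard primary and not to a mongos. The sketch below classifies an isMaster reply. It assumes that a mongos reports msg: "isdbgrid" and that a replica set primary reports ismaster: true together with a setName field. The function name classifyNode is hypothetical.

```javascript
// Hypothetical helper: classify a node from its isMaster reply document.
// Assumption: mongos replies carry msg: "isdbgrid"; a replica set primary
// replies with ismaster: true and a setName field.
function classifyNode(reply) {
    if (reply.msg === "isdbgrid") {
        return "mongos";
    }
    if (reply.setName && reply.ismaster === true) {
        return "primary";
    }
    return "other";
}

// In the mongo shell: classifyNode(db.isMaster())
// Run cleanupOrphaned only when the result is "primary".
```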

Determine Range
The cleanupOrphaned command uses the startingFromKey value, if specified, to determine the start of the range to examine for orphaned documents:

If the startingFromKey value falls into a range for a chunk not owned by the shard, cleanupOrphaned begins examining at the start of this range, which may not necessarily be the startingFromKey.
If the startingFromKey value falls into a range for a chunk owned by the shard, cleanupOrphaned moves onto the next range until it finds a range for a chunk not owned by the shard.
cleanupOrphaned deletes orphaned documents starting from the beginning of the determined range and stops at the start of the next chunk range that belongs to the shard.

Consider the following key space with documents distributed across Shard A and Shard B.

Shard A owns:

Chunk 1 with the range { x: MinKey } --> { x: -75 },
Chunk 2 with the range { x: -75 } --> { x: 25 }, and
Chunk 4 with the range { x: 175 } --> { x: 200 }.

Shard B owns:

Chunk 3 with the range { x: 25 } --> { x: 175 } and
Chunk 5 with the range { x: 200 } --> { x: MaxKey }.

If on Shard A, the cleanupOrphaned command runs with startingFromKey: { x: -70 } or any other value belonging to the range for Chunk 1 or Chunk 2, the cleanupOrphaned command examines the Chunk 3 range of { x: 25 } --> { x: 175 } to delete orphaned data.

If on Shard B, the cleanupOrphaned command runs with startingFromKey: { x: -70 } or any other value belonging to the range for Chunk 1, the cleanupOrphaned command examines the combined contiguous range for Chunk 1 and Chunk 2, namely { x: MinKey } --> { x: 25 }, to delete orphaned data.
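
The range selection described above can be modelled in a few lines of plain JavaScript. This is a simplified simulation, not the server's implementation: keys are plain numbers, -Infinity and Infinity stand in for MinKey and MaxKey, and the function name determineRange is hypothetical.

```javascript
// Chunk layout from the example; -Infinity and Infinity stand in for
// MinKey and MaxKey in this simplified numeric model.
var chunks = [
    { min: -Infinity, max: -75, shard: "A" },   // Chunk 1
    { min: -75, max: 25, shard: "A" },          // Chunk 2
    { min: 25, max: 175, shard: "B" },          // Chunk 3
    { min: 175, max: 200, shard: "A" },         // Chunk 4
    { min: 200, max: Infinity, shard: "B" }     // Chunk 5
];

// Return the contiguous unowned range that a cleanup run on `shard`
// would examine for `startKey`, or null if none remains.
function determineRange(chunks, shard, startKey) {
    // Locate the chunk whose range [min, max) contains startKey.
    var i = 0;
    while (i < chunks.length &&
           !(chunks[i].min <= startKey && startKey < chunks[i].max)) {
        i++;
    }
    // Move past chunks that this shard owns.
    while (i < chunks.length && chunks[i].shard === shard) {
        i++;
    }
    if (i === chunks.length) {
        return null;
    }
    // Widen over neighbouring chunks that the shard does not own either.
    var lo = i;
    var hi = i;
    while (lo > 0 && chunks[lo - 1].shard !== shard) { lo--; }
    while (hi + 1 < chunks.length && chunks[hi + 1].shard !== shard) { hi++; }
    return { min: chunks[lo].min, max: chunks[hi].max };
}

var onShardA = determineRange(chunks, "A", -70);   // { min: 25, max: 175 }
var onShardB = determineRange(chunks, "B", -70);   // { min: -Infinity, max: 25 }
```

The two calls reproduce the Shard A and Shard B cases from the example: on Shard A the owned chunks are skipped, and on Shard B the two unowned chunks are merged into one range.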

Required Access :

On systems running with authorization, you must have clusterAdmin privileges to run cleanupOrphaned.

Output

Return Document
Each cleanupOrphaned command returns a document containing a subset of the following fields:

cleanupOrphaned.ok
Equal to 1 on success.

A value of 1 indicates that cleanupOrphaned scanned the specified shard key range, deleted any orphaned documents found in that range, and confirmed that all deletes replicated to a majority of the members of that shard’s replica set. If confirmation does not arrive within 1 hour, cleanupOrphaned times out.

A value of 0 could indicate either of two cases:

cleanupOrphaned found orphaned documents on the shard but could not delete them.
cleanupOrphaned found and deleted orphaned documents, but could not confirm replication before the 1 hour timeout. In this case, replication does occur but only after cleanupOrphaned returns.

cleanupOrphaned.stoppedAtKey
The upper bound of the cleanup range of shard keys. If present, the value corresponds to the lower bound of the next chunk on the shard. The absence of the field signifies that the cleanup range was the uppermost range for the shard.
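
A script that consumes the return document has three situations to distinguish, based on the two fields above. The following is a minimal sketch. The function name nextAction and the labels it returns are hypothetical.

```javascript
// Hypothetical helper: decide what to do after one cleanupOrphaned call.
//   "retry"    - ok is not 1: deletion failed or replication was not confirmed
//   "continue" - ok is 1 and stoppedAtKey is present: more ranges may follow
//   "done"     - ok is 1 and stoppedAtKey is absent: uppermost range cleaned
function nextAction(result) {
    if (result.ok !== 1) {
        return "retry";
    }
    if (result.stoppedAtKey === undefined || result.stoppedAtKey === null) {
        return "done";
    }
    return "continue";
}

// In the mongo shell, `result` would be the value returned by db.runCommand.
var action = nextAction({ stoppedAtKey: { _id: -10 }, ok: 1 });   // "continue"
```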

Log Files :
The cleanupOrphaned command prints the number of deleted documents to the mongod log. For example:

m30000| 2013-10-31T15:17:28.972-0400 [conn1] Deleter starting delete for: foo.bar from { _id: -35.0 } -> { _id: -10.0 }, with opId: 128
m30000| 2013-10-31T15:17:28.972-0400 [conn1] rangeDeleter deleted 0 documents for foo.bar from { _id: -35.0 } -> { _id: -10.0 }
{ "stoppedAtKey": { "_id": -10 }, "ok": 1 }
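
To total the deletions across many runs, the count can be pulled out of the log with a regular expression. The sketch below infers the pattern only from the sample rangeDeleter line shown above, so other server versions may log a different format. The function name parseRangeDeleterLine is hypothetical.

```javascript
// Extract the deleted-document count and the namespace from a
// "rangeDeleter deleted N documents for <ns> from ..." log line.
// Returns null for lines that do not match that shape.
function parseRangeDeleterLine(line) {
    var m = /rangeDeleter deleted (\d+) documents for (\S+) from/.exec(line);
    if (m === null) {
        return null;
    }
    return { deleted: parseInt(m[1], 10), namespace: m[2] };
}

var sample = "m30000| 2013-10-31T15:17:28.972-0400 [conn1] rangeDeleter " +
             "deleted 0 documents for foo.bar from { _id: -35.0 } -> { _id: -10.0 }";
var parsed = parseRangeDeleterLine(sample);   // { deleted: 0, namespace: "foo.bar" }
```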

Examples

The following examples run the cleanupOrphaned command directly on the primary of the shard.

Remove Orphaned Documents for a Specific Range
For a sharded collection info in the test database, a shard owns a single chunk with the range: { x: MinKey } --> { x: 10 }.

The shard also contains documents whose shard key values fall in a range for a chunk not owned by the shard: { x: 10 } --> { x: MaxKey }.

To remove orphaned documents within the { x: 10 } --> { x: MaxKey } range, you can specify a startingFromKey with a value that falls into this range, as in the following example:

use admin
db.runCommand( {
    "cleanupOrphaned": "test.info",
    "startingFromKey": { x: 10 },
    "secondaryThrottle": true
} )

Or you can specify a startingFromKey with a value that falls into the previous range, as in the following:

use admin
db.runCommand( {
    "cleanupOrphaned": "test.info",
    "startingFromKey": { x: 2 },
    "secondaryThrottle": true
} )

Since { x: 2 } falls into a range that belongs to a chunk owned by the shard, cleanupOrphaned examines the next range to find a range not owned by the shard, in this case { x: 10 } --> { x: MaxKey }.

Remove All Orphaned Documents from a Shard
cleanupOrphaned examines documents from a single contiguous range of shard keys. To remove all orphaned documents from the shard, you can run cleanupOrphaned in a loop, using the returned stoppedAtKey as the next startingFromKey, as in the following:

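use admin
var nextKey = { };
var result;

// Each pass cleans one contiguous range; stoppedAtKey seeds the next pass.
while ( nextKey != null ) {
    result = db.runCommand( { cleanupOrphaned: "test.user", startingFromKey: nextKey } );

    if ( result.ok != 1 ) {
        print( "Unable to complete at this time: failure or timeout." );
    }

    printjson( result );

    // stoppedAtKey is absent once the uppermost range has been cleaned,
    // which ends the loop.
    nextKey = result.stoppedAtKey;
}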
