Document Versioning in Couchbase
- Handling concurrent access
- Relevant attributes
- One document per version
- Embedded revision tree
- Combined approaches
Handling concurrent access
- Compare And Swap (CAS): This is the optimistic approach. Each document has a built-in CAS value, which changes as soon as somebody updates the document. The idea is to implement something like the following on the application side:
  1. Get the document and, in particular, its CAS value.
  2. Modify some document properties.
  3. Perform an update operation, passing the old CAS value from step 1.
  4. If somebody else updated the document in the meantime, a CAS mismatch error occurs, because your client-side CAS value no longer matches the server-side one. In that case, wait for a very short moment and then try again from step 1.
  If multiple users or threads access the same document, each of them has the same chance of completing step 3 first, so every client gets its chance to update the document before someone else does.
- Locking: This is the pessimistic approach. You lock a document before you perform the changes and release the lock afterwards. A lock-wait implementation is required in this case:
  1. Get the document and request a lock.
  2. If the document is already locked, a lock error occurs.
  3. In that case, wait until the document is released again and retry from step 1.
  4. Update the document.
  5. Release the document.
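Both procedures can be sketched in Python. To keep the sketch self-contained, it runs against a hypothetical in-memory store rather than a real bucket: `InMemoryStore`, `get_and_lock`, `unlock`, `CasMismatch` and `DocumentLocked` are illustrative names, not the Couchbase SDK API. The store only mimics the behaviour the steps rely on: every write assigns a new CAS value, and a locked document accepts writes only from the lock holder.

```python
import itertools
import threading
import time


class CasMismatch(Exception):
    """The caller's CAS value no longer matches the stored one."""


class DocumentLocked(Exception):
    """Another client currently holds the lock on the document."""


class InMemoryStore:
    """Hypothetical stand-in for a bucket: every mutation assigns a new CAS value."""

    def __init__(self):
        self._docs = {}                 # key -> (value, cas)
        self._locks = {}                # key -> CAS value owned by the lock holder
        self._cas = itertools.count(1)
        self._mutex = threading.Lock()

    def insert(self, key, value):
        with self._mutex:
            self._docs[key] = (dict(value), next(self._cas))

    def get(self, key):
        with self._mutex:
            value, cas = self._docs[key]
            # Hide the CAS of a locked document so only the lock holder can write.
            return dict(value), (None if key in self._locks else cas)

    def replace(self, key, value, cas):
        with self._mutex:
            if cas is None or cas != self._docs[key][1]:
                raise CasMismatch(key)
            new_cas = next(self._cas)
            self._docs[key] = (dict(value), new_cas)
            if key in self._locks:
                self._locks[key] = new_cas  # holder keeps the lock until unlock()
            return new_cas

    def get_and_lock(self, key):
        with self._mutex:
            if key in self._locks:
                raise DocumentLocked(key)
            value, _ = self._docs[key]
            cas = next(self._cas)       # taking the lock also changes the CAS
            self._docs[key] = (value, cas)
            self._locks[key] = cas
            return dict(value), cas

    def unlock(self, key, cas):
        with self._mutex:
            if self._locks.get(key) != cas:
                raise CasMismatch(key)
            del self._locks[key]


def update_optimistic(store, key, modify, max_retries=100, backoff=0.001):
    """CAS loop: read, modify, write with the old CAS, retry on a mismatch."""
    for _ in range(max_retries):
        doc, cas = store.get(key)                # 1. document plus CAS value
        modify(doc)                              # 2. change properties locally
        try:
            return store.replace(key, doc, cas)  # 3. write with the old CAS
        except CasMismatch:
            time.sleep(backoff)                  # 4. someone was faster: retry
    raise RuntimeError(f"gave up on {key!r} after {max_retries} CAS mismatches")


def update_pessimistic(store, key, modify, max_wait=5.0, poll=0.001):
    """Lock loop: wait for the lock, update, then release the lock."""
    deadline = time.monotonic() + max_wait
    while True:
        try:
            doc, cas = store.get_and_lock(key)   # 1. get and lock
            break
        except DocumentLocked:                   # 2. lock error
            if time.monotonic() > deadline:
                raise
            time.sleep(poll)                     # 3. wait, then retry from 1.
    try:
        modify(doc)
        cas = store.replace(key, doc, cas)       # 4. update
    finally:
        store.unlock(key, cas)                   # 5. release
    return cas
```

The retry limit and the timeout are design choices of this sketch: without them, a client that keeps losing the race or waiting for a lock would loop forever.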
Relevant Attributes
- Revision number: Couchbase has the built-in attribute ‘rev’, which is accessible via the document’s metadata (meta.rev). The revision number is incremented on every update and is used internally for conflict resolution if you use Couchbase’s Cross Data Center Replication (XDCR) feature. A higher revision number means that a document was updated more often.
- CAS value: This attribute was already explained in the previous section. It is used to determine whether a document was changed since you last accessed it.
- Update timestamp: A version can contain the update timestamp in order to determine which update happened last. Be careful here, because the clocks of your clients may not be synchronized.
- Custom revision number: Even though there is a built-in one, you can also introduce a simple incrementing number as your revision number.
- Updater: The person or service that updated the document.
- Revision identifier: Another option is to use an artificial ID, such as a UUID, as the version identifier.
- Parent version: The previous revision.
"cnti::1abc-2def-3ghi-4jkl::7" : {
  "author" : "David Maier",
  "content" : "/mnt/blobs/1abc-2def-3ghi-4jkl",
  "mime-type" : "application/msword",
  "tags" : "demo example blog",
  "version" : {
    "rev" : 7,
    "updated" : 1443426252,
    "user" : "dmaier"
  }
}
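A minimal sketch of how an application could maintain these custom attributes whenever it writes a new version. The field names `rev`, `updated` and `user` follow the example document; the function name `new_version` and the fields `id` (revision identifier) and `parent` (parent version) are hypothetical additions. The built-in meta.rev and CAS values are managed by the server and are not touched here.

```python
import time
import uuid


def new_version(doc, user, now=None):
    """Return a copy of `doc` whose 'version' block describes the next revision.

    'rev', 'updated' and 'user' follow the example document; 'id' and
    'parent' are hypothetical fields for the revision identifier and the
    parent version.
    """
    old = doc.get("version", {})
    updated = dict(doc)                 # shallow copy; the old 'version' dict is not touched
    updated["version"] = {
        "rev": old.get("rev", 0) + 1,   # custom, incrementing revision number
        "id": str(uuid.uuid4()),        # artificial revision identifier
        "parent": old.get("id"),        # previous revision (None if it had no id)
        # Client clock in seconds since the epoch; clocks may not be synchronized.
        "updated": int(time.time() if now is None else now),
        "user": user,                   # updater
    }
    return updated
```

Calling `new_version` on the document above with user `"mmustermann"` yields a copy whose `version.rev` is 8, while the original document keeps revision 7.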
One Document per Version
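Each version is stored as a separate document. The keys could look like this:
- cnti::1abc-2def-3ghi-4jkl (the document itself)
- count::rev::cnti = 0 (a counter document for the revision numbers)
- cnti::1abc-2def-3ghi-4jkl::7 (revision 7 of the document)
To create a new version:
- Increment the counter to generate a new revision id.
- Get the old document, i.e. the one with revision ‘rev-1’.
- Create a new document with the new revision id.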
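The three steps can be sketched against a hypothetical in-memory key-value store (`VersionStore`, `counter`, `insert` and `create_version` are illustrative names, not a Couchbase API). The sketch assumes that the counter is used by this one document only and that revision 1 is derived from the document stored under the plain key.

```python
class VersionStore:
    """Hypothetical in-memory key-value store; not a Couchbase SDK API."""

    def __init__(self):
        self.docs = {}

    def counter(self, key, delta=1):
        # Stand-in for an atomic counter document such as count::rev::cnti.
        self.docs[key] = self.docs.get(key, 0) + delta
        return self.docs[key]

    def insert(self, key, value):
        # Like an insert operation: fails if the key already exists.
        if key in self.docs:
            raise KeyError(f"{key} already exists")
        self.docs[key] = value


def create_version(store, doc_key, counter_key, changes):
    """Store a new version under '<doc_key>::<rev>' and return its revision."""
    rev = store.counter(counter_key)                 # 1. new revision id
    # 2. Old document with revision rev - 1; this sketch assumes that
    #    revision 1 is derived from the document stored under doc_key itself.
    previous = store.docs.get(f"{doc_key}::{rev - 1}", store.docs[doc_key])
    store.insert(f"{doc_key}::{rev}", {**previous, **changes})  # 3. new document
    return rev
```

Because older versions are never overwritten, each version document keeps the content and the updater of its revision, which is what a change history needs.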
Embedded Revision Tree
A more complex approach is to embed the versions in the main document as a tree of changes. The disadvantage is that the document can become quite large, so you should limit the number of embedded revisions. Couchbase’s Sync Gateway uses this approach (see ‘Rev Tree Storage on Couchbase Server’). Sync Gateway is a synchronization endpoint for Couchbase Lite instances, and Couchbase Lite is a lightweight Couchbase that can run on your mobile device.
The tree definition is quite simple. A tree has nodes. Each node, except the root node, has exactly one parent node. Each node represents one document revision, and the tree describes which revision was derived from which other revision. The sub-tree from a specific node down to the leaves is called a branch.
The picture above shows 6 revisions. Your application has many options for using such a revision tree:
- From which revision to fork?
- Which revisions/branches to keep?
- How to merge based on the revisions?
- What should be the maximum size of the revision tree?
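The tree definition needs nothing more than one parent pointer per revision, which matches the `parent` field in the document below. This minimal sketch derives the leaves and the branch below a given revision from such pointers; the function names and the six revision ids are made up for illustration and are not the revisions from the picture.

```python
def children(parents):
    """Invert the parent pointers: revision -> revisions derived from it."""
    result = {rev: [] for rev in parents}
    for rev, parent in parents.items():
        if parent is not None:          # the root has no parent
            result[parent].append(rev)
    return result


def leaves(parents):
    """Revisions from which nothing has been derived yet."""
    return sorted(rev for rev, kids in children(parents).items() if not kids)


def branch(parents, start):
    """The sub-tree from `start` down to the leaves."""
    kids = children(parents)
    found, stack = [], [start]
    while stack:
        rev = stack.pop()
        found.append(rev)
        stack.extend(kids[rev])
    return sorted(found)
```

For the hypothetical tree `{"1-a": None, "2-b": "1-a", "2-c": "1-a", "3-d": "2-b", "3-e": "2-b", "3-f": "2-c"}`, `leaves` returns `["3-d", "3-e", "3-f"]` and `branch(parents, "2-b")` returns `["2-b", "3-d", "3-e"]`. Questions such as which branches to keep or where to fork can then be answered by looking at these leaves and branches.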
The idea is to have a head reference in the document which points to the current base revision.
"cnti::1abc-2def-3ghi-4jkl" : {
  "head" : "2-bcd",
  "revs" : {
    "1-abc" : {
      "meta" : {
        "updated" : 1443426252,
        "user" : "dmaier"
      },
      "doc" : {
        "author" : "David Maier",
        "content" : "/mnt/blobs/1abc-2def-3ghi-4jkl",
        "mime-type" : "application/msword",
        "tags" : "demo example blog"
      }
    },
    "2-bcd" : {
      "meta" : {
        "parent" : "1-abc",
        "updated" : 1443426255,
        "user" : "mmustermann"
      },
      "doc" : {
        "author" : "David Maier",
        "content" : "/mnt/blobs/1abc-2def-3ghi-4jkl",
        "mime-type" : "application/msword",
        "tags" : "demo example blog couchbase"
      }
    },
    …
  }
}
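A minimal sketch of adding a revision to such an embedded tree and moving the head reference to it. It assumes the `<generation>-<hash>` shape of the example ids (`1-abc`, `2-bcd`) and, as its own choice, hashes the parent id together with the content; the function name `add_revision` and the 8-character hash length are hypothetical. Passing an explicit `parent` forks from an older revision instead of the head.

```python
import hashlib
import json
import time


def add_revision(doc, changes, user, parent=None, now=None):
    """Derive a new revision from `parent` (default: the head) and make it the head.

    Revision ids follow the '<generation>-<hash>' shape of the example;
    hashing the parent id and the content is an assumption of this sketch.
    """
    parent = parent or doc["head"]
    content = {**doc["revs"][parent]["doc"], **changes}  # parent content is copied, not mutated
    generation = int(parent.split("-", 1)[0]) + 1
    payload = json.dumps([parent, content], sort_keys=True).encode()
    rev_id = f"{generation}-{hashlib.sha1(payload).hexdigest()[:8]}"
    doc["revs"][rev_id] = {
        "meta": {
            "parent": parent,
            "updated": int(time.time() if now is None else now),
            "user": user,
        },
        "doc": content,
    }
    doc["head"] = rev_id
    return rev_id
```

Moving the head on every write is one possible policy; an application that wants to review forks before accepting them could leave the head untouched instead.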
Combined Approaches
- Change History: Some compliance or security rules require that you can answer the question of who changed what and when. For this, the ‘One Document per Version’ approach is sufficient.
- Conflict Handling: Multiple users create several versions, and you want to be able to pick a winner or even merge several versions. For this, the ‘Revision Tree’ approach works best.
- If the revision tree becomes too big:
  - Archive the current state of the revision tree by creating a new document for this version. An extra ‘archive’ bucket can be used for this purpose.
  - Truncate the tree by setting a new head revision.
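The archive-and-truncate steps could look like the following minimal sketch. A plain dict `archive` stands in for the extra ‘archive’ bucket, the key scheme `<doc_key>::archive::<n>` and the `max_revs` threshold are hypothetical, and "truncate" is interpreted here as keeping only the head revision, which becomes the new root of the tree.

```python
import copy


def archive_and_truncate(doc, doc_key, archive, max_revs):
    """If the tree has more than `max_revs` revisions, archive it and keep only the head.

    `archive` stands in for an extra 'archive' bucket; the key scheme
    '<doc_key>::archive::<n>' is hypothetical.
    """
    if len(doc["revs"]) <= max_revs:
        return None
    n = 1 + sum(1 for k in archive if k.startswith(f"{doc_key}::archive::"))
    archive_key = f"{doc_key}::archive::{n}"
    archive[archive_key] = copy.deepcopy(doc)        # archive the current state
    head = copy.deepcopy(doc["revs"][doc["head"]])
    head["meta"].pop("parent", None)                 # the head becomes the new root
    doc["revs"] = {doc["head"]: head}                # truncate the tree
    return archive_key
```

The deep copy keeps the archived tree intact, so the full history, including the removed parent links, stays available in the archive document.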