Full text search in MongoDB

This is a custom implementation created by the MongoDB developers as a specific index type, and is due to be launched as an experimental feature in MongoDB 2.4. It has features such as:
Full text search as an index type when creating new indexes, just like any other.
Indexing of multiple fields, with weighting to give different fields higher priority.
Support for Latin-based languages initially, with plans for other character sets later. Initially this will be: Danish, Dutch, English, Finnish, French, German, Hungarian, Italian, Norwegian, Portuguese, Romanian, Russian, Spanish, Swedish and Turkish.
Support for advanced queries, similar to the Google search syntax e.g. negation and phrase matching.
Stemming, to deal with plurals.
Stop words (see the list here).
This looks like a good, general-purpose full text search engine, which fits well with how MongoDB is developing into a good multi-purpose database.
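To make the query syntax from the feature list a bit more concrete, here is a rough sketch of how a search string with a negated term and a quoted phrase could be split up. This is not MongoDB's parser: `parseSearch` is a hypothetical helper I made up for illustration, and I am assuming the Google-style conventions of a leading `-` for negation and double quotes for phrases.

```javascript
// Illustrative only: split a search string into plain terms,
// negated terms ("-word") and quoted phrases. Not MongoDB's own parser.
function parseSearch(search) {
  var parsed = { terms: [], negated: [], phrases: [] };
  // Match either a "quoted phrase" or a run of non-space characters.
  var tokenPattern = /"([^"]+)"|(\S+)/g;
  var match;
  while ((match = tokenPattern.exec(search)) !== null) {
    if (match[1] !== undefined) {
      // Quoted phrase: keep the words together.
      parsed.phrases.push(match[1].toLowerCase());
    } else if (match[2].charAt(0) === "-" && match[2].length > 1) {
      // Leading minus: documents containing this term should be excluded.
      parsed.negated.push(match[2].slice(1).toLowerCase());
    } else {
      parsed.terms.push(match[2].toLowerCase());
    }
  }
  return parsed;
}

var parsed = parseSearch('novelist "patron of the arts" -handballer');
```

Here `parsed.terms` holds `novelist`, `parsed.phrases` holds `patron of the arts` and `parsed.negated` holds `handballer`.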
Examples
First we enable full text search in the latest unstable nightly and insert some test documents:
use test
 
db.adminCommand( { setParameter : "*", textSearchEnabled : true } );
 
tc = db.test
 
tc.save( { _id: 1, title: "Olivia Shakespear",text: "Olivia Shakespear (born Olivia Tucker; 17 March 1863 – 3 October 1938) was a British novelist, playwright, and patron of the arts. She wrote six books that are described as \"marriage problem\" novels. Her works sold poorly, sometimes only a few hundred copies. Her last novel, Uncle Hilary, is considered her best. She wrote two plays in collaboration with Florence Farr." } );
 
tc.save( { _id: 2, title: "Linn-Kristin Riegelhuth Koren", text: "Linn-Kristin Riegelhuth Koren (born 1 August 1984, in Ski) is a Norwegian handballer playing for Larvik HK and the Norwegian national team. She is commonly known as Linka. Outside handball she is a qualified nurse." } );
 
Then we can create a new index on the title field:
tc.ensureIndex( { "title": "text" } );
and we can now search:
> res = tc.runCommand( "text", { search: "Olivia" } );
{
    "queryDebugString" : "olivia||||||",
    "language" : "english",
    "results" : [
        {
            "score" : 0.75,
            "obj" : {
                "_id" : 1,
                "title" : "Olivia Shakespear",
                "text" : "Olivia Shakespear (born Olivia Tucker; 17 March 1863 – 3 October 1938) was a British novelist, playwright, and patron of the arts. She wrote six books that are described as \"marriage problem\" novels. Her works sold poorly, sometimes only a few hundred copies. Her last novel, Uncle Hilary, is considered her best. She wrote two plays in collaboration with Florence Farr."
            }
        }
    ],
    "stats" : {
        "nscanned" : 1,
        "nscannedObjects" : 0,
        "n" : 1,
        "timeMicros" : 128
    },
    "ok" : 1
}
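The response is just a document, so it is easy to pick apart in the shell. As a small sketch, the snippet below pulls the id, title and score out of each hit. The `res` object here is an abbreviated copy of the response above (I left out the long `text` field), and `summarise` is a hypothetical helper of mine, not part of the shell.

```javascript
// Abbreviated copy of the response shown above ("text" omitted for brevity).
var res = {
  language: "english",
  results: [
    { score: 0.75, obj: { _id: 1, title: "Olivia Shakespear" } }
  ],
  stats: { nscanned: 1, nscannedObjects: 0, n: 1, timeMicros: 128 },
  ok: 1
};

// Hypothetical helper: reduce a text command response to id, title and score.
function summarise(response) {
  // A failed command has no usable results, so return an empty list.
  if (response.ok !== 1) {
    return [];
  }
  return response.results.map(function (hit) {
    return { id: hit.obj._id, title: hit.obj.title, score: hit.score };
  });
}

var hits = summarise(res);
```

In the shell you would pass the real result of `tc.runCommand( "text", ... )` to `summarise` instead of the hand-written copy.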
We can then add the text field to the index. Note that you can only have one full text index per collection, so I have to drop the original one first, then recreate it as a compound index covering both fields:
tc.dropIndexes()
tc.ensureIndex( { "title": "text", "text": "text" } );
and test stemming:
> res = tc.runCommand( "text", { search: "novelists" } );
{
    "queryDebugString" : "novelist||||||",
    "language" : "english",
    "results" : [
        {
            "score" : 0.5116279069767442,
            "obj" : {
                "_id" : 1,
                "title" : "Olivia Shakespear",
                "text" : "Olivia Shakespear (born Olivia Tucker; 17 March 1863 – 3 October 1938) was a British novelist, playwright, and patron of the arts. She wrote six books that are described as \"marriage problem\" novels. Her works sold poorly, sometimes only a few hundred copies. Her last novel, Uncle Hilary, is considered her best. She wrote two plays in collaboration with Florence Farr."
            }
        }
    ],
    "stats" : {
        "nscanned" : 1,
        "nscannedObjects" : 0,
        "n" : 1,
        "timeMicros" : 90
    },
    "ok" : 1
}
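Note the queryDebugString: the search for "novelists" was reduced to "novelist" before the index was consulted, which is why it matches. To show the idea, here is a deliberately naive sketch of stop word removal plus plural stripping. It is only a toy under my own assumptions: `STOP_WORDS`, `naiveStem` and `indexTerms` are hypothetical names, the stop word list is a tiny made-up sample, and a real stemmer handles far more than a trailing "s".

```javascript
// Tiny made-up stop word sample, not the list MongoDB ships with.
var STOP_WORDS = { "a": true, "and": true, "of": true, "the": true, "was": true };

// Naive stemmer: lower-case the word and strip a single trailing "s".
// Words of three letters or fewer, or ending in "ss", are left alone.
function naiveStem(word) {
  var w = word.toLowerCase();
  var endsInSingleS = w.charAt(w.length - 1) === "s" &&
                      w.charAt(w.length - 2) !== "s";
  if (w.length > 3 && endsInSingleS) {
    return w.slice(0, -1);
  }
  return w;
}

// Split text into words, drop stop words, then stem what is left.
function indexTerms(text) {
  return text.toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter(function (w) {
      return w.length > 0 && !STOP_WORDS.hasOwnProperty(w);
    })
    .map(naiveStem);
}

var queryTerm = naiveStem("novelists");
var docTerms = indexTerms("a British novelist, playwright, and patron of the arts");
```

Because both the query and the document text go through the same reduction, `queryTerm` ends up among `docTerms` even though the document never contains the plural.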
 
 
We can see the index we created, along with its default language and the name of the document field that can override the language:
> tc.getIndexes()
[
    {
        "v" : 1,
        "key" : {
            "_id" : 1
        },
        "ns" : "test.test",
        "name" : "_id_"
    },
    {
        "v" : 0,
        "key" : {
            "_fts" : "text",
            "_ftsx" : 1
        },
        "ns" : "test.test",
        "name" : "title_text_text_text",
        "weights" : {
            "text" : 1,
            "title" : 1
        },
        "default_language" : "english",
        "language_override" : "language"
    }
]
You can specify the weights and default_language options when creating the index (after dropping the existing full text index, as before) e.g.
tc.ensureIndex( { "title": "text", "text": "text" }, {weights: { title: 10 }, default_language: "norwegian" } );
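The language can also be chosen per search rather than only per index. As far as I can tell the text command accepts `language` and `limit` fields alongside `search`; treat that as my assumption against this nightly rather than a guarantee. The sketch below only builds the command document, and `textCommand` is a hypothetical helper of mine.

```javascript
// Hypothetical helper: build the options document for the text command.
// The "language" and "limit" field names are assumed, not taken from the tests.
function textCommand(search, options) {
  var cmd = { search: search };
  options = options || {};
  if (options.language) {
    cmd.language = options.language;  // stemming and stop words for this search
  }
  if (options.limit) {
    cmd.limit = options.limit;        // cap on the number of results returned
  }
  return cmd;
}

var cmd = textCommand("handballer", { language: "norwegian", limit: 5 });
// In the shell this would be run as: tc.runCommand( "text", cmd );
```

Building the document separately keeps the optional fields out of the command when they are not set, so the server falls back to the index's default_language.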
And that’s basically it (from what I can see from the tests). Nice and simple.