[MongoDB] : Importance of Aggregation Framework

I’m representing file system information (folders and files) inside mongoDB as below.

> use dbversity
switched to db dbversity
>
> db.dbversity_col.insert({

… “_id” : “\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident”,
… “Size” : 6312131,
… “DirectoryName” : “dbversity.db.Database.Incident”,
… “Files” :
… [ {
… “FileName” : “dbversity.db.Database.Incident”,
… “Extension” : “db”,
… “Size” : 90108 },

… {
… “FileName” : “dbversity_file1.jar”,
… “Extension” : “jar”,
… “Size” : 196 } ,

… {
… “FileName” : “dbversity_file2.db”,
… “Extension” : “db”,
… “Size” : 196 },

… {
… “FileName” : “dbversity_file3.pdf”,
… “Extension” : “pdf”,
… “Size” : 196 }



… ],
… “SubFolders” :
… [ “\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident\\Database References”,
… “\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident\\Functions”
… ] })
>
>
> db.dbversity_col.find().pretty()
{
“_id” : “\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident”,
“Size” : 6312131,
“DirectoryName” : “dbversity.db.Database.Incident”,
“Files” : [
{
“FileName” : “dbversity.db.Database.Incident”,
“Extension” : “db”,
“Size” : 90108
},
{
“FileName” : “dbversity_file1.jar”,
“Extension” : “jar”,
“Size” : 196
},
{
“FileName” : “dbversity_file2.db”,
“Extension” : “db”,
“Size” : 196
},
{
“FileName” : “dbversity_file3.pdf”,
“Extension” : “pdf”,
“Size” : 196
}
],
“SubFolders” : [
“\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident\\Database References”,
“\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident\\Functions”
]
}
>
The problem:

If we want to find files that have db extension we could execute the query:
> db.dbversity_col.find({“Files.Extension”:”db”}).pretty()
{
“_id” : “\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident”,
“Size” : 6312131,
“DirectoryName” : “dbversity.db.Database.Incident”,
“Files” : [
{
“FileName” : “dbversity.db.Database.Incident”,
“Extension” : “db”,
“Size” : 90108
},
{
“FileName” : “dbversity_file1.jar”,
“Extension” : “jar”,
“Size” : 196
},
{
“FileName” : “dbversity_file2.db”,
“Extension” : “db”,
“Size” : 196
},
{
“FileName” : “dbversity_file3.pdf”,
“Extension” : “pdf”,
“Size” : 196
}
],
“SubFolders” : [
“\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident\\Database References”,
“\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident\\Functions”
]
}
>
If you execute above query you will get all documents (Directories) in which at least one file has extension db along with all the files inside that directory – so files that don’t have db extension.
You have to filter it again on application side. This for large data sets is not feasible.
There is special command $elemMatch that if you use will not return all files down the wire, but it will return maximum one element. So if there are two files that match criteria only one will be returned.

 

db.dbversity_col.find( { Files: { $elemMatch : { Extension : “pdf” } } } ).pretty()

———————-
> db.dbversity_col.insert({
… “_id” : “\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident2”,
… “Size” : 6312131,
… “DirectoryName” : “dbversity.db.Database.Incident2”,
… “Files” : [
… {
… “FileName” : “dbversity.db.Database.Incident2”,
… “Extension” : “db”,
… “Size” : 90108
… },
… {
… “FileName” : “dbversity_file11.jar”,
… “Extension” : “jar”,
… “Size” : 196
… },
… {
… “FileName” : “dbversity_file22.db”,
… “Extension” : “db”,
… “Size” : 196
… },
… {
… “FileName” : “dbversity_file33.pdf”,
… “Extension” : “pdf”,
… “Size” : 196
… }
… ],
… “SubFolders” : [
… “\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident\\Database References2”,
… “\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident\\Functions2”
… ]
… })
>
>
>
> db.dbversity_col.find( { Files: { $elemMatch : { Extension : “pdf” } } } ).pretty()
{
“_id” : “\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident”,
“Size” : 6312131,
“DirectoryName” : “dbversity.db.Database.Incident”,
“Files” : [
{
“FileName” : “dbversity.db.Database.Incident”,
“Extension” : “db”,
“Size” : 90108
},
{
“FileName” : “dbversity_file1.jar”,
“Extension” : “jar”,
“Size” : 196
},
{
“FileName” : “dbversity_file2.db”,
“Extension” : “db”,
“Size” : 196
},
{
“FileName” : “dbversity_file3.pdf”,
“Extension” : “pdf”,
“Size” : 196
}
],
“SubFolders” : [
“\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident\\Database References”,
“\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident\\Functions”
]
}
{
“_id” : “\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident2”,
“Size” : 6312131,
“DirectoryName” : “dbversity.db.Database.Incident2”,
“Files” : [
{
“FileName” : “dbversity.db.Database.Incident2”,
“Extension” : “db”,
“Size” : 90108
},
{
“FileName” : “dbversity_file11.jar”,
“Extension” : “jar”,
“Size” : 196
},
{
“FileName” : “dbversity_file22.db”,
“Extension” : “db”,
“Size” : 196
},
{
“FileName” : “dbversity_file33.pdf”,
“Extension” : “pdf”,
“Size” : 196
}
],
“SubFolders” : [
“\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident\\Database References2”,
“\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident\\Functions2”
]
}
>
>
>
>
> db.dbversity_col.find( { Files: { $elemMatch : { Extension : “pdf”, “FileName” : “dbversity_file33.pdf” } } } ).pretty()
{
“_id” : “\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident2”,
“Size” : 6312131,
“DirectoryName” : “dbversity.db.Database.Incident2”,
“Files” : [
{
“FileName” : “dbversity.db.Database.Incident2”,
“Extension” : “db”,
“Size” : 90108
},
{
“FileName” : “dbversity_file11.jar”,
“Extension” : “jar”,
“Size” : 196
},
{
“FileName” : “dbversity_file22.db”,
“Extension” : “db”,
“Size” : 196
},
{
“FileName” : “dbversity_file33.pdf”,
“Extension” : “pdf”,
“Size” : 196
}
],
“SubFolders” : [
“\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident\\Database References2”,
“\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident\\Functions2”
]
}
>

 

The solution:

Aggregation Framework offers a workaround for this. While not ideal, Aggregation Framework offers operator $unwind which will change the schema (in memory) so that each Files entry is treated as just another property inside the document.

This works in the following way. If one document (directory) that has 15 files the $unwind operator will create 15 documents (each with one file). Once this is done standard filters works as usual.

Therefore our query becomes:

Result when it has only one document.

> db.dbversity_col.aggregate([ { $unwind: “$Files” }, { $match: { “Files.Extension” : “db” } }, { $sort: { “Files.Size” : -1.0 } }, { $limit: 5.0 } ])
{
“result” : [
{
“_id” : “\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident”,
“Size” : 6312131,
“DirectoryName” : “dbversity.db.Database.Incident”,
“Files” : {
“FileName” : “dbversity.db.Database.Incident”,
“Extension” : “db”,
“Size” : 90108
},
“SubFolders” : [
“\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident\\Database References”,
“\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident\\Functions”
]
},
{
“_id” : “\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident”,
“Size” : 6312131,
“DirectoryName” : “dbversity.db.Database.Incident”,
“Files” : {
“FileName” : “dbversity_file2.db”,
“Extension” : “db”,
“Size” : 196
},
“SubFolders” : [
“\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident\\Database References”,
“\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident\\Functions”
]
}
],
“ok” : 1
}
>

Result after inserting second document.

> db.dbversity_col.aggregate([ { $unwind: “$Files” }, { $match: { “Files.Extension” : “db” } }, { $sort: { “Files.Size” : -1.0 } }, { $limit: 5.0 } ])
{
“result” : [
{
“_id” : “\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident2”,
“Size” : 6312131,
“DirectoryName” : “dbversity.db.Database.Incident2”,
“Files” : {
“FileName” : “dbversity.db.Database.Incident2”,
“Extension” : “db”,
“Size” : 90108
},
“SubFolders” : [
“\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident\\Database References2”,
“\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident\\Functions2”
]
},
{
“_id” : “\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident”,
“Size” : 6312131,
“DirectoryName” : “dbversity.db.Database.Incident”,
“Files” : {
“FileName” : “dbversity.db.Database.Incident”,
“Extension” : “db”,
“Size” : 90108
},
“SubFolders” : [
“\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident\\Database References”,
“\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident\\Functions”
]
},
{
“_id” : “\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident”,
“Size” : 6312131,
“DirectoryName” : “dbversity.db.Database.Incident”,
“Files” : {
“FileName” : “dbversity_file2.db”,
“Extension” : “db”,
“Size” : 196
},
“SubFolders” : [
“\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident\\Database References”,
“\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident\\Functions”
]
},
{
“_id” : “\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident2”,
“Size” : 6312131,
“DirectoryName” : “dbversity.db.Database.Incident2”,
“Files” : {
“FileName” : “dbversity_file22.db”,
“Extension” : “db”,
“Size” : 196
},
“SubFolders” : [
“\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident\\Database References2”,
“\\\\DBVERSITW06DV\\D$\\DTS\\096-dbversity.db.Database\\dbversity.db.Database.Incident\\Functions2”
]
}
],
“ok” : 1
}
>
>
Any further suggestions/solutions, please.

 

  • Ask Question