MongoDB Aggregation

Advertisements


Aggregations operations process data records and return computed results. Aggregation operations group values from multiple documents together, and can perform a variety of operations on the grouped data to return a single result. In sql count(*) and with group by is an equivalent of mongodb aggregation.

The aggregate() Method

For the aggregation in mongodb you should use aggregate() method.

Syntax:

Basic syntax of aggregate() method is as follows

>db.COLLECTION_NAME.aggregate(AGGREGATE_OPERATION)

Example:

In the collection you have the following data:

{
   _id: ObjectId(7df78ad8902c)
   title: 'MongoDB Overview', 
   description: 'MongoDB is no sql database',
   by_user: 'tutorials point',
   url: 'http://www.tutorialspoint.com',
   tags: ['mongodb', 'database', 'NoSQL'],
   likes: 100
},
{
   _id: ObjectId(7df78ad8902d)
   title: 'NoSQL Overview', 
   description: 'No sql database is very fast',
   by_user: 'tutorials point',
   url: 'http://www.tutorialspoint.com',
   tags: ['mongodb', 'database', 'NoSQL'],
   likes: 10
},
{
   _id: ObjectId(7df78ad8902e)
   title: 'Neo4j Overview', 
   description: 'Neo4j is no sql database',
   by_user: 'Neo4j',
   url: 'http://www.neo4j.com',
   tags: ['neo4j', 'database', 'NoSQL'],
   likes: 750
},

Now from the above collection if you want to display a list that how many tutorials are written by each user then you will use aggregate() method as shown below:

> db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : 1}}}])
{
   "result" : [
      {
         "_id" : "tutorials point",
         "num_tutorial" : 2
      },
      {
         "_id" : "tutorials point",
         "num_tutorial" : 1
      }
   ],
   "ok" : 1
}
>

Sql equivalent query for the above use case will be select by_user, count(*) from mycol group by by_user

In the above example we have grouped documents by field by_user and on each occurance of by_user previous value of sum is incremented. There is a list available aggregation expressions.

ExpressionDescriptionExample
$sumSums up the defined value from all documents in the collection.db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$sum : "$likes"}}}])
$avgCalculates the average of all given values from all documents in the collection.db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$avg : "$likes"}}}])
$minGets the minimum of the corresponding values from all documents in the collection.db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$min : "$likes"}}}])
$maxGets the maximum of the corresponding values from all documents in the collection.db.mycol.aggregate([{$group : {_id : "$by_user", num_tutorial : {$max : "$likes"}}}])
$pushInserts the value to an array in the resulting document.db.mycol.aggregate([{$group : {_id : "$by_user", url : {$push: "$url"}}}])
$addToSetInserts the value to an array in the resulting document but does not create duplicates.db.mycol.aggregate([{$group : {_id : "$by_user", url : {$addToSet : "$url"}}}])
$firstGets the first document from the source documents according to the grouping. Typically this makes only sense together with some previously applied “$sort”-stage.db.mycol.aggregate([{$group : {_id : "$by_user", first_url : {$first : "$url"}}}])
$lastGets the last document from the source documents according to the grouping. Typically this makes only sense together with some previously applied “$sort”-stage.db.mycol.aggregate([{$group : {_id : "$by_user", last_url : {$last : "$url"}}}])

Pipeline Concept

In UNIX command shell pipeline means the possibility to execute an operation on some input and use the output as the input for the next command and so on. MongoDB also support same concept in aggregation framework. There is a set of possible stages and each of those is taken a set of documents as an input and is producing a resulting set of documents (or the final resulting JSON document at the end of the pipeline). This can then in turn again be used for the next stage an so on.

Possible stages in aggregation framework are following:

  • $project: Used to select some specific fields from a collection.
  • $match: This is a filtering operation and thus this can reduce the amount of documents that are given as input to the next stage.
  • $group: This does the actual aggregation as discussed above.
  • $sort: Sorts the documents.
  • $skip: With this it is possible to skip forward in the list of documents for a given amount of documents.
  • $limit: This limits the amount of documents to look at by the given number starting from the current position.s
  • $unwind: This is used to unwind document that are using arrays. when using an array the data is kind of pre-joinded and this operation will be undone with this to have individual documents again. Thus with this stage we will increase the amount of documents for the next stage.


Advertisements
Advertisements