MongoEngine - Aggregation



The term ‘aggregation’ is used for the operation that processes data and returns computed result. Finding sum, count and average on one or more fields of documents in a collection can be called as aggregation functions.

MongoEngine provides aggregate() function that encapsulates PyMongo’s aggregation framework. Aggregation operation uses a collection as input and returns one or more documents as a result.

MongoDB uses concept of data processing pipelines. A pipeline can have multiple stages. Basic stage provides that provide filter and operate like queries. Others provide tools for grouping and/or sorting by one or more fields, string concatenation tasks, array aggregation tools, etc.

Following stages are defined in MongoDB pipeline creation −

Name Description
$project Reshapes each document in the stream, by adding new fields or removing existing fields.
$match Filters the document stream to allow only matching documents to pass unmodified into the next stage. $match uses standard MongoDB queries.
$redact Reshapes each document by restricting the content for each document based on information stored in the documents themselves.
$limit Limits documents to be passed unmodified to the pipeline
$skip Skips the first n documents and passes the remaining documents unmodified to the pipeline.
$group Groups input documents by a given identifier expression and applies the accumulator expressions to each group. The output documents only contain the identifier field and accumulated fields.
$sort Reorders the document stream by a specified sort key.
$out Writes the resulting documents of the aggregation pipeline to a collection.

Aggregation expressions use field path to access fields in the input documents. To specify a field path, use a string that prefixes with a dollar sign $$$ the field name. Expression can use one or more Boolean operators ($and, $or, $not) and comparison operators ($eq, $gt, $lt, $gte, $lte and $ne).

Following arithmetic expressions are also used for aggregation −

$add Adds numbers to return the sum. Accepts any number of argument expressions
$subtract Returns the result of subtracting the second value from the first
$multiply Multiplies numbers to return the product. Accepts any number of argument expressions
$divide Returns the result of dividing the first number by the second. Accepts two argument expressions
$mod Returns the remainder of the first number divided by the second. Accepts two argument expressions

Following string expression can also be used in aggregation −

$concat Concatenates any number of strings
$substr Returns a substring of a string, starting at a specified index position up to a specified length
$toLower Converts a string to lowercase. Accepts a single argument expression
$toUpper Converts a string to uppercase. Accepts a single argument expression
$strcasecmp Performs string comparison and returns 0 if two strings are equivalent, 1 if first is greater than second, and -1 if first string is less than second

To demonstrate how aggregate() function works in MongoEngine, let us first define a Document class called orders.

from mongoengine import *
con=connect('mydata')

class orders(Document):
   custID = StringField()
   amount= IntField()
   status = StringField()

We then add following documents in orders collection −

_id custID amount status
ObjectId("5eba52d975fa1e26d4ec01d0") A123 500 A
ObjectId("5eba536775fa1e26d4ec01d1") A123 250 A
ObjectId("5eba53b575fa1e26d4ec01d2") B212 200 D
ObjectId("5eba540e75fa1e26d4ec01d3") B212 400 A

The aggregate() function is to be used to find sum of amount field for each custID only when status equals ‘A’. Accordingly, the pipeline is constructed as follows.

First stage in pipeline uses $match to filter documents with status=’A’. Second stage uses $group identifier to group documents on CustID and performs sum of amount.

 pipeline = [
{"$match" : {"status" : "A"}},
{"$group": {"_id": "$custID", "total": {"$sum": "$amount"}}}
]

This pipeline is now used as argument to aggregate() function.

docs = orders.objects().aggregate(pipeline)

We can iterate over the document cursor with a for loop. The complete code is given below −

from mongoengine import *
con=connect('mydata')

class orders(Document):
   custID = StringField()
   amount= IntField()
   status = StringField()

pipeline = [
   {"$match" : {"status" : "A"}},
   {"$group": {"_id": "$custID", "total": {"$sum": "$amount"}}}
   ]
docs = orders.objects().aggregate(pipeline)
for doc in docs:
   print (x)

For the given data, the following output is generated −

{'_id': 'B212', 'total': 400}
{'_id': 'A123', 'total': 750}
Advertisements