MongoDB - Map Reduce


Advertisements


As per the MongoDB documentation, Map-reduce is a data processing paradigm for condensing large volumes of data into useful aggregated results. MongoDB uses mapReduce command for map-reduce operations. MapReduce is generally used for processing large data sets.

MapReduce Command

Following is the syntax of the basic mapReduce command −

>db.collection.mapReduce(
   function() {emit(key,value);},  //map function
   function(key,values) {return reduceFunction}, {   //reduce function
      out: collection,
      query: document,
      sort: document,
      limit: number
   }
)

The map-reduce function first queries the collection, then maps the result documents to emit key-value pairs which is then reduced based on the keys that have multiple values.

In the above syntax

  • map is a javascript function that maps a value with a key and emits a key-valur pair

  • reduce is a javscript function that reduces or groups all the documents having the same key

  • out specifies the location of the map-reduce query result

  • query specifies the optional selection criteria for selecting documents

  • sort Sorts the input documents. This option is useful for optimization. For example, specify the sort key to be the same as the emit key so that there are fewer reduce operations. The sort key must be in an existing index for this collection.

  • limit Specifies a maximum number of documents for the input into the mapfunction.

Using MapReduce

Consider the following document structure storing user posts. The document stores user_name of the user and the status of post.

{
   "post_text": "tutorialspoint is an awesome website for tutorials",
   "user_name": "mark",
   "status":"active"
}

Now, we will use a mapReduce function on our posts collection to select all the active posts, group them on the basis of user_name and then count the number of posts by each user using the following code −

>db.posts.mapReduce( 
   function() { emit(this.user_id,1); }, 
	
   function(key, values) {return Array.sum(values)}, {  
      query:{status:"active"},  
      out:"post_total" 
   }
)

The above mapReduce query outputs the following result −

{
   "result" : "post_total",
   "timeMillis" : 9,
   "counts" : {
      "input" : 4,
      "emit" : 4,
      "reduce" : 2,
      "output" : 2
   },
   "ok" : 1,
}

The result shows that a total of 4 documents matched the query (status:"active"), the map function emitted 4 documents with key-value pairs and finally the reduce function grouped mapped documents having the same keys into 2.

To see the result of this mapReduce query use the find operator −

>db.posts.mapReduce( 
   function() { emit(this.user_id,1); }, 
   function(key, values) {return Array.sum(values)}, {  
      query:{status:"active"},  
      out:"post_total" 
   }
	
).find()

The above query gives the following result which indicates that both users tom and mark have two posts in active states −

{ "_id" : "tom", "value" : 2 }
{ "_id" : "mark", "value" : 2 }

In similar manner, MapReduce queries can be used to construct large complex aggregation queries. The use of custom Javascript functions makes usage of MapReduce very flexible and powerful.



Advertisements