MongoDB 2.2 – MapReduce vs. Aggregation Framework Test

Intro

At Up to Eleven we are currently using MongoDB 2.0 in a sharded setup. Where each shard has 3 replica sets.

The biggest collection stores messages to a specific address with a read flag, status and a message date.

For the web we need an overview of this messages per address (conversation). Therefore we currently use a mapreduce command,  because in a sharded setup the group function is not available.

With release of MongoDB 2.2 their are now 2 new options for this problem – It’s now possible to run MapReduce commands with JsMode on shards and their also is the brand new aggregation framework.

In the following article the options are compared to each other.

Schema

Document

A document will look like this.

{
  "_id" : NumberLong("1073741825094"), // combination of message id & user id
  "u" : 250, // user id
  "a" : "test", // address
  "b" : "Hello, what are you doing right now?", // body
  "i" : true, // incoming flag
  "r" : true, // read flag
  "s" : 0, // status
  "d" : ISODate("2011-06-06T18:49:55Z") // date
}

Indexes

We have only one additional index for a user conversation by date.

{
  "v" : 1,
  "key" : {
    "_id" : 1
  },
  "ns" : "test.user_messages",
  "name" : "_id_"
},
{
  "v" : 1,
  "key" : {
    "u" : 1,
    "a" : 1,
    "d" : -1
  },
  "ns" : "test.user_messages",
  "name" : "conversation"
}

Tests

For the test we used a self written benchmark junit test which uses the mongodb java driver 2.9.0. The collection is filled with 1 million records (1000 users with 1000 messages, randomly created – so out of order).

The measured test runs the following commands 1000 times.

MapReduce

This is the map reduce command we currently use (because jsmode is not working in 2.0 with shards) .

db.runCommand({
  "mapreduce": "user_messages",
  "map": "function(){
     emit(this.a, {
       count:1,
       unread:this.r ? 0 : 1,
       unsent:this.s==5 ? 1 : 0,
       messageId:this._id,
       date:this.d,
       body:this.b
    });
  }",
  "reduce": "function(address,values) {
    var result = { 
      count:0,
      unread:0,
      unsent:0,
      messageId:0,
      date:0,
      body:''
    };
    values.forEach(function(value) {
      result.count += 1;
      if (value.unread > 0) result.unread += 1;
      if (value.unsent > 0) result.unsent += 1;
      if (value.messageId > result.messageId) result.messageId = value.messageId;
      if (value.dateSent >= result.dateSent) {
        result.message = value.message;
        result.dateSent = value.dateSent;
      }
    });
    return result;
  }",
  "verbose" : true,
  "out" : { "inline" : 1 },
  "query" : { "u" : 1 },
}

time: 38 s 718 ms

MapReduce with JsMode

This the map reduce with js mode, which now works with MongoDB 2.2.

db.runCommand({
  "mapreduce": "user_messages",
  "map": "function(){
     emit(this.a, {
       count:1,
       unread:this.r ? 0 : 1,
       unsent:this.s==5 ? 1 : 0,
       messageId:this._id,
       date:this.d,
       body:this.b
    });
  }",
  "reduce": "function(address,values) {
    var result = { 
      count:0,
      unread:0,
      unsent:0,
      messageId:0,
      date:0,
      body:''
    };
    values.forEach(function(value) {
      result.count += 1;
      if (value.unread > 0) result.unread += 1;
      if (value.unsent > 0) result.unsent += 1;
      if (value.messageId > result.messageId) result.messageId = value.messageId;
      if (value.dateSent >= result.dateSent) {
        result.message = value.message;
        result.dateSent = value.dateSent;
      }
    });
    return result;
  }",
  "verbose" : true,
  "out" : { "inline" : 1 },
  "query" : { "u" : 1 },
  "jsMode" : true
}

time: 22 s 237 ms

Aggregation Framework

This test is with the new aggregation framework introduced in MongoDB 2.2.

db.user_messages.aggregate(
 { $match: { u:1 } },
 { $sort: { a:1, d:-1 } },
 { $group: {
     _id: "$a",
     count: { $sum : 1 },
     unread: { $sum : { $cond : [ "$r", 0 , 1]} },
     unsent: { $sum : { $cond : [ { $eq : [ "$s", 5 ] }, 1, 0 ] } },
     messageId: { $max : "$_id" }, 
     date: { $max : "$d" },
     body: { $first : "$b" },
 } }
);

time: 6 s 835 ms

Conclusion

It seems 10gen did a great job in MongoDB 2.2. The new aggreation framework seems to be a lot faster (it’s more than 5 times faster than normal mapreduce and more than 3 times faster than mapreduce with jsmode).

Also version 2.2 supports compound indexes for shards, so we can drop one index per collection and save memory.

Can’t wait to update our cluster with 2.2, but there fore we have to wait for 2.2.1 release. Which addresses a issue i have discovered on upgrading our test setup.

Future

Hopefully it will be possible to run mapreduce or aggregation commands on secondarys for a sharded  setup.