<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Gernot Pansy &#187; MongoDB</title>
	<atom:link href="http://gernot.pansy.at/tag/mongodb/feed/" rel="self" type="application/rss+xml" />
	<link>http://gernot.pansy.at</link>
	<description></description>
	<lastBuildDate>Sun, 16 Sep 2012 20:57:08 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.4.2</generator>
		<item>
		<title>MongoDB 2.2 &#8211; MapReduce vs. Aggregation Framework Test</title>
		<link>http://gernot.pansy.at/mongodb-2-2-mapreduce-vs-aggregation-framework-test/</link>
		<comments>http://gernot.pansy.at/mongodb-2-2-mapreduce-vs-aggregation-framework-test/#comments</comments>
		<pubDate>Tue, 11 Sep 2012 23:37:22 +0000</pubDate>
		<dc:creator>admin</dc:creator>
				<category><![CDATA[Database]]></category>
		<category><![CDATA[MongoDB]]></category>

		<guid isPermaLink="false">http://gernot.pansy.at/?p=6</guid>
		<description><![CDATA[Intro At Up to Eleven we are currently using MongoDB 2.0 in a sharded setup where each shard is a replica set with 3 members. The biggest collection stores messages to a specific address with a read flag, status and a message date. For the &#8230; <a href="http://gernot.pansy.at/mongodb-2-2-mapreduce-vs-aggregation-framework-test/">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<h1>Intro</h1>
<p>At <a href="http://ut11.net/" target="_blank" data-blogger-escaped-target="_blank">Up to Eleven</a> we are currently using MongoDB 2.0 in a sharded setup where each shard is a replica set with 3 members.</p>
<p>The biggest collection stores messages to a specific address with a read flag, status and a message date.</p>
<p>For the web we need an overview of these messages per address (conversation). We currently use a mapreduce command for this, because the group function is not available in a sharded setup.</p>
<p>With the release of MongoDB 2.2 there are now 2 new options for this problem: it&#8217;s now possible to run MapReduce commands with <a href="http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-jsModeflag" target="_blank" data-blogger-escaped-target="_blank">JsMode</a> on shards, and there is also the brand new <a href="http://docs.mongodb.org/manual/applications/aggregation/" target="_blank" data-blogger-escaped-target="_blank">aggregation framework</a>.</p>
<p>In the following article the options are compared to each other.</p>
<h1>Schema</h1>
<h2>Document</h2>
<div>
<p>A document will look like this.</p>
</div>
<div>
<pre>{
  "_id" : NumberLong("1073741825094"), // combination of message id &amp; user id
  "u" : 250, // user id
  "a" : "test", // address
  "b" : "Hello, what are you doing right now?", // body
  "i" : true, // incoming flag
  "r" : true, // read flag
  "s" : 0, // status
  "d" : ISODate("2011-06-06T18:49:55Z") // date
}</pre>
<h2>Indexes</h2>
</div>
<div>
<p>Besides the default _id index, we have only one additional index, which covers a user&#8217;s conversation ordered by date.</p>
</div>
<pre>{
  "v" : 1,
  "key" : {
    "_id" : 1
  },
  "ns" : "test.user_messages",
  "name" : "_id_"
},
{
  "v" : 1,
  "key" : {
    "u" : 1,
    "a" : 1,
    "d" : -1
  },
  "ns" : "test.user_messages",
  "name" : "conversation"
}</pre>
<h1>Tests</h1>
<div>
<p>For the test we used a self-written JUnit benchmark test that uses the MongoDB Java driver 2.9.0. The collection is filled with 1 million records (1000 users with 1000 messages each, randomly created &#8211; so out of order).</p>
<p>Each measured test runs one of the following commands 1000 times.</p>
</div>
<h2>MapReduce</h2>
<div>
<p>This is the map-reduce command we currently use (jsMode does not work with shards in 2.0).</p>
</div>
<pre>db.runCommand({
  "mapreduce": "user_messages",
  // emit one partial summary per message, keyed by address
  "map": function() {
    emit(this.a, {
      count:1,
      unread:this.r ? 0 : 1,
      unsent:this.s==5 ? 1 : 0,
      messageId:this._id,
      date:this.d,
      body:this.b
    });
  },
  "reduce": function(address,values) {
    var result = {
      count:0,
      unread:0,
      unsent:0,
      messageId:0,
      date:0,
      body:''
    };
    values.forEach(function(value) {
      // add the partial sums, because reduce may be re-run on its own output
      result.count += value.count;
      result.unread += value.unread;
      result.unsent += value.unsent;
      if (value.messageId &gt; result.messageId) result.messageId = value.messageId;
      // keep date and body of the newest message
      if (value.date &gt;= result.date) {
        result.date = value.date;
        result.body = value.body;
      }
    });
    return result;
  },
  "verbose" : true,
  "out" : { "inline" : 1 },
  "query" : { "u" : 1 }
})</pre>
<p>time: <strong>38 s 718 ms</strong></p>
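<p>MongoDB may call the reduce function more than once for the same key and feed an earlier result back in as one of the values, so counters have to add up the partial sums rather than count the values. The following is a minimal sketch in plain JavaScript, runnable outside MongoDB; <code>reduceConversation</code> and the sample values are hypothetical stand-ins and not part of the benchmark.</p>
<pre>// Hypothetical in-memory stand-in for the reduce step; not MongoDB's own code.
function reduceConversation(address, values) {
  var result = { count: 0, unread: 0 };
  values.forEach(function (value) {
    // Add the partial sums instead of 1, so already-reduced values count fully.
    result.count += value.count;
    result.unread += value.unread;
  });
  return result;
}

// Five emitted values for one address, two of them unread.
var emitted = [
  { count: 1, unread: 0 },
  { count: 1, unread: 1 },
  { count: 1, unread: 0 },
  { count: 1, unread: 1 },
  { count: 1, unread: 0 }
];

// Reduce everything in one call ...
var oneStep = reduceConversation("test", emitted);

// ... or reduce the first three, then feed that result back in with the rest.
var partial = reduceConversation("test", emitted.slice(0, 3));
var twoStep = reduceConversation("test", [partial].concat(emitted.slice(3)));

// Both print {"count":5,"unread":2}
console.log(JSON.stringify(oneStep));
console.log(JSON.stringify(twoStep));</pre>
<p>A reduce that added 1 per value would report a count of 3 in the two-step case, because the partial result would be counted as a single message.</p>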
<h2>MapReduce with JsMode</h2>
<div>
<p>This is the same map-reduce command with jsMode enabled, which now works with shards in MongoDB 2.2.</p>
</div>
<pre>db.runCommand({
  "mapreduce": "user_messages",
  // emit one partial summary per message, keyed by address
  "map": function() {
    emit(this.a, {
      count:1,
      unread:this.r ? 0 : 1,
      unsent:this.s==5 ? 1 : 0,
      messageId:this._id,
      date:this.d,
      body:this.b
    });
  },
  "reduce": function(address,values) {
    var result = {
      count:0,
      unread:0,
      unsent:0,
      messageId:0,
      date:0,
      body:''
    };
    values.forEach(function(value) {
      // add the partial sums, because reduce may be re-run on its own output
      result.count += value.count;
      result.unread += value.unread;
      result.unsent += value.unsent;
      if (value.messageId &gt; result.messageId) result.messageId = value.messageId;
      // keep date and body of the newest message
      if (value.date &gt;= result.date) {
        result.date = value.date;
        result.body = value.body;
      }
    });
    return result;
  },
  "verbose" : true,
  "out" : { "inline" : 1 },
  "query" : { "u" : 1 },
  "jsMode" : true
})</pre>
<p>time: <strong>22 s 237 ms</strong></p>
<h2>Aggregation Framework</h2>
<div>
<p>This test is with the new aggregation framework introduced in MongoDB 2.2.</p>
</div>
<pre>db.user_messages.aggregate(
 { $match: { u:1 } },
 { $sort: { a:1, d:-1 } },
 { $group: {
     _id: "$a",
     count: { $sum : 1 },
     unread: { $sum : { $cond : [ "$r", 0 , 1]} },
     unsent: { $sum : { $cond : [ { $eq : [ "$s", 5 ] }, 1, 0 ] } },
     messageId: { $max : "$_id" }, 
     date: { $max : "$d" },
     body: { $first : "$b" },
 } }
);</pre>
<p>time: <strong>6 s 835 ms</strong></p>
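<p>To make clear what the pipeline computes, here is a rough in-memory sketch of the same three stages in plain JavaScript. It only illustrates the logic and says nothing about how the server executes it; the <code>summarize</code> function and the sample documents are hypothetical, and only the field names are taken from the schema above.</p>
<pre>// Hypothetical sample documents that use the field names from the schema.
var messages = [
  { _id: 1, u: 1, a: "alice", b: "first", r: true, s: 0, d: new Date("2012-01-01T00:00:00Z") },
  { _id: 2, u: 1, a: "alice", b: "newest", r: false, s: 5, d: new Date("2012-01-03T00:00:00Z") },
  { _id: 3, u: 1, a: "bob", b: "hi", r: false, s: 0, d: new Date("2012-01-02T00:00:00Z") },
  { _id: 4, u: 2, a: "alice", b: "other user", r: false, s: 0, d: new Date("2012-01-04T00:00:00Z") }
];

function summarize(docs, userId) {
  var groups = {};
  docs
    // $match: keep only the messages of one user
    .filter(function (m) { return m.u === userId; })
    // $sort: by address ascending, then by date descending
    .sort(function (x, y) {
      return x.a.localeCompare(y.a) || y.d.getTime() - x.d.getTime();
    })
    // $group: one summary per address
    .forEach(function (m) {
      var g = groups[m.a];
      if (!g) {
        // The first document per address is the newest one, like $first.
        g = { _id: m.a, count: 0, unread: 0, unsent: 0, messageId: m._id, date: m.d, body: m.b };
        groups[m.a] = g;
      }
      g.count += 1;
      g.unread += m.r ? 0 : 1;
      g.unsent += m.s === 5 ? 1 : 0;
      g.messageId = Math.max(g.messageId, m._id);
    });
  return groups;
}

var result = summarize(messages, 1);
// Prints: 2 1 newest 1
console.log(result.alice.count, result.alice.unread, result.alice.body, result.bob.count);</pre>
<p>Because the documents are sorted by date descending within each address, taking the first body per group yields the text of the newest message, which is what <code>$first</code> relies on in the pipeline.</p>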
<h1>Conclusion</h1>
<div>
<p>It seems 10gen did a great job with MongoDB 2.2. The new aggregation framework is a lot faster in this test: more than <strong>5</strong> times faster than plain mapreduce and more than <strong>3</strong> times faster than mapreduce with jsMode.</p>
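<p>The factors follow directly from the three measured times. As a quick check, a small calculation in plain JavaScript (the names <code>timings</code> and <code>speedup</code> are just for illustration):</p>
<pre>// Measured times from the three tests, in milliseconds.
var timings = { mapReduce: 38718, mapReduceJsMode: 22237, aggregation: 6835 };

// How many times faster the second run is compared to the first.
function speedup(slower, faster) {
  return slower / faster;
}

console.log(speedup(timings.mapReduce, timings.aggregation).toFixed(2));       // 5.66
console.log(speedup(timings.mapReduceJsMode, timings.aggregation).toFixed(2)); // 3.25
console.log(speedup(timings.mapReduce, timings.mapReduceJsMode).toFixed(2));   // 1.74</pre>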
<p>Version 2.2 also supports compound indexes for shards, so we can drop one index per collection and save memory.</p>
</div>
<div>
<p>I can&#8217;t wait to upgrade our cluster to 2.2, but for that we have to wait for the 2.2.1 release, which addresses an issue I discovered while upgrading our test setup.</p>
<h1>Future</h1>
</div>
<div>
<p>Hopefully it will become possible to run mapreduce or aggregation commands on secondaries in a sharded setup.</p>
</div>
]]></content:encoded>
			<wfw:commentRss>http://gernot.pansy.at/mongodb-2-2-mapreduce-vs-aggregation-framework-test/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
