Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

June 4, 2012

Using MongoDB’s New Aggregation Framework in Python (MongoDB Aggregation Part 2)

Filed under: Aggregation,MongoDB,NoSQL,Python — Patrick Durusau @ 4:32 pm

Using MongoDB’s New Aggregation Framework in Python (MongoDB Aggregation Part 2) by Rick Copeland.

From the post:

Continuing on in my series on MongoDB and Python, this article will explore the new aggregation framework introduced in MongoDB 2.1. If you’re just getting started with MongoDB, you might want to read the previous articles in the series first:

And now that you’re all caught up, let’s jump right in….

Why a new framework?

If you’ve been following along with this article series, you’ve been introduced to MongoDB’s mapreduce command, which up until MongoDB 2.1 has been the go-to aggregation tool for MongoDB. (There’s also the group() command, but it’s really no more than a less-capable and un-shardable version of mapreduce(), so we’ll ignore it here.) So if you already have mapreduce() in your toolbox, why would you ever want something else?

Mapreduce is hard; let’s go shopping

The first motivation behind the new framework is that, while mapreduce() is a flexible and powerful abstraction for aggregation, it’s really overkill in many situations, as it requires you to re-frame your problem into a form that’s amenable to calculation using mapreduce(). For instance, when I want to calculate the mean value of a property in a series of documents, trying to break that down into appropriate map, reduce, and finalize steps imposes some extra cognitive overhead that we’d like to avoid. So the new aggregation framework is (IMO) simpler.
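To make the quoted point about mean values concrete, here is a minimal sketch (not taken from Copeland's post) of the same calculation framed both ways. The sample documents and the field names `category` and `price` are hypothetical. The first half emulates map, reduce, and finalize steps in plain Python; the second half only builds an aggregation pipeline with the `$group` stage and `$avg` accumulator, without running it against a server.

```python
from collections import defaultdict

# Hypothetical sample documents; the field names are assumptions for illustration.
docs = [
    {"category": "books", "price": 10.0},
    {"category": "books", "price": 20.0},
    {"category": "games", "price": 60.0},
]


def map_step(doc):
    # Emit (key, partial). The partial carries both a sum and a count so
    # that partial results can be combined again by reduce_step.
    return doc["category"], {"total": doc["price"], "count": 1}


def reduce_step(values):
    # Combine partial results for one key into a single partial result.
    return {
        "total": sum(v["total"] for v in values),
        "count": sum(v["count"] for v in values),
    }


def finalize_step(value):
    # Turn the combined sum and count into the mean.
    return value["total"] / value["count"]


def mean_by_mapreduce(documents):
    # Plain-Python emulation of the map/reduce/finalize framing.
    grouped = defaultdict(list)
    for doc in documents:
        key, value = map_step(doc)
        grouped[key].append(value)
    return {key: finalize_step(reduce_step(vals)) for key, vals in grouped.items()}


def mean_pipeline():
    # The same question expressed as a single aggregation pipeline stage.
    return [{"$group": {"_id": "$category", "mean_price": {"$avg": "$price"}}}]


print(mean_by_mapreduce(docs))
print(mean_pipeline())
```

With PyMongo and a running server, the pipeline would be passed to something like `db.orders.aggregate(mean_pipeline())`, where `orders` is an assumed collection name; the shape of the returned result depends on the PyMongo version.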

Other than the obvious utility of the new aggregation framework in MongoDB, there is another reason to mention this post: You should use only as much aggregation, or in topic map terminology, “merging,” as you need.

It isn’t possible to create a system that will correctly aggregate/merge all possible content. Take that as a given.

That is in part because new semantics are emerging every day, and too many earlier semantics are poorly documented or unknown.

What we can do is establish requirements for particular semantics for given tasks and document those to facilitate their possible re-use in the future.
