Archive for the ‘MongoDB’ Category

MongoDB: The Definitive Guide 2nd Edition is Out!

Thursday, May 23rd, 2013

MongoDB: The Definitive Guide 2nd Edition is Out! by Kristina Chodorow.

From the webpage:

The second edition of MongoDB: The Definitive Guide is now available from O’Reilly! It covers both developing with and administering MongoDB. The book is language-agnostic: almost all of the examples are in JavaScript.

Looking forward to enjoying the second edition as much as the first!

Although, I am not really sure that always using JavaScript means you are “language-agnostic.” ;-)

MongoDB as in-memory DB

Thursday, May 9th, 2013

How to use MongoDB as a pure in-memory DB (Redis style) by Antoine Girbal.

From the post:

There has been a growing interest in using MongoDB as an in-memory database, meaning that the data is not stored on disk at all. This can be super useful for applications like:

  • a write-heavy cache in front of a slower RDBMS system
  • embedded systems
  • PCI compliant systems where no data should be persisted
  • unit testing where the database should be light and easily cleaned

That would be really neat indeed if it was possible: one could leverage the advanced querying / indexing capabilities of MongoDB without hitting the disk. As you probably know the disk IO (especially random) is the system bottleneck in 99% of cases, and if you are writing data you cannot avoid hitting the disk.

One sweet design choice of MongoDB is that it uses memory-mapped files to handle access to data files on disk. This means that MongoDB does not know the difference between RAM and disk, it just accesses bytes at offsets in giant arrays representing files and the OS takes care of the rest! It is this design decision that allows MongoDB to run in RAM with no modification.
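To make the memory-mapped file idea concrete, here is a minimal Python sketch using the standard `mmap` module. It is not MongoDB's code; the temporary file, its size and the payload are made up for illustration. The program treats the file as a byte array and leaves the paging to the OS.

```python
import mmap
import os
import tempfile

def mmap_roundtrip(size=4096, offset=128, payload=b"hello"):
    """Write and read bytes at an offset of a memory-mapped file."""
    # Hypothetical data file, created only for this demonstration.
    fd, path = tempfile.mkstemp(suffix=".data")
    try:
        os.ftruncate(fd, size)  # pre-allocate: an empty file cannot be mapped
        with mmap.mmap(fd, size) as mm:
            # The mapping behaves like a mutable byte array.
            mm[offset:offset + len(payload)] = payload
            mm.flush()  # ask the OS to write dirty pages back to the file
            from_map = bytes(mm[offset:offset + len(payload)])
        # The same bytes are visible through ordinary file I/O.
        os.lseek(fd, offset, os.SEEK_SET)
        from_file = os.read(fd, len(payload))
        return from_map, from_file
    finally:
        os.close(fd)
        os.remove(path)

print(mmap_roundtrip())
```

If such a file lived on a RAM-backed filesystem, the same code would run unchanged, which is the property the post relies on.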

The post reports getting 20K writes per second on a single core.

I can imagine topic map scenarios where no data should be persisted.

You?

How to Compare NoSQL Databases

Friday, April 19th, 2013

How to Compare NoSQL Databases by Ben Engber. (video)

From the description:

Ben Engber, CEO and founder of Thumbtack Technology, will discuss how to perform tuned benchmarking across a number of NoSQL solutions (Couchbase, Aerospike, MongoDB, Cassandra, HBase, others) and to do so in a way that does not artificially distort the data in favor of a particular database or storage paradigm. This includes hardware and software configurations, as well as ways of measuring to ensure repeatable results.

We also discuss how to extend benchmarking tests to simulate different kinds of failure scenarios to help evaluate the maintainability and recoverability of different systems. This requires carefully constructed tests and significant knowledge of the underlying databases — the talk will help evaluators overcome the common pitfalls and time sinks involved in trying to measure this.

Lastly we discuss the YCSB benchmarking tool, its significant limitations, and the significant extensions and supplementary tools Thumbtack has created to provide distributed load generation and failure simulation.

Ben makes a very good case for understanding the details of your use case versus the characteristics of particular NoSQL solutions.

Where you will find “better” performance depends on non-obvious details.

Watch the use of terms like “consistency” in this presentation.

The paper Ben refers to: Ultra-High Performance NoSQL Benchmarking: Analyzing Durability and Performance Tradeoffs.

Forty-three pages of analysis and charts.

Slow but interesting reading.

If you are into the details of performance and NoSQL databases.

Mongraph

Saturday, April 13th, 2013

Mongraph

From the readme:

Mongraph combines a document-storage database with graph-database relationships by creating a corresponding node for each document.

Flies in the face of every app being the “universal” app orthodoxy but still worth watching.

Wanted: Evaluators to Try MongoDB with Fractal Tree Indexing

Tuesday, March 26th, 2013

Wanted: Evaluators to Try MongoDB with Fractal Tree Indexing by Tim Callaghan.

From the post:

We recently resumed our discussion around bringing Fractal Tree indexes to MongoDB. This effort includes Tokutek’s interview with Jeff Kelly at Strata as well as my two recent tech blogs which describe the compression achieved on a generic MongoDB data set and performance improvements we measured using our implementation of Sysbench for MongoDB. I have a full line-up of benchmarks and blogs planned for the next few months, as our project continues. Many of these will be deeply technical and written by the Tokutek developers.

We have a group of evaluators running MongoDB with Fractal Tree Indexes, but more feedback is always better. So …

Do you want to participate in the process of bringing high compression and extreme performance gains to MongoDB? We’re looking for MongoDB experts to test our build on your real-world workloads and benchmarks. Evaluator feedback will be used in creating the product road map. Please email me at tim@tokutek.com if interested.

You keep reading about the performance numbers on MongoDB.

Aren’t you curious if those numbers are true for your use case?

Here’s your opportunity to find out!

CSA: Upgrade Immediately to MongoDB 2.4.1

Monday, March 25th, 2013

CSA: Upgrade Immediately to MongoDB 2.4.1

Alex Popescu advises:

If you are running MongoDB 2.4, upgrade immediately to 2.4.1. Details here.

MongoDB 2.4 Release

Tuesday, March 19th, 2013

MongoDB 2.4 Release

From the webpage:

Developer Productivity

  • Capped Arrays simplify development by making it easy to incorporate fixed, sorted lists for features like leaderboards and logging.
  • Geospatial Enhancements enable new use cases with support for polygon intersections and analytics based on geospatial data.
  • Text Search provides a simplified, integrated approach to incorporating search functionality into apps (Note: this feature is currently in beta release).
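For a feel of what a capped, sorted array does for a leaderboard, here is a plain-Python imitation. In MongoDB 2.4 the feature is expressed through the `$push` modifiers `$each`, `$sort` and `$slice`; the function below only mimics that behaviour on a list, and the field names (`player`, `score`) are made up.

```python
def push_capped(scores, new_entries, limit=3):
    """Append entries, sort ascending by score, keep the last `limit` items."""
    if limit <= 0:
        return []
    merged = scores + list(new_entries)
    merged.sort(key=lambda entry: entry["score"])
    # Keeping the tail of an ascending sort retains the highest scores,
    # similar in spirit to a negative slice applied to a sorted array.
    return merged[-limit:]

board = []
board = push_capped(board, [{"player": "ann", "score": 40},
                            {"player": "bob", "score": 75}])
board = push_capped(board, [{"player": "cy", "score": 10},
                            {"player": "dee", "score": 90}])
print([entry["player"] for entry in board])
```

The point of the server-side feature is that the append, sort and trim happen in one update, so the application never has to read the array back to prune it.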

Operations

  • Hash-Based Sharding simplifies deployment of large MongoDB systems.
  • Working Set Analyzer makes capacity planning easier for ops teams.
  • Improved Replication increases resiliency and reduces administration.
  • Mongo Client creates an intuitive, consistent feature set across all drivers.
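The idea behind hash-based sharding can be sketched in a few lines of Python. This is a simplification and not MongoDB's implementation: the hash function (SHA-256), the use of a modulus, and the shard count are all assumptions chosen for illustration.

```python
import hashlib
from collections import Counter

def shard_for(key, num_shards):
    """Map a shard-key value to a shard index via a hash of the key."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    # Use the first 8 bytes as an integer; an illustrative choice only.
    return int.from_bytes(digest[:8], "big") % num_shards

def distribution(keys, num_shards):
    """Count how many keys land on each shard."""
    return Counter(shard_for(key, num_shards) for key in keys)

# Monotonically increasing keys (think timestamps or counters) still
# spread evenly, because the hash destroys their ordering.
counts = distribution(range(10_000), 4)
print(sorted(counts.items()))
```

The trade-off is that range queries on the shard key lose locality, since neighbouring key values end up on different shards.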

Performance

  • Faster Counts and Aggregation Framework Refinements make it easier to leverage real-time, in-place analytics.
  • V8 JavaScript Engine offers better concurrency and faster performance for some operations, including MapReduce jobs.

Monitoring

  • On-Prem Monitoring provides comprehensive monitoring, visualization and alerting on more than 100 operational metrics of a MongoDB system in real time, based on the same application that powers 10gen’s popular MongoDB Monitoring Service (MMS). On-Prem Monitoring is only available with MongoDB Enterprise.



Security
….

  • Kerberos Authentication enables enterprise and government customers to integrate MongoDB into existing enterprise security systems. Kerberos support is only available in MongoDB Enterprise.
  • Role-Based Privileges allow organizations to assign more granular security policies for server, database and cluster administration.

You can read more about the improvements to MongoDB 2.4 in the Release Notes. Also, MongoDB 2.4 is available for download on MongoDB.org.

Lots to look at in MongoDB 2.4!

But I am curious about the beta text search feature.

MongoDB Text Search: Experimental Feature in MongoDB 2.4 says:

Text search (SERVER-380) is one of the most requested features for MongoDB. 10gen is working on an experimental text-search feature, to be released in v2.4, and we’re already seeing some talk in the community about the native implementation within the server. We view this as an important step towards fulfilling a community need.

MongoDB text search is still in its infancy and we encourage you to try it out on your datasets. Many applications use both MongoDB and Solr/Lucene, but realize that there is still a feature gap. For some applications, the basic text search that we are introducing may be sufficient. As you get to know text search, you can determine when MongoDB has crossed the threshold for what you need. (emphasis added)

So, why isn’t MongoDB incorporating Solr/Lucene instead of a home grown text search feature?

Seems like users could leverage their Solr/Lucene skills with their MongoDB installations.

Yes?

Databases & Dragons

Friday, March 8th, 2013

Databases & Dragons by Kristina Chodorow.

From the post:

Here are some exercises to battle-test your MongoDB instance before going into production. You’ll need a Database Master (aka DM) to make bad things happen to your MongoDB install and one or more players to try to figure out what’s going wrong and fix it.

Should be of interest if you are preparing a MongoDB deployment to go into production.

The idea should also be of interest if you are developing other software to go into production.

Most software (not all) works fine with expected values, other components responding correctly, etc.

But those are the very conditions your software may not encounter in production.

Where’s your “databases & dragons” test for your software?

MongoDB + Fractal Tree Indexes = High Compression

Friday, March 1st, 2013

MongoDB + Fractal Tree Indexes = High Compression by Tim Callaghan.

You may have heard that MapR Technologies broke the MinuteSort Record by sorting 15 billion 100-byte records in 60 seconds. The run used 2,103 virtual instances in the Google Compute Engine; each instance had four virtual cores and one virtual disk, for a total of 8,412 virtual cores and 2,103 virtual disks. See: Google Compute Engine, MapR Break MinuteSort Record.

So, the next time you have 8,412 virtual cores and 2,103 virtual disks, you know what is possible. ;-)

But if you have less firepower than that, you will need to be clever:

One doesn’t have to look far to see that there is strong interest in MongoDB compression. MongoDB has an open ticket from 2009 titled “Option to Store Data Compressed” with Fix Version/s planned but not scheduled. The ticket has a lot of comments, mostly from MongoDB users explaining their use-cases for the feature. For example, Khalid Salomão notes that “Compression would be very good to reduce storage cost and improve IO performance” and Andy notes that “SSD is getting more and more common for servers. They are very fast. The problems are high costs and low capacity.” There are many more in the ticket.

In prior blogs we’ve written about significant performance advantages when using Fractal Tree Indexes with MongoDB. Compression has always been a key feature of Fractal Tree Indexes. We currently support the LZMA, quicklz, and zlib compression algorithms, and our architecture allows us to easily add more. Our large block size creates another advantage as these algorithms tend to compress large blocks better than small ones.

Given the interest in compression for MongoDB and our capabilities to address this functionality, we decided to do a benchmark to measure the compression achieved by MongoDB + Fractal Tree Indexes using each available compression type. The benchmark loads 51 million documents into a collection and measures the size of all files in the file system (--dbpath).
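The remark that these algorithms compress large blocks better than small ones is easy to check with Python's standard `zlib` and `lzma` modules (quicklz is not in the standard library, so it is left out). The documents below are invented and have nothing to do with Tokutek's data set or block sizes; this only shows the general effect.

```python
import json
import lzma
import zlib

def make_documents(count=2000):
    """Build made-up JSON documents with repetitive field names."""
    docs = [
        {"uri": f"http://example.com/item/{i}",
         "name": f"user_{i % 50}",
         "origin": "web"}
        for i in range(count)
    ]
    return "\n".join(json.dumps(doc) for doc in docs).encode("utf-8")

def compressed_size(data, block_size, compress):
    """Compress `data` in independent blocks and return the total size."""
    return sum(
        len(compress(data[start:start + block_size]))
        for start in range(0, len(data), block_size)
    )

data = make_documents()
for name, compress in (("zlib", zlib.compress), ("lzma", lzma.compress)):
    small = compressed_size(data, 1024, compress)
    large = compressed_size(data, 64 * 1024, compress)
    print(f"{name}: raw={len(data)} 1KiB_blocks={small} 64KiB_blocks={large}")
```

Each independently compressed block pays its own header cost and starts with an empty dictionary, which is why many small blocks add up to more bytes than a few large ones.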

More benchmarks to follow and you should remember that all benchmarks are just that, benchmarks.

Benchmarks do not represent experience with your data, under your operating load and network conditions, etc.

Investigate software based on the first, purchase software based on the second.

NoSQL is Great, But You Still Need Indexes [MongoDB for example]

Wednesday, February 20th, 2013

NoSQL is Great, But You Still Need Indexes by Martin Farach-Colton.

From the post:

I’ve said it before, and, as is the nature of these things, I’ll almost certainly say it again: your database performance is only as good as your indexes.

That’s the grand thesis, so what does that mean? In any DB system — SQL, NoSQL, NewSQL, PostSQL, … — data gets ingested and organized. And the system answers queries. The pain point for most users is around the speed to answer queries. And the query speed (both latency and throughput, to be exact) depends on how the data is organized. In short: Good Indexes, Fast Queries; Poor Indexes, Slow Queries.
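A toy illustration of "how the data is organized" in Python: the same lookup answered by a full scan and by a sorted index searched with bisection. The sorted list stands in for an index only; it is neither a B-tree nor a Fractal Tree, and the record layout is invented.

```python
import bisect

class SortedIndex:
    """A toy secondary index: sorted (key, position) pairs searched by bisection."""

    def __init__(self, records, field):
        self.entries = sorted((rec[field], pos) for pos, rec in enumerate(records))
        self.keys = [key for key, _ in self.entries]

    def find(self, value):
        """Return positions of records whose field equals `value`."""
        lo = bisect.bisect_left(self.keys, value)
        hi = bisect.bisect_right(self.keys, value)
        return [pos for _, pos in self.entries[lo:hi]]

def scan(records, field, value):
    """Unindexed lookup: inspect every record."""
    return [pos for pos, rec in enumerate(records) if rec[field] == value]

records = [{"name": f"user_{i % 1000}", "n": i} for i in range(10_000)]
index = SortedIndex(records, "name")
# Both strategies return the same positions; only the work differs.
assert index.find("user_7") == scan(records, "name", "user_7")
```

The scan touches every record on every query, while the index pays its cost up front and on every insert. That maintenance cost is the part the post goes on to discuss.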

But building indexes is hard work, or at least it has been for the last several decades, because almost all indexing is done with B-trees. That’s true of commercial databases, of MySQL, and of most NoSQL solutions that do indexing. (The ones that don’t do indexing solve a very different problem and probably shouldn’t be confused with databases.)

It’s not true of TokuDB. We build Fractal Tree Indexes, which are much easier to maintain but can still answer queries quickly. So with TokuDB, it’s Fast Indexes, More Indexes, Fast Queries. TokuDB is usually thought of as a storage engine for MySQL and MariaDB. But it’s really a B-tree substitute, so we’re always on the lookout for systems where we can improve the indexing.

Enter MongoDB. MongoDB is beloved because it makes deployment fast. But when you peel away the layers, you get down to a B-tree, with all the performance headaches and workarounds that they necessitate.

That’s the theory, anyway. So we did some testing. We ripped out the part of MongoDB that takes care of secondary indices and plugged in TokuDB. We’ve posted the blogs before, but here they are again, the greatest hits of TokuDB+MongoDB: we show a 10x insertion performance, a 268x query performance, and a 532x (or 53,200% if you prefer) multikey index insertion performance. We also discussed covered indexes vs. clustered Fractal Tree Indexes.

Did somebody declare February 20th to be performance release day?

Did I miss that memo? ;-)

Like every geek, I like faster. But, here’s my question:

Have there been any studies on the impact of faster systems on searching and decision making by users?

My assumption is the faster I get a non-responsive result from a search, the sooner I can improve it.

But that’s an assumption on my part.

Is that really true?

MongoDB Text Search Tutorial

Thursday, January 17th, 2013

MongoDB Text Search Tutorial by Alex Popescu.

From the post:

Today is the day of the experimental MongoDB text search feature. Tobias Trelle continues his posts about this feature providing some examples for query syntax (negation, phrase search)—according to the previous post even more advanced queries should be supported, filtering and projections, multiple text fields indexing, and adding details about the stemming solution used (Snowball).
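To get a feel for negation and phrase search without a server, here is a toy matcher in Python. Its query grammar (`-term` to exclude, `"..."` for a phrase, anything else as an optional term) and its matching rules are assumptions for illustration; it does no stemming or scoring and is not how MongoDB evaluates text queries.

```python
import re

NEGATED = re.compile(r"(?<!\S)-(\w+)")
PHRASE = re.compile(r'"([^"]+)"')

def tokenize(text):
    """Lower-case a string and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def parse_query(query):
    """Split a query into phrases, negated terms and plain terms."""
    phrases = [p for p in (tokenize(m) for m in PHRASE.findall(query)) if p]
    rest = PHRASE.sub(" ", query)
    negated = {term.lower() for term in NEGATED.findall(rest)}
    terms = set(tokenize(NEGATED.sub(" ", rest)))
    return phrases, negated, terms

def contains_phrase(tokens, phrase):
    """True when `phrase` occurs as a consecutive run inside `tokens`."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

def search(docs, query):
    """Return indexes of docs matching any term or phrase and no negated term."""
    phrases, negated, terms = parse_query(query)
    hits = []
    for i, doc in enumerate(docs):
        tokens = tokenize(doc)
        if negated & set(tokens):
            continue
        if terms & set(tokens) or any(contains_phrase(tokens, p) for p in phrases):
            hits.append(i)
    return hits

docs = [
    "MongoDB adds text search",
    "Solr and Lucene power text search",
    "Fractal tree indexes compress well",
]
print(search(docs, "search -solr"))
print(search(docs, '"text search"'))
```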

Alex also has a list of his posts on the text search feature for MongoDB.

MongoDB Puzzlers #1

Sunday, December 30th, 2012

MongoDB Puzzlers #1 by Kristina Chodorow.

If you are not too deeply invested in the fiscal cliff debate ;-), you may enjoy the distraction of a puzzler based on the MongoDB query language.

Collecting puzzlers for MongoDB and other query languages would be a good idea.

Something to be enjoyed in times of national “crisis,” aka, collective hand wringing by the media.

When is “Hello World,” Not “Hello World?”

Sunday, December 30th, 2012

To answer that question, you need to see the post: Travel NoSQL Application – Polyglot NoSQL with SpringData on Neo4J and MongoDB.

Just a quick sample:

In this Fuse day, Tikal Java group decided to continue its previous Fuse research for NoSQL, but this time from a different point of view – SpringData and Polyglot persistence. We had two goals in this Fuse day: try working with more than one NoSQL in the same application, and also taking advantage of SpringData data access abstractions for NoSQL databases. We decided to take MongoDB as document DB, and Neo4J as graph database and put them behind an existing, classic and well known application – Spring Travel Sample application.

More than the usual “Hello World” example for languages and a bit more than for most applications.

It would be a nice trend to see more robust, perhaps “Hello World+” examples.

What is your enhanced “Hello World+” going to look like in 2013?

Searching an Encrypted Document Collection with Solr4, MongoDB and JCE

Sunday, December 16th, 2012

Searching an Encrypted Document Collection with Solr4, MongoDB and JCE by Sujit Pal.

From the post:

A while back, someone asked me if it was possible to make an encrypted document collection searchable through Solr. The use case was patient records – the patient is the owner of the records, and the only person who can search through them, unless he temporarily grants permission to someone else (for example his doctor) for diagnostic purposes. I couldn’t come up with a good way of doing it off the bat, but after some thought, came up with a design that roughly looked like the picture below:

With privacy being all the rage, a very timely post.

Not to mention an opportunity to try out Solr4.

MongoSV 2012

Wednesday, October 31st, 2012

MongoSV 2012

From the webpage:

December 4th Santa Clara, CA

MongoSV is an annual one-day conference in Silicon Valley, CA, dedicated to the open source, non-relational database MongoDB.

There are five (5) tracks, morning and afternoon sessions, a final session followed by a conference party from 5:30 PM to 8 PM.

Any summary is going to miss something of interest for someone. Take the time to review the schedule.

While you are there, register for the conference as well. A unique annual opportunity to mix-n-meet with MongoDB enthusiasts!

MongoDB and Fractal Tree Indexes (Webinar) [13 November 2012]

Wednesday, October 31st, 2012

Webinar: MongoDB and Fractal Tree Indexes by Tim Callaghan.

From the post:

This webinar covers the basics of B-trees and Fractal Tree Indexes, the benchmarks we’ve run so far, and the development road map going forward.

Date: November 13th
Time: 2 PM EST / 11 AM PST
REGISTER TODAY

If you aren’t familiar with Fractal Tree Indexes and MongoDB, this is your opportunity to catch up!

Online Education- MongoDB and Oracle R Enterprise

Wednesday, October 24th, 2012

Online Education- MongoDB and Oracle R Enterprise by Ajay Ohri.

Ajay brings news of two MongoDB online courses, one for developers and one for DBAs, and an Oracle offering on R.

The MongoDB classes started Monday (22nd of October) so you had better hurry to register.

10gen: Growing the MongoDB world

Thursday, October 18th, 2012

10gen: Growing the MongoDB world by Dj Walker-Morgan.

From the post:

10gen, the company set up by the creators of the open source NoSQL database MongoDB, has been on a roll recently, creating business partnerships with numerous companies, making it a hot commercial proposition without creating any apparent friction with its open source community. So what has brought MongoDB to the fore?

One factor has been how easy it is to get up and running with the database, a feature that the company wants to actively maintain. 10gen president Max Schireson explained: “I think that it’s honestly a combination of the functionality of MongoDB itself, but also the effort that we’ve invested in packaging for the open source community. I see some open source companies taking the approach of ‘oh yeah the code’s open source but you’ll need a PhD to actually get a working build of it unless you are a subscriber’. While that might help monetisation, that’s not a way to build a big community”.

Schireson says the company isn’t going to stand still though: although it’s easy to get a single node up and running, over time they want to make it easier to get more complex, sharded, implementations configured and deployed. “As people use more and more functionality, that of necessity brings in more complexity, we’re looking for ways to make that easier,” he says, pointing to the cluster manager being developed as a native part of MongoDB, which should make it easier to manage and upgrade clusters.

Always appreciate a plug for good documentation.

May not work for you but it certainly worked here.

How MongoDB’s Journaling Works

Tuesday, October 9th, 2012

How MongoDB’s Journaling Works by Kristina Chodorow.

From the post:

I was working on a section on the gooey innards of journaling for The Definitive Guide, but then I realized it’s an implementation detail that most people won’t care about. However, I had all of these nice diagrams just laying around.

Well, journaling may be “an implementation detail,” but Kristina explains it well and some “implementation details” shape our views of what is or isn’t possible.

Doesn’t hurt to know more than we did when we started reading the post.

Is your appreciation of journaling the same or different after reading Kristina’s post?

Looking for MongoDB users to test Fractal Tree Indexing

Friday, September 14th, 2012

Looking for MongoDB users to test Fractal Tree Indexing by Tim Callaghan.

In my three previous blogs I wrote about our implementation of Fractal Tree Indexes on MongoDB, showing a 10x insertion performance increase, a 268x query performance increase, and a comparison of covered indexes and clustered indexes. The benchmarks show the difference that rich and efficient indexing can make to your MongoDB workload.

It’s one thing for us to benchmark MongoDB + TokuDB and another to measure real world performance. If you are looking for a way to improve the performance or scalability of your MongoDB deployment, we can help and we’d like to hear from you. We have a preview build available for MongoDB v2.2 that you can run with your existing data folder, drop/add Fractal Tree Indexes, and measure the performance differences. Please email me at tim@tokutek.com if interested.

Here is your chance to try these speed improvements out on your data!

MongoDB Index Shootout: Covered Indexes vs. Clustered Fractal Tree Indexes

Friday, September 7th, 2012

MongoDB Index Shootout: Covered Indexes vs. Clustered Fractal Tree Indexes by Tim Callaghan.

From the post:

In my two previous blogs I wrote about our implementation of Fractal Tree Indexes on MongoDB, showing a 10x insertion performance increase and a 268x query performance increase. MongoDB’s covered indexes can provide some performance benefits over a regular MongoDB index, as they reduce the amount of IO required to satisfy certain queries. In essence, when all of the fields you are requesting are present in the index key, then MongoDB does not have to go back to the main storage heap to retrieve anything. My benchmark results are further down in this write-up, but first I’d like to compare MongoDB’s Covered Indexes with Tokutek’s Clustered Fractal Tree Indexes.

MongoDB Covered Indexes vs. Tokutek Clustered Fractal Tree Indexes

Query Efficiency

  • MongoDB Covered Indexes: Improved when all requested fields are part of index key
  • Tokutek Clustered Fractal Tree Indexes: Always improved, all non-keyed fields are stored in the index

Index Size

  • MongoDB Covered Indexes: Data is not compressed
  • Tokutek Clustered Fractal Tree Indexes: Generally 10x to 20x compression, user selects zlib, quicklz, or lzma. Note that non-clustered indexes are compressed as well.

Planning/Maintenance

  • MongoDB Covered Indexes: Index “covers” a fixed set of fields, adding a new field to an existing covered index requires a drop and recreate of the index.
  • Tokutek Clustered Fractal Tree Indexes: None, all fields in the document are always available in the index.

When putting my ideas together for the above table it struck me that covered indexes are really about a well defined schema, yet NoSQL is often thought of as “schema-less”. If you have a very large MongoDB collection and add a new field that you want covered by an existing index, the drop and recreate process will take a long time. On the other hand, a clustered Fractal Tree Index will automatically include this new field so there is no need to drop/recreate unless you need the field to be part of a .find() operation itself.
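The "covered" condition described above boils down to a subset test. A minimal Python sketch, with made-up field names; it ignores details of the real query planner, such as the need to exclude `_id` from the projection when it is not part of the index.

```python
def is_covered(index_fields, filter_fields, projected_fields):
    """True when the index alone can answer the query (no document fetch)."""
    needed = set(filter_fields) | set(projected_fields)
    return needed <= set(index_fields)

index = ("origin", "name")

# Every field the query touches is in the index key: covered.
print(is_covered(index, ["origin"], ["name"]))

# Asking for "uri" as well forces a trip to the document itself.
print(is_covered(index, ["origin"], ["name", "uri"]))

# Covering the new field means defining a wider index, which in
# practice is the drop-and-recreate step the post describes.
wider_index = index + ("uri",)
print(is_covered(wider_index, ["origin"], ["name", "uri"]))
```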

If you have some time to experiment this weekend, more MongoDB benchmarks/improvements to consider.

268x Query Performance Bump for MongoDB

Sunday, September 2nd, 2012

268x Query Performance Increase for MongoDB with Fractal Tree Indexes, SAY WHAT? by Tim Callaghan.

From the post:

Last week I wrote about our 10x insertion performance increase with MongoDB. We’ve continued our experimental integration of Fractal Tree® Indexes into MongoDB, adding support for clustered indexes. A clustered index stores all non-index fields as the “value” portion of the index, as opposed to a standard MongoDB index that stores a pointer to the document data. The benefit is that indexed lookups can immediately return any requested values instead of needing to do an additional lookup (and potential disk IOs) for the requested fields.
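The difference between a pointer-style index and a clustered index can be shown with two small Python classes. This is a sketch under heavy assumptions: dictionaries stand in for the index structure, keys are assumed unique, and a counter stands in for the extra lookup (and potential disk IO) that the post mentions.

```python
class PointerIndex:
    """Index entries hold a position; the document is fetched in a second step."""

    def __init__(self, docs, key):
        self.docs = docs
        self.entries = {doc[key]: pos for pos, doc in enumerate(docs)}
        self.fetches = 0

    def get(self, key_value, field):
        pos = self.entries[key_value]
        self.fetches += 1  # stands in for the additional document lookup
        return self.docs[pos][field]

class ClusteredIndex:
    """Index entries carry all non-key fields as the value itself."""

    def __init__(self, docs, key):
        self.entries = {
            doc[key]: {f: v for f, v in doc.items() if f != key} for doc in docs
        }
        self.fetches = 0

    def get(self, key_value, field):
        return self.entries[key_value][field]  # no second lookup needed

docs = [{"uri": f"/item/{i}", "name": f"user_{i}", "origin": "web"} for i in range(5)]
pointer = PointerIndex(docs, "uri")
clustered = ClusteredIndex(docs, "uri")
print(pointer.get("/item/3", "name"), pointer.fetches)
print(clustered.get("/item/3", "name"), clustered.fetches)
```

The clustered variant trades space for speed: every field is stored again inside the index, which is where compression starts to matter.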

I’m trying to recover from learning about scalable subgraph matching, Efficient Subgraph Matching on Billion Node Graphs [Parallel Graph Processing], and now the nice folks at Tokutek post a 26,816% query performance increase for MongoDB.

They claim to not be MongoDB experts. I guess that’s right. The increase in performance would have been higher. ;-)

Serious question: How long will it take this sort of performance increase to impact the modeling and design of information systems?

And in what way?

With high enough performance, can subject identity be modeled interactively?

MongoDB 2.2 Released [Aggregation News - Expiring Data From Merges?]

Thursday, August 30th, 2012

MongoDB 2.2 Released

From the post:

We are pleased to announce the release of MongoDB version 2.2. This release includes over 1,000 new features, bug fixes, and performance enhancements, with a focus on improved flexibility and performance. For additional details on the release:

Of particular interest to topic map fans:

Aggregation Framework

The Aggregation Framework is available in its first production-ready release as of 2.2. The aggregation framework makes it easier to manipulate and process documents inside of MongoDB, without needing to use Map Reduce, or separate application processes for data manipulation.
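A pipeline is a list of stages, each a small document. To show the shape without a running server, here is a toy Python evaluator for just two stages, `$match` (equality only) and `$group` with `$sum`. The evaluator and the sample data are illustrative assumptions; with a real deployment the same list would be handed to the driver, for example `collection.aggregate(pipeline)` in pymongo.

```python
from collections import defaultdict

def run_pipeline(docs, pipeline):
    """Evaluate a tiny subset of aggregation stages over a list of dicts."""
    for stage in pipeline:
        (op, spec), = stage.items()  # each stage has exactly one operator
        if op == "$match":
            docs = [d for d in docs if all(d.get(k) == v for k, v in spec.items())]
        elif op == "$group":
            key_field = spec["_id"][1:]  # "$customer" -> "customer"
            groups = defaultdict(lambda: defaultdict(int))
            for d in docs:
                bucket = groups[d[key_field]]
                for out, acc in spec.items():
                    if out == "_id":
                        continue
                    operand = acc["$sum"]
                    # "$field" sums a field; a number is added as a constant.
                    bucket[out] += d[operand[1:]] if isinstance(operand, str) else operand
            docs = [{"_id": key, **dict(vals)}
                    for key, vals in sorted(groups.items(), key=lambda kv: kv[0])]
        else:
            raise ValueError(f"unsupported stage: {op}")
    return docs

orders = [
    {"status": "paid", "customer": "a", "amount": 10},
    {"status": "paid", "customer": "b", "amount": 5},
    {"status": "open", "customer": "a", "amount": 99},
    {"status": "paid", "customer": "a", "amount": 7},
]
pipeline = [
    {"$match": {"status": "paid"}},
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}, "orders": {"$sum": 1}}},
]
print(run_pipeline(orders, pipeline))
```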

See the aggregation documentation for more information.

The H Open also mentions TTL (time to live) which can remove documents from collections.

MongoDB documentation: Expire Data from Collections by Setting TTL.
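The TTL rule itself is simple: a document goes once its timestamp is older than the configured number of seconds. A Python sketch of that rule, applied to hypothetical merge records; the `merged_at` field and the one-hour TTL are invented, and in MongoDB the work is done by a background task driven by an index option (`expireAfterSeconds`), not by application code like this.

```python
from datetime import datetime, timedelta, timezone

def expire(docs, field, expire_after_seconds, now):
    """Keep documents whose `field` timestamp is younger than the TTL."""
    cutoff = now - timedelta(seconds=expire_after_seconds)
    # Documents without the timestamp field are never expired.
    return [d for d in docs if field not in d or d[field] > cutoff]

now = datetime(2012, 8, 30, 12, 0, tzinfo=timezone.utc)
merged = [
    {"topic": "a", "merged_at": now - timedelta(hours=2)},
    {"topic": "b", "merged_at": now - timedelta(minutes=10)},
    {"topic": "c"},  # no timestamp, so it stays
]
print([d["topic"] for d in expire(merged, "merged_at", 3600, now)])
```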

Have you considered “expiring” data from merges?

MongoDB: Pumping Fractal Iron

Sunday, August 26th, 2012

10x Insertion Performance Increase for MongoDB with Fractal Tree Indexes by Tim Callaghan.

From the post:

The challenge of handling massive data processing workloads has spawned many new innovations and techniques in the database world, from indexing innovations like our Fractal Tree® technology to a myriad of “NoSQL” solutions (here is our Chief Scientist’s perspective). Among the most popular and widely adopted NoSQL solutions is MongoDB and we became curious if our Fractal Tree indexing could offer some advantage when combined with it. The answer seems to be a strong “yes”.

Earlier in the summer we kicked off a small side project and here’s what we did: we implemented a “version 2” IndexInterface as a Fractal Tree index and ran some benchmarks. Note that our integration only affects MongoDB’s secondary indexes; primary indexes continue to rely on MongoDB’s indexing code. All the changes we made to the MongoDB source are available here. Caveat: this was a quick and dirty project – the code is experimental grade so none of it is supported or went through any careful design analysis.

For our initial benchmark we measured the performance of a single threaded insertion workload. The inserted documents contained the following: URI (character), name (character), origin (character), creation date (timestamp), and expiration date (timestamp). We created a total of four secondary indexes: URI, name, origin, and creation date. The point of the benchmark is to insert enough documents such that the indexes are larger than main memory and show the insertion performance from an empty database to one that is largely dependent on disk IO. We ran the benchmark with journaling disabled, then again with journaling enabled.

Not for production use but the performance numbers should give you pause.

A long pause.

Pig as Hadoop Connector, Part One: Pig, MongoDB and Node.js

Thursday, August 16th, 2012

Pig as Hadoop Connector, Part One: Pig, MongoDB and Node.js by Russell Jurney.

From the post:

Series Introduction

Apache Pig is a dataflow oriented, scripting interface to Hadoop. Pig enables you to manipulate data as tuples in simple pipelines without thinking about the complexities of MapReduce.

But Pig is more than that. Pig has emerged as the ‘duct tape’ of Big Data, enabling you to send data between distributed systems in a few lines of code. In this series, we’re going to show you how to use Hadoop and Pig to connect different distributed systems, to enable you to process data from wherever and to wherever you like.

Working code for this post as well as setup instructions for the tools we use are available at https://github.com/rjurney/enron-node-mongo and you can download the Enron emails we use in the example in Avro format at http://s3.amazonaws.com/rjurney.public/enron.avro. You can run our example Pig scripts in local mode (without Hadoop) with the -x local flag: pig -x local. This enables new Hadoop users to try out Pig without a Hadoop cluster.

Introduction

In this post we’ll be using Hadoop, Pig, mongo-hadoop, MongoDB and Node.js to turn Avro records into a web service. We do so to illustrate Pig’s ability to act as glue between distributed systems, and to show how easy it is to publish data from Hadoop to the web.

I was tempted to add ‘duct tape’ as a category. But there could only be one entry. ;-)

Take an early weekend and have some fun with this tomorrow. August will be over sooner than you think.

MongoDB 2.2.0-rc0

Thursday, July 26th, 2012

MongoDB 2.2.0-rc0

The latest unstable release of MongoDB.

Release notes for 2.2.0-rc0.

Among the changes you will find:

  • Aggregation Framework
  • TTL Collections
  • Concurrency Improvements
  • Query Optimizer Improvements
  • Tag Aware Sharding

among others.

MongoDB-as-a-service for private rolled out by ScaleGrid, in MongoDirector

Monday, July 23rd, 2012

MongoDB-as-a-service for private rolled out by ScaleGrid, in MongoDirector by Chris Mayer.

From the post:

Of all the NoSQL databases emerging at the moment, there appears to be one constant discussion taking place – are you using MongoDB?

It appears to be the open source, document-oriented NoSQL database solution of choice, mainly due to its high performance nature, its dynamism and its similarities to the JSON data structure (in BSON). Despite being written in C++, it is attracting attention from developers of different creeds. Its enterprise level features have helped a fair bit in its charge up the rankings to leading NoSQL database, with it being the ideal datastore for highly scalable environments. Just a look at the latest in-demand skills on Indeed.com shows you that 10gen’s flagship product has infiltrated the enterprise well and truly.

Quite often, an enterprise can find the switch from SQL to NoSQL daunting and needs a helping hand. Due to this, many MongoDB-related products are arriving just as quickly as MongoDB converts. The latest of which to launch as a public beta is MongoDirector from Seattle start-up ScaleGrid. MongoDirector offers an end-to-end lifecycle manager for MongoDB to guide newcomers along.

I don’t have anything negative to say about MongoDB but I’m not sure the discussion of NoSQL solutions is quite as one-sided as Chris seems to think.

The Indeed.com site is a fun one to play around with but I would not take the numbers all that seriously. For one thing, it doesn’t appear to control for duplicate job ads posted in different sources, for example. But that’s a nitpicking objection.

A more serious one is when you start to explore the site and discover the top three job titles for IT.

Care to guess what they are? Would you believe they don’t have anything to do with databases or MongoDB?

At least as of today, and I am sure it changes over time, Graphic Designer, Technical Writer, and Project Manager all rank higher than Data Analyst, where you would hope to find some MongoDB jobs. (Information Technology Industry – 23 July 2012)

BTW, for your amusement, when I was looking for information on database employment, I encountered Database Administrators, from the Bureau of Labor Statistics in the United States. The data is available for download as XLS files.

The site says blanks on the maps are from lack of data. I suspect the truth is there are no database administrators in Wyoming. ;-) Or at least I could point to the graphic as some evidence for my claim.

I think you need to consider the range of database options, from very traditional SQL vendors to bleeding edge No/New/Maybe/SQL solutions, including MongoDB. The question is which one meets your requirements, whether flavor of the month or no.

Real-time Twitter heat map with MongoDB

Thursday, July 12th, 2012

Real-time Twitter heat map with MongoDB

From the post:

Over the last few weeks I got in touch with the fascinating field of data visualisation which offers great ways to play around with the perception of information.

In a more formal approach data visualisation denotes “The representation and presentation of data that exploits our visual perception abilities in order to amplify cognition.”

Nowadays there is a huge flood of information that hits us every day. Enormous amounts of data collected from various sources are freely available on the internet. One of these data gargoyles is Twitter producing around 400 million (400 000 000!) tweets per day!

Tweets basically offer two “layers” of information. The obvious direct information within the text of the Tweet itself and also a second layer that is not directly perceived which is the Tweets’ metadata. In this case Twitter offers a large number of additional information like user data, retweet count, hashtags, etc. This metadata can be leveraged to experience data from Twitter in a lot of exciting new ways!

So as a little weekend project I have decided to build a small piece of software that generates real-time heat maps of certain keywords from Twitter data.

Yes, “…in a lot of exciting new ways!” +1!

What about maintenance issues on such a heat map? The capture of terms to the map is fairly obvious, but a subsequent user may be left in the dark as to why this term and not some other term? Or some then current synonym for a term that is being captured?

Or imposing semantics on tweets or terms that are unexpected or non-obvious to a casual or not so casual observer.

You and I can agree red means go and green means stop in a tweet. That’s difficult to maintain as the number of participants and terms goes up.

A great starting place to experiment with topic maps to address such issues.

I first saw this in the NoSQL Weekly Newsletter.

MongoDB Installer for Windows Azure

Tuesday, July 10th, 2012

MongoDB Installer for Windows Azure by Doug Mahugh.

From the post:

Do you need to build a high-availability web application or service? One that can scale out quickly in response to fluctuating demand? Need to do complex queries against schema-free collections of rich objects? If you answer yes to any of those questions, MongoDB on Windows Azure is an approach you’ll want to look at closely.

People have been using MongoDB on Windows Azure for some time (for example), but recently the setup, deployment, and development experience has been streamlined by the release of the MongoDB Installer for Windows Azure. It’s now easier than ever to get started with MongoDB on Windows Azure!

If you are developing or considering developing with MongoDB, this is definitely worth a look. In part because it frees you to concentrate on software development and not running (or trying to run) a server farm. Different skill sets.

Another reason is that it levels the playing field with big IT firms with server farms. You get the advantages of a server farm without the capital investment in one.

And as Microsoft becomes a bigger and bigger tent for diverse platforms and technologies, you have more choices. Choices for the changing requirements of your clients.

Not that I expect to see an Apple hanging from the Microsoft tree anytime soon but you can’t ever tell. Enough consumer demand and it could happen.

In the meantime, while we wait for better games and commercials, consider how you would power semantic integration in the cloud.

Implementing Aggregation Functions in MongoDB

Tuesday, June 26th, 2012

Implementing Aggregation Functions in MongoDB by Arun Viswanathan and Shruthi Kumar.

From the post:

With the amount of data that organizations generate exploding from gigabytes to terabytes to petabytes, traditional databases are unable to scale up to manage such big data sets. Using these solutions, the cost of storing and processing data will significantly increase as the data grows. This is resulting in organizations looking for other economical solutions such as NoSQL databases that provide the required data storage and processing capabilities, scalability and cost effectiveness. NoSQL databases do not use SQL as the query language. There are different types of these databases such as document stores, key-value stores, graph database, object database, etc.

Typical use cases for NoSQL databases include archiving old logs, event logging, ecommerce application logs, gaming data, social data, etc. due to their fast read-write capability. The stored data would then need to be processed to gain useful insights on customers and their usage of the applications.

The NoSQL database we use in this article is MongoDB which is an open source document oriented NoSQL database system written in C++. It provides a high performance document oriented storage as well as support for writing MapReduce programs to process data stored in MongoDB documents. It is easily scalable and supports auto partitioning. Map Reduce can be used for aggregation of data through batch processing. MongoDB stores data in BSON (Binary JSON) format, supports a dynamic schema and allows for dynamic queries. The Mongo Query Language is expressed as JSON and is different from the SQL queries used in an RDBMS. MongoDB provides an Aggregation Framework that includes utility functions such as count, distinct and group. However more advanced aggregation functions such as sum, average, max, min, variance and standard deviation need to be implemented using MapReduce.

This article describes the method of implementing common aggregation functions like sum, average, max, min, variance and standard deviation on a MongoDB document using its MapReduce functionality. Typical applications of aggregations include business reporting of sales data such as calculation of total sales by grouping data across geographical locations, financial reporting, etc.

Not terribly advanced but enough to get you started with creating aggregation functions.

Includes “testing” of the aggregation functions that are written in the article.

If Python is more your cup of tea, see: Aggregation in MongoDB (part1) and Aggregation in MongoDB (part 2).
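For a feel of the MapReduce approach the article describes, here is a minimal sketch in plain Python. It is not the authors' code and does not talk to MongoDB at all: it only imitates the map, reduce and finalize steps on an in-memory list. The sample documents, the field names (region, amount) and the function names are assumptions made up for illustration. The variance here is the population variance computed from a running sum of squares, which is simple but can lose precision on large values.

```python
import math
from collections import defaultdict

# Hypothetical sales documents; field names are assumptions for illustration.
docs = [
    {"region": "east", "amount": 10.0},
    {"region": "east", "amount": 20.0},
    {"region": "west", "amount": 5.0},
    {"region": "east", "amount": 30.0},
]


def map_doc(doc):
    """Map step: emit (key, partial aggregate) for a single document."""
    x = doc["amount"]
    return doc["region"], {"count": 1, "sum": x, "sumsq": x * x, "min": x, "max": x}


def reduce_values(values):
    """Reduce step: merge partial aggregates.

    The output has the same shape as each input, so a reduced value can be
    fed back into another reduce call.
    """
    return {
        "count": sum(v["count"] for v in values),
        "sum": sum(v["sum"] for v in values),
        "sumsq": sum(v["sumsq"] for v in values),
        "min": min(v["min"] for v in values),
        "max": max(v["max"] for v in values),
    }


def finalize(agg):
    """Finalize step: derive average, population variance and standard deviation."""
    mean = agg["sum"] / agg["count"]
    # Clamp at zero to guard against tiny negative values from rounding.
    variance = max(agg["sumsq"] / agg["count"] - mean * mean, 0.0)
    return {**agg, "avg": mean, "variance": variance, "stddev": math.sqrt(variance)}


def map_reduce(documents):
    """Group mapped values by key, then reduce and finalize each group."""
    grouped = defaultdict(list)
    for doc in documents:
        key, value = map_doc(doc)
        grouped[key].append(value)
    return {key: finalize(reduce_values(values)) for key, values in grouped.items()}


results = map_reduce(docs)
for region, stats in sorted(results.items()):
    print(region, stats)
```

Keeping count, sum and sum of squares in the emitted value is the design choice that matters: the reduce step can then be applied to its own output, and the averages and deviations are only derived at the very end.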