Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

February 2, 2012

Hacking Chess with the MongoDB Pipeline

Filed under: Aggregation,MongoDB — Patrick Durusau @ 3:46 pm

Hacking Chess with the MongoDB Pipeline

Kristina Chodorow* writes:

MongoDB’s new aggregation framework is now available in the nightly build! This post demonstrates some of its capabilities by using it to analyze chess games.

Make sure you have the “Development Release (Unstable)” nightly running before trying out the stuff in this post. The aggregation framework will be in 2.1.0, but as of this writing it’s only in the nightly build.

First, we need some chess games to analyze. Download games.json, which contains 1132 games that were won in 10 moves or less (crush their soul and do it quick).

You can use mongoimport to import games.json into MongoDB.
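
The mongoimport command itself isn’t reproduced here. As a rough equivalent, here is a minimal pymongo sketch of my own (assuming a local mongod, hypothetical chess/games database and collection names, and one JSON document per line in games.json):

  from bson.json_util import loads  # handles MongoDB extended JSON, e.g. {"$oid": ...}
  from pymongo import MongoClient

  client = MongoClient()                 # local mongod on the default port
  games = client["chess"]["games"]       # hypothetical database/collection names

  # games.json is in mongoimport format: one JSON document per line
  with open("games.json") as f:
      docs = [loads(line) for line in f if line.strip()]

  games.insert_many(docs)
  print(games.count_documents({}), "games imported")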

If you think of this example of “aggregation” as merging where the subjects have a uniform identifier (chess piece/move), you will understand why I find this interesting.

Aggregation, as is shown by Kristina’s post, can form the basis for analysis of data.

Analysis that isn’t possible in the absence of aggregation (read merging).
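
To make that concrete, here is a toy pipeline of my own, not from Kristina’s post; the first_move field is hypothetical and may not match what games.json actually contains. It merges all games that share the same opening move and counts them:

  from pymongo import MongoClient

  games = MongoClient()["chess"]["games"]

  # Group (merge) games on a shared identifier -- here a hypothetical
  # "first_move" field -- and count how many games land in each group.
  pipeline = [
      {"$group": {"_id": "$first_move", "games": {"$sum": 1}}},
      {"$sort": {"games": -1}},
  ]

  for row in games.aggregate(pipeline):
      print(row["_id"], row["games"])

Every game that ends up with the same “_id” in the $group stage has, in effect, been merged for purposes of the count.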

I am looking forward to additional posts on the aggregation framework and need to drop by the MongoDB project to see what the future holds for aggregation/merging.

*Kristina is the author of two O’Reilly titles, MongoDB: The Definitive Guide and Scaling MongoDB.

January 30, 2012

PHP and MongoDB Tutorial

Filed under: MongoDB,PHP — Patrick Durusau @ 8:00 pm

PHP and MongoDB Tutorial

Presentation by Derick Rethans on MongoDB and PHP. Walks through the most common aspects of using PHP with MongoDB.

From myNoSQL.

January 27, 2012

Analytics with MongoDB (commercial opportunity here)

Filed under: Analytics,Data,Data Analysis,MongoDB — Patrick Durusau @ 4:35 pm

Analytics with MongoDB

Interesting enough slide deck on analytics with MongoDB.

Relies on custom programming and then closes with this punchline (along with others, slide #41):

  • If you’re a business analyst you have a problem
    • better be BFF with some engineer 🙂

I remember when word processing required a lot of “dot” commands and editing markup languages with little or no editor support. Twenty years (has it been that long?) later, business analysts are doing word processing, markup, and damned near print-shop presentation without working close to the metal.

Can anyone name any products that have made large sums of money making it possible for business analysts and others to perform those tasks?

If so, ask yourself whether you would like to have a piece of the action that frees business analysts from script kiddie engineers.

Even if a general application is out of reach at present, imagine writing access routines for common public data sites.

Create a market for the means to import and access particular data sets.

January 25, 2012

Berlin Buzzwords 2012

Filed under: BigData,Conferences,ElasticSearch,Hadoop,HBase,Lucene,MongoDB,Solr — Patrick Durusau @ 3:24 pm

Berlin Buzzwords 2012

Important Dates (all dates in GMT +2)

Submission deadline: March 11th 2012, 23:59 MEZ
Notification of accepted speakers: April 6th, 2012, MEZ
Publication of final schedule: April 13th, 2012
Conference: June 4/5, 2012

The call:

Call for Submission Berlin Buzzwords 2012 – Search, Store, Scale — June 4 / 5. 2012

The event will comprise presentations on scalable data processing. We invite you to submit talks on the topics:

  • IR / Search – Lucene, Solr, katta, ElasticSearch or comparable solutions
  • NoSQL – like CouchDB, MongoDB, Jackrabbit, HBase and others
  • Large Data Processing – Hadoop itself, MapReduce, Cascading or Pig and relatives

Related topics not explicitly listed above are more than welcome. We are looking for presentations on the implementation of the systems themselves, technical talks, real world applications and case studies.

…(moved dates to top)…

High quality, technical submissions are called for, ranging from principles to practice. We are looking for real world use cases, background on the architecture of specific projects and a deep dive into architectures built on top of e.g. Hadoop clusters.

Here is your chance to experience summer in Berlin (Berlin Buzzwords 2012) and in Montreal (Balisage).

Seriously, both conferences are very strong and worth your attention.

January 24, 2012

MongoDB Indexing in Practice

Filed under: Indexing,MongoDB — Patrick Durusau @ 3:36 pm

MongoDB Indexing in Practice

From the post:

With the right indexes in place, MongoDB can use its hardware efficiently and serve your application’s queries quickly. In this article, based on chapter 7 of MongoDB in Action, author Kyle Banker talks about refining and administering indexes. You will learn how to create, build and backup MongoDB indexes.

Indexing is closely related to topic maps, and the more you learn about indexes, the better the topic maps you will write.

Take for example the treatment of “multiple keys” in this post.

What that means is that multiple entries in an index can point at the same document.

Not that big of a step to multiple ways to identify the same subject.

Granted, in Kyle’s example none of his “keys” really identify the subject; they are more like isa, usedWith, usedIn type associations.
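
A quick illustration of the multikey point, as a pymongo sketch of my own rather than Kyle’s example: index an array field and a single document picks up one index entry per array element.

  from pymongo import MongoClient, ASCENDING

  products = MongoClient()["shop"]["products"]

  # One document, several tag values.
  products.insert_one({"name": "espresso machine",
                       "tags": ["kitchen", "coffee", "appliance"]})

  # Indexing an array field produces a multikey index: each element of
  # "tags" becomes its own index entry pointing back at the same document.
  products.create_index([("tags", ASCENDING)])

  # Any of the three tags leads to the same document.
  print(products.find_one({"tags": "coffee"})["name"])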

The Little Redis Book

Filed under: MongoDB,Redis — Patrick Durusau @ 3:35 pm

The Little Redis Book by Karl Seguin.

Weighs in at 29 pages and does a good job of creating an interest in knowing more about Redis.

Seguin is also the author of The Little MongoDB Book (which comes in at 32 pages).

January 11, 2012

MongoGraph One Ups MongoDB With Semantic Power (Humor)

Filed under: AllegroGraph,MongoDB,MongoGraph — Patrick Durusau @ 8:05 pm

MongoGraph One Ups MongoDB With Semantic Power by Jennifer Zaino.

From the post:

But Franz Inc. proposes an alternative for those who want more sophisticated functionality: Use the semantic power of its AllegroGraph Web 3.0 database to deal with complicated queries, via MongoGraph, a MongoDB API to AllegroGraph technology.

So, MongoGraph “One Ups” MongoDB by copying their API?

If MongoDB is as difficult to use as the article implies, wouldn’t that copying be going the other way?

Heard of anyone copying the Franz API lately?

Certainly not MongoDB. 😉

PS: As MongoDB points out at http://www.mongodb.org/display/DOCS/MongoDB+Data+Modeling+and+Rails, there are things that MongoDB does better than others. (shrugs) That is true for all technologies. At least MongoDB is up front about it.

December 17, 2011

SQL to MongoDB: An Updated Mapping

Filed under: Aggregation,MongoDB,NoSQL — Patrick Durusau @ 7:52 pm

SQL to MongoDB: An Updated Mapping from Kristina Chodorow.

From the post:

The aggregation pipeline code has finally been merged into the main development branch and is scheduled for release in 2.2. It lets you combine simple operations (like finding the max or min, projecting out fields, taking counts or averages) into a pipeline of operations, making a lot of things that were only possible by using MapReduce doable with a “normal” query.

In celebration of this, I thought I’d re-do the very popular MySQL to MongoDB mapping using the aggregation pipeline, instead of MapReduce.

If you are interested in MongoDB-based solutions, this will be very interesting.
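
For a taste of the mapping (my own example, not one of Kristina’s), a SQL query like SELECT status, COUNT(*) FROM orders GROUP BY status becomes a one-stage pipeline:

  from pymongo import MongoClient

  orders = MongoClient()["shop"]["orders"]

  # Roughly: SELECT status, COUNT(*) AS n FROM orders GROUP BY status
  pipeline = [{"$group": {"_id": "$status", "n": {"$sum": 1}}}]

  for row in orders.aggregate(pipeline):
      print(row["_id"], row["n"])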

December 4, 2011

Mongoid_fulltext

Filed under: MongoDB,N-Grams — Patrick Durusau @ 8:16 pm

Mongoid_fulltext: full-text n-gram search for your MongoDB models by Daniel Doubrovkine.

From the post:

We’ve been using mongoid_search for some time now for auto-complete. It’s a fine component that splits sentences and uses MongoDB to index them. Unfortunately it doesn’t rank them, so results come in order of appearance. In contrast, mongoid-fulltext uses n-gram matching (with n=3 right now), so we index all of the substrings of length 3 from text that we want to search on. If you search for “damian hurst”, mongoid_fulltext does lookups for “dam”, “ami”, “mia”, “ian”, “an “, “n h”, ” hu”, “hur”, “urs”, and “rst” and combines the results to get a most likely match. This also means users can make simple spelling mistakes and still find something relevant. In addition, you can index multiple collections in a single index, producing best matching results within several models. Finally, mongoid-fulltext leverages MongoDB native indexing and map-reduce.

And see: https://github.com/aaw/mongoid_fulltext.

Might want to think about this for your next application that takes text input from users.
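
The trigram idea itself is easy to sketch; this is only an illustration of the technique, not the gem’s code:

  def trigrams(text, n=3):
      """All overlapping substrings of length n, as in the example above."""
      return [text[i:i + n] for i in range(len(text) - n + 1)]

  print(trigrams("damian hurst"))
  # ['dam', 'ami', 'mia', 'ian', 'an ', 'n h', ' hu', 'hur', 'urs', 'rst']

  def score(query, candidate):
      """Crude relevance: fraction of the query's trigrams the candidate shares."""
      q, c = set(trigrams(query)), set(trigrams(candidate))
      return len(q & c) / max(len(q), 1)

  print(score("damian hurst", "damien hirst"))  # misspellings still overlap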

December 2, 2011

Useful Mongo Resources for NoSQL Newbs

Filed under: MongoDB,NoSQL — Patrick Durusau @ 4:54 pm

Useful Mongo Resources for NoSQL Newbs

Michael Robinson has a small but useful collection of resources to introduce users to NoSQL and in particular MongoDB.

If you know of other resources Michael should be listing, give him a shout!

December 1, 2011

Seven Databases in Seven Weeks now in Beta

Filed under: CouchDB,HBase,MongoDB,Neo4j,PostgreSQL,Redis,Riak — Patrick Durusau @ 7:41 pm

Seven Databases in Seven Weeks now in Beta

From the webpage:

Redis, Neo4J, Couch, Mongo, HBase, Riak, and Postgres: with each database, you’ll tackle a real-world data problem that highlights the concepts and features that make it shine. You’ll explore the five data models employed by these databases: relational, key/value, columnar, document, and graph. See which kinds of problems are best suited to each, and when to use them.

You’ll learn how MongoDB and CouchDB, both JavaScript powered, document oriented datastores, are strikingly different. Learn about the Dynamo heritage at the heart of Riak and Cassandra. Understand MapReduce and how to use it to solve Big Data problems.

Build clusters of servers using scalable services like Amazon’s Elastic Compute Cloud (EC2). Discover the CAP theorem and its implications for your distributed data. Understand the tradeoffs between consistency and availability, and when you can use them to your advantage. Use multiple databases in concert to create a platform that’s more than the sum of its parts, or find one that meets all your needs at once.

Seven Databases in Seven Weeks will give you a broad understanding of the databases, their strengths and weaknesses, and how to choose the ones that fit your needs.

Now in beta, in non-DRM PDF, epub, and mobi from pragprog.com/book/rwdata.

If you know Seven Languages in Seven Weeks by Bruce Tate, no further recommendation of the approach is necessary.

I haven’t read the book yet, but will be getting the electronic beta tonight. More to follow.

November 8, 2011

Someone Is Being Honest on the Internet?

Filed under: MongoDB,NoSQL,Riak — Patrick Durusau @ 7:44 pm

After seeing the raft of Twitter traffic on MongoDB and Riak, In Context (and an apology), I just had to look. The thought of someone being honest on the Internet is even more novel than someone being wrong on the Internet.

At least I would not have to stay up late correcting them. 😉

Sean Cribbs writes:

There has been quite a bit of furor and excitement on the Internet this week regarding some very public criticisms (and defenses) of MongoDB and its creators, 10gen. Unfortunately, a ghost from my recent past also resurfaced as a result. Let me begin by apologizing to 10gen and its engineers for what I said at JSConf, and then I will reframe my comments in a more constructive form.

Mea culpa. It’s way too easy in our industry to set up and knock down strawmen, as I did, than to convey messages of objective and constructive criticism. It’s also too easy, when you are passionate about what you believe in, to ignore the feelings and efforts of others, which I did. I have great respect for the engineers I have met from 10gen, Mathias Stern and Kyle Banker. They are friendly, approachable, helpful and fun to socialize with at conferences. Thanks for being stand-up guys.

Also, whether we like it or not, these kinds of public embarrassments have rippling effects across the whole NoSQL ecosystem. While Basho has tried to distance itself from other players in the NoSQL field, we cannot deny our origins, and the ecosystem as a “thing” is only about 3 years old. Are developers, technical managers and CTOs more wary of new database technologies as a result of these embarrassments? Probably. Should we continue to work hard to develop and promote alternative data-storage solutions? Absolutely.

Sean’s comments that follow are useful, but even more useful is his suggestion that both MongoDB and Riak push to improve their respective capabilities. There is always room for improvement.

Oh, I did notice one thing that needs correcting in Sean’s blog entry. 😉 See: Munnecke, Health Records and VistA (NoSQL 35 years old?). NoSQL is at least 35 years old, probably older, but I don’t have the citation at hand.

November 3, 2011

NoSQL Exchange – 2 November 2011

NoSQL Exchange – 2 November 2011

It doesn’t get much better or fresher (for non-attendees) than this!

  • Dr Jim Webber of Neo Technology starts the day by welcoming everyone to the first of many annual NOSQL eXchanges. View the podcast here…
  • Emil Eifrém gives a Keynote talk to the NOSQL eXchange on the past, present and future of NOSQL, and the state of NOSQL today. View the podcast here…
  • HANDLING CONFLICTS IN EVENTUALLY CONSISTENT SYSTEMS In this talk, Russell Brown examines how conflicting values are kept to a minimum in Riak and illustrates some techniques for automating semantic reconciliation. There will be practical examples from the Riak Java Client and other places.
  • MONGODB + SCALA: CASE CLASSES, DOCUMENTS AND SHARDS FOR A NEW DATA MODEL Brendan McAdams — creator of Casbah, a Scala toolkit for MongoDB — will give a talk on “MongoDB + Scala: Case Classes, Documents and Shards for a New Data Model”
  • REAL LIFE CASSANDRA Dave Gardner: In this talk for the NOSQL eXchange, Dave Gardner introduces why you would want to use Cassandra, and focuses on a real-life use case, explaining each Cassandra feature within this context.
  • DOCTOR WHO AND NEO4J Ian Robinson: Armed only with a data store packed full of geeky Doctor Who facts, by the end of this session we’ll have you tracking down pieces of memorabilia from a show that, like the graph theory behind Neo4j, is older than Codd’s relational model.
  • BUILDING REAL WORLD SOLUTION WITH DOCUMENT STORAGE, SCALA AND LIFT Aleksa Vukotic will look at how his company assessed and adopted CouchDB in order to rapidly and successfully deliver a next generation insurance platform using Scala and Lift.
  • ROBERT REES ON POLYGLOT PERSISTENCE Robert Rees: Based on his experiences of mixing CouchDB and Neo4J at Wazoku, an idea management startup, Robert talks about the theory of mixing your stores and the practical experience.
  • PARKBENCH DISCUSSION This Park Bench discussion will be chaired by Jim Webber.
  • THE FUTURE OF NOSQL AND BIG DATA STORAGE Tom Wilkie: Tom Wilkie takes a whistle-stop tour of developments in NOSQL and Big Data storage, comparing and contrasting new storage engines from Google (LevelDB), RethinkDB, Tokutek and Acunu (Castle).

And yes, I made a separate blog post on Neo4j and Dr. Who. 😉 What can I say? I am a fan of both.

October 21, 2011

Using MongoDB in Anger

Filed under: Database,Indexing,MongoDB,NoSQL — Patrick Durusau @ 7:26 pm

Using MongoDB in Anger

Tips on building high performance applications with MongoDB.

Covers four topics:

  • Schema design
  • Indexing
  • Concurrency
  • Durability

Excellent presentation!

One of the first presentations I have seen that recommends a book about another product: High Performance MySQL, alongside MongoDB in Action.
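
On the durability bullet, here is a small pymongo sketch of my own (modern driver syntax, not taken from the presentation) that asks the server for a journaled acknowledgment before an insert counts as done:

  from pymongo import MongoClient
  from pymongo.write_concern import WriteConcern

  db = MongoClient()["app"]

  # Acknowledge the write only after it has been written to the journal.
  events = db.get_collection("events", write_concern=WriteConcern(w=1, j=True))
  events.insert_one({"type": "signup", "user": "alice"})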

October 20, 2011

Getting Started with MMS

Filed under: MongoDB — Patrick Durusau @ 6:41 pm

Getting Started with MMS by Kristina Chodorow.

From the post:

Telling someone “You should set up monitoring” is kind of like telling someone “You should exercise 20 minutes three times a week.” Yes, you know you should, but your chair is so comfortable and you haven’t keeled over dead yet.

For years*, 10gen has been planning to do monitoring “right,” making it painless to monitor your database. Today, we released the MongoDB Monitoring Service: MMS.

MMS is free hosted monitoring for MongoDB. I’ve been using it to help out paying customers for a while, so I thought I’d do a quick post on useful stuff I’ve discovered (documentation is… uh… a little light, so far).

MongoDB folks will find this post quite useful.

October 14, 2011

MongoGraph – MongoDB Meets the Semantic Web

Filed under: MongoDB,RDF,Semantic Web,SPARQL — Patrick Durusau @ 6:24 pm

MongoGraph – MongoDB Meets the Semantic Web

From the post (Franz Inc.):

Recorded Webcast: MongoGraph – MongoDB Meets the Semantic Web From October 12, 2011

MongoGraph is an effort to bring the Semantic Web to MongoDB developers. We implemented a MongoDB interface to AllegroGraph to give Javascript programmers both Joins and the Semantic Web. JSON objects are automatically translated into triples and both the MongoDB query language and SPARQL work against your objects.

Join us for this webcast to learn more about working on the level of objects instead of individual triples, where an object would be defined as all the triples with the same subject. We’ll discuss the simplicity of the MongoDB interface for working with objects and all the properties of an advanced triplestore, in this case joins through SPARQL queries, automatic indexing of all attributes/values, ACID properties all packaged to deliver a simple entry into the world of the Semantic Web.

I haven’t watched the video yet, but:

working on the level of objects instead of individual triples, where an object would be defined as all the triples with the same subject.

certainly caught my eye.

Curious whether this means simply using the triples as sources of values and not “reasoning” with them.
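
The “object as all triples with the same subject” idea is easy to picture. A toy sketch of my own (nothing to do with AllegroGraph’s actual encoding):

  # A JSON/MongoDB-style document ...
  doc = {"_id": "person:42", "name": "Ada", "born": 1815, "field": "mathematics"}

  # ... viewed as triples: each attribute becomes (subject, predicate, object),
  # and "the object" is simply all the triples sharing one subject.
  subject = doc["_id"]
  triples = [(subject, key, value) for key, value in doc.items() if key != "_id"]

  for t in triples:
      print(t)
  # ('person:42', 'name', 'Ada')
  # ('person:42', 'born', 1815)
  # ('person:42', 'field', 'mathematics')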

September 29, 2011

MongoDB Monitoring Service (MMS)

Filed under: MongoDB,SaaS — Patrick Durusau @ 6:37 pm

MongoDB Monitoring Service (MMS)

From the post:

Today we’re pleased to release the MongoDB Monitoring Service (MMS) to the public for free. MMS is a SaaS based tool that monitors your MongoDB cluster and makes it easy for you to see what’s going on in a production deployment.

One of the most frequently asked questions we get from users getting ready to deploy MongoDB is “What should I be monitoring in production?” Our engineers have spent a lot of time working with many of the world’s largest MongoDB deployments and based on this experience, MMS represents our current “best practices” monitoring for any deployment.

MMS is free. Anybody can sign up for MMS, download the agent, and start visualizing performance data in minutes.

If you’re a commercial support customer, MMS makes our support even better. 10gen engineers can access your MMS data, enabling them to skip the tedious back and forth information gathering that can go on during triaging of issues.

I don’t have access to a MongoDB cluster (at least not at the moment). Comments on MMS are most welcome.

Given the feature/performance race in NoSQL solutions, it should be interesting to see what monitoring solutions appear for other NoSQL offerings!

Or for SQL offerings as well!

September 21, 2011

MongoUK – September 2011

Filed under: MongoDB — Patrick Durusau @ 7:07 pm

MongoUK – September 2011

A full day with three (3) tracks on MongoDB.

Just some titles at random to awaken your interest: Indexes, What Indexes?, Scaling MongoDB for Real-Time Analytics, Intelligent Stream Filtering Using MongoDB, GeoSpatial Indexing. That’s 4 out of 25.

My suggestion is that you visit and find presentations relevant to your topic map interests. Enjoy!

BTW, there are another 20 or so presentations on MongoDB from the MongoUK event in March 2011.

September 19, 2011

The Joy of Indexing

Filed under: Indexing,MongoDB — Patrick Durusau @ 7:55 pm

The Joy of Indexing by Kyle Banker.

From the post:

We spend quite a lot of time at 10gen supporting MongoDB users. The questions we receive are truly legion but, as you might guess, they tend to overlap. We get frequent queries on sharding, replica sets, and the idiosyncrasies of JavaScript, but the one subject that never fails to appear each day on our mailing list is indexing.

Now, to be clear, I’m not talking about how to create an index. That’s easy. The trouble runs much deeper. It’s knowing how indexes work and having the intuition to create the best indexes for your queries and your data set. Lacking this intuition, your production database will eventually slow to a crawl, you’ll upgrade your hardware in vain, and when all else fails, you’ll blame both gods and men.

This need not be your fate. You can understand indexing! All that’s required is the right mental model, and over the course of this series, that’s just what I hope to provide.

But caveat emptor: what follows is a thought experiment. To get the most out of this post, you can’t skim it. Read every word. Use your imagination. Think through the quizzes. Do this, and your indexing struggles may soon be no more.

Very useful post and one that anyone starting to create indexes by automated means needs to read.

I am curious how readers with a background in indexing feel about the description.

What would you instruct a reader to do differently if they were manually creating an index to this cookbook?

September 15, 2011

5 Steps to Scaling MongoDB

Filed under: Database,MongoDB — Patrick Durusau @ 7:52 pm

5 Steps to Scaling MongoDB (Or Any DB) in 8 Minutes

From the post:

Jared Rosoff concisely, effectively, entertainingly, and convincingly gives an 8 minute MongoDB tutorial on scaling MongoDB at Scale Out Camp. The ideas aren’t just limited to MongoDB, they work for most any database: Optimize your queries; Know your working set size; Tune your file system; Choose the right disks; Shard. Here’s an explanation of all 5 strategies:

Note: The Scale Out Camp link isn’t working as of 9/14/2011. Web domain is there but no content.

August 31, 2011

Interactive Maps With Polymaps, TileStache, and MongoDB

Filed under: Maps,MongoDB — Patrick Durusau @ 7:38 pm

Interactive Maps With Polymaps, TileStache, and MongoDB

For the impatient: Check out the Interactive Map of Twitter Weight Loss Goals (very slick).

From Alex Popescu’s myNoSQL:

A three part tutorial on using MongoDB, PostgreSQL/PostGIS, and Javascript libraries for building interactive maps by Hans Kuder:

  • part 1: goals and building blocks
  • part 2: geo data, PostGIS, and TileStache
  • part 3: client side and MongoDB

Visiting part 1 for a larger taste of the project, you find:

I’d been toying around with ideas for cool ancillary features for Goalfinch for a while, and finally settled on creating this interactive map of Twitter weight loss goals. I knew what I wanted: a Google-maps-style, draggable, zoomable, slick-looking map, with the ability to combine raster images and style-able vector data. And I didn’t want to use Flash. But as a complete geographic information sciences (GIS) neophyte, I had no idea where to start. Luckily there are some new technologies in this area that greatly simplified this project. I’m going to show you how they all fit together so you can create your own interactive maps for the browser.

Overview

The main components of the weight loss goals map are:

  1. Client-side Javascript that assembles the map from separate layers (using Polymaps)
  2. Server-based application that provides the data for each layer (TileStache, MongoDB, PostGIS, Pylons)
  3. Server-based Python code that runs periodically to search Twitter and update the weight loss goal data

I’ll cover each component separately in upcoming posts, but I’ll start with a high-level description of how the components work together for those of you who are new to web-based interactive maps.

Let your imagination run wild with the interactive maps that you can assemble and populate with topic map based data.

August 30, 2011

MongoDB 2.0.0-rc0

Filed under: MongoDB,NoSQL — Patrick Durusau @ 7:10 pm

MongoDB 2.0.0-rc0 was released 25 August 2011.

Check out the latest release or download a stable version at:

MongoDB homepage

August 18, 2011

How You Should Go About Learning NoSQL

Filed under: Dynamo,MongoDB,NoSQL,Redis — Patrick Durusau @ 6:46 pm

How You Should Go About Learning NoSQL

Interesting post that expands on three rules for learning NoSQL:

1: Use MongoDB.
2: Take 20 minutes to learn Redis.
3: Watch this video to understand Dynamo.

August 11, 2011

The joy of algorithms and NoSQL: a MongoDB example (part 2)

Filed under: Algorithms,Cheminformatics,MapReduce,MongoDB — Patrick Durusau @ 6:35 pm

The joy of algorithms and NoSQL: a MongoDB example (part 2)

From the post:

In part 1 of this article, I described the use of MongoDB to solve a specific Chemoinformatics problem, namely the computation of molecular similarities. Depending on the target Tanimoto coefficient, the MongoDB solution is able to screen a database of a million compounds in subsecond time. To make this possible, queries only return chemical compounds which, in theory, are able to satisfy the particular target Tanimoto. Even though this optimization is in place, the number of compounds returned by this query increases significantly when the target Tanimoto is lowered. The example code on the GitHub repository, for instance, imports and indexes ~25000 chemical compounds. When a target Tanimoto of 0.8 is employed, the query returns ~700 compounds. When the target Tanimoto is lowered to 0.6, the number of returned compounds increases to ~7000. Using the MongoDB explain functionality, one is able to observe that the internal MongoDB query execution time increases slightly, compared to the execution overhead to transfer the full list of 7000 compounds to the remote Java application. Hence, it would make more sense to perform the calculations local to where the data is stored. Welcome to MongoDB’s built-in map-reduce functionality!

Screening “a million compounds in subsecond time” sounds useful in a topic map context.
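
For reference, the Tanimoto coefficient on fingerprint sets is |A ∩ B| / |A ∪ B|; a small sketch of the formula (mine, not the post’s MongoDB code):

  def tanimoto(a, b):
      """Tanimoto similarity of two fingerprint sets: |A & B| / |A | B|."""
      a, b = set(a), set(b)
      if not a and not b:
          return 1.0
      return len(a & b) / len(a | b)

  # Toy fingerprints: the bit positions that are "on" for each compound.
  compound_1 = {3, 17, 42, 88, 101}
  compound_2 = {3, 17, 42, 90, 101, 230}
  print(tanimoto(compound_1, compound_2))  # 0.571..., below a 0.8 target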

July 29, 2011

MongoDB Schema Design Basics

Filed under: MongoDB,NoSQL,Schema — Patrick Durusau @ 7:46 pm

MongoDB Schema Design Basics

From Alex Popescu’s myNoSQL:

For NoSQL databases there are no clear rules like the Boyce-Codd Normal Form database normalization. Data modeling and analysis of data access patterns are two fundamental activities. While over the last 2 years we’ve gathered some recipes, it’s always a good idea to check the recommended ways to model your data with your choice of NoSQL database.

After the break, watch 10gen’s Richard Kreuter’s presentation on MongoDB schema design.

A must see video!
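
The recurring schema question in MongoDB is embed versus reference. A bare-bones sketch of the two shapes (my example, not from Kreuter’s presentation):

  # Embedded: comments live inside the post document -- one read, no join,
  # good when the comments are few and always fetched with the post.
  post_embedded = {
      "title": "Schema design basics",
      "comments": [
          {"author": "alice", "text": "Nice talk"},
          {"author": "bob", "text": "+1"},
      ],
  }

  # Referenced: comments are separate documents pointing back at the post --
  # better when comments are unbounded or queried on their own.
  post = {"_id": "post1", "title": "Schema design basics"}
  comments = [
      {"post_id": "post1", "author": "alice", "text": "Nice talk"},
      {"post_id": "post1", "author": "bob", "text": "+1"},
  ]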

July 24, 2011

MongoDB and the Democratic Party

Filed under: MongoDB,NoSQL — Patrick Durusau @ 6:46 pm

MongoDB and the Democratic Party – A Case Study by Pramod Sadalage.

Interesting case study for an application that managed contacts of the Democratic Party (US) for fund raising and voter turnout efforts on election day.

Talks about elimination of duplicate records but, given the breadth of the talk, the speaker doesn’t go into any detail.

Pay particular attention to the data structure that is created for this project.

Note that any organization can have a different ID for any particular person. That is, a local organization can query by its identifier and its ID for a person and get back the information on that person. (I assume the IDs used by other organizations are filtered out of the return.)

Granted, it isn’t aggregation of unbounded information about a particular voter from an unknown number of sources, but it is a low-cost solution to providing a national ID (for this data set) along with access via local IDs. That “pattern” could prove useful in other cases.
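
A guess at the shape of such a record, purely illustrative since the talk doesn’t spell out its schema: one person document carrying a different identifier per organization, queried by whichever local ID an organization knows.

  from pymongo import MongoClient

  people = MongoClient()["outreach"]["people"]

  # One person, one document, many organization-local identifiers.
  people.insert_one({
      "name": "Jane Voter",
      "org_ids": [
          {"org": "county_committee", "id": "CC-0042"},
          {"org": "national_list",    "id": "N-998877"},
      ],
  })

  # A local organization queries with its own identifier for the person.
  person = people.find_one({"org_ids": {"$elemMatch":
                            {"org": "county_committee", "id": "CC-0042"}}})
  print(person["name"])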

July 14, 2011

MapReduce with MongoDB and Clojure

Filed under: Clojure,MapReduce,MongoDB — Patrick Durusau @ 4:12 pm

MapReduce with MongoDB and Clojure

From the post:

A few days ago, we decided to create a dashboard in order to better visualize some statistics of our production systems. One important function is to plot the average latency as a time-series graph, so we can see the trend over time. Since MongoDB implemented MapReduce, and we store our logs in MongoDB, MapReduce seems a natural fit for log analysis.

One issue with MongoDB’s implementation of MapReduce is that no matter what language you use, you have to pass JavaScript code as strings to MongoDB. Storing code written in another language as strings in a program is … inelegant, to say the least.

Fortunately, Clojure being a homoiconic language, it is relatively easy to transform Clojure forms into code snippets of other languages using Clojure itself in the same program. In other words, it is possible to embed JavaScript programs in a Clojure program without actually seeing any JavaScript syntax. There are already a number of libraries, with different levels of maturity, that allow you to transform Clojure forms to JavaScript. I haven’t done an extensive survey, but ClojureJS is good enough for our purpose.

Emphasis on the homoiconic nature of Clojure.
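
The “JavaScript as strings” pain point is visible from any driver, not just Clojure. A pymongo sketch of my own, with made-up collection and field names (and note that the mapReduce command has since been deprecated in favor of the aggregation pipeline):

  from bson.code import Code
  from pymongo import MongoClient

  db = MongoClient()["ops"]

  # The map and reduce functions are JavaScript shipped to the server as
  # strings inside a Python program -- exactly the inelegance the post
  # describes, and what generating JavaScript from Clojure forms avoids.
  mapper = Code("function() { emit(this.endpoint, this.latency_ms); }")
  reducer = Code("""
      function(key, values) {
          var total = 0;
          values.forEach(function(v) { total += v; });
          return total;
      }
  """)

  result = db.command("mapReduce", "logs",
                      map=mapper, reduce=reducer, out={"inline": 1})
  for row in result["results"]:
      print(row["_id"], row["value"])  # total latency per endpoint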

July 7, 2011

MongoSV

Filed under: MongoDB,NoSQL — Patrick Durusau @ 4:16 pm

MongoSV

From the homepage:

MongoSV was a four-track, one-day conference on December 3, 2010 at Microsoft Research Silicon Valley in Mountain View, CA. The main conference track featured 10gen founders Dwight Merriman and Eliot Horowitz, as well as Roger Bodamer, the head of 10gen’s west coast operations, and several of the key engineers developing the MongoDB project. These sessions were geared towards developers and administrators interested in learning how to use the database, with sessions on schema design, indexing, administration, deployment strategies, scaling, and other features. A second track showcased several high-profile deployments of the database at Shutterfly, Craigslist, IGN, Intuit, Wordnik, and more. For more experienced users of the database, there were several advanced sessions, covering the storage engine, replication, sharding, and consistency models.

Excellent collection of videos and slides on MongoDB and various aspects of its use.

Wordnik

Filed under: Graphs,MongoDB,Subject Identity — Patrick Durusau @ 4:15 pm

Wordnik – Building a Directed Graph with MongoDB

Tony Tam’s slide deck on directed graphs and MongoDB.

Emphasizes that the graph you build depends on your application’s needs. Much like using your own tests for subject identity: you could always use mine, but never quite as well or as accurately as your own.
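
One simple way to store a directed graph in MongoDB (my sketch, not necessarily Wordnik’s layout) is a document per node with an array of outgoing edges:

  from pymongo import MongoClient, ASCENDING

  nodes = MongoClient()["graph"]["nodes"]

  # One document per node; "out" lists the nodes this node points to.
  nodes.insert_many([
      {"_id": "cat",    "out": ["feline", "pet"]},
      {"_id": "feline", "out": ["mammal"]},
      {"_id": "pet",    "out": []},
  ])

  # A multikey index on the edge array makes reverse lookups cheap.
  nodes.create_index([("out", ASCENDING)])

  print(nodes.find_one({"_id": "cat"})["out"])              # outgoing edges
  print([n["_id"] for n in nodes.find({"out": "mammal"})])  # incoming edges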

July 1, 2011

Explore MongoDB

Filed under: MongoDB — Patrick Durusau @ 2:46 pm

Explore MongoDB by Joe Lennon (IBM).

From the summary:

In this article, you will learn about MongoDB, the open source, document-oriented database management system written in C++ that provides features for scaling your databases in a production environment. Discover what benefits document-oriented databases have over traditional relational database management systems (RDBMS). Install MongoDB and start creating databases, collections, and documents. Examine Mongo’s dynamic querying features, which provide key/value store efficiency in a way familiar to RDBMS database administrators and developers.

Great way to get your feet wet with MongoDB!
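
If you would rather poke at it before reading, a first session with pymongo (assuming a local mongod) is only a few lines:

  from pymongo import MongoClient

  client = MongoClient()          # local mongod on the default port
  db = client["explore"]          # databases and collections are created
  articles = db["articles"]       # lazily, on first insert

  articles.insert_one({"title": "Explore MongoDB", "author": "Joe Lennon",
                       "tags": ["mongodb", "nosql"]})

  # Dynamic querying: no schema to declare, just match on fields.
  print(articles.find_one({"tags": "nosql"})["title"])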
