Map-D (the details)

MIT Spinout Exploits GPU Memory for Vast Visualization by Alex Woodie.

From the post:

An MIT research project turned open source project dubbed the Massively Parallel Database (Map-D) is turning heads for its capability to generate visualizations on the fly from billions of data points. The software (an SQL-based, column-oriented database that runs in the memory of GPUs) can deliver interactive analysis of 10TB datasets with millisecond latencies. For this reason, its creator feels comfortable calling it “the fastest database in the world.”

Map-D is the brainchild of Todd Mostak, who created the software while taking a class in database development at MIT. By optimizing the database to run in the memory of off-the-shelf graphics processing units (GPUs), Mostak found that he could create a mini supercomputer cluster that offered an order of magnitude better performance than a database running on regular CPUs.

“Map-D is an in-memory column store coded into the onboard memory of GPUs and CPUs,” Mostak said today during a webinar on Map-D. “It’s really designed from the ground up to maximize whatever hardware it’s using, whether it’s running on Intel CPU or Nvidia GPU. It’s optimized to maximize the throughput, meaning if a GPU has this much memory bandwidth, what we really try to do is make sure we’re hitting that memory bandwidth.”

During the webinar, Mostak and Tom Graham, his fellow co-founder of the startup Map-D, demonstrated the technology’s capability to interactively analyze datasets composed of a billion individual records, constituting more than 1TB of data. The demo included a heat map of Twitter posts made from 2010 to the present. Map-D’s “TweetMap” (which the company also demonstrated at the recent SC 2013 conference) runs on eight K40 Tesla GPUs, each with 12 GB of memory, in a single node configuration.
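
As a rough sanity check on the "hitting that memory bandwidth" remark, the time for a full scan is bounded below by the bytes scanned divided by the aggregate memory bandwidth. The sketch below uses the GPU count and memory size from the excerpt, but the 288 GB/s peak bandwidth per Tesla K40 is an assumption taken from Nvidia's published specification, not from the article, and the function name is made up for illustration. Real scans will not reach peak bandwidth.

```python
def min_scan_seconds(bytes_scanned: float, gpus: int, gb_per_sec_per_gpu: float) -> float:
    """Lower bound on the time to stream bytes_scanned once, split across gpus devices."""
    aggregate_bytes_per_sec = gpus * gb_per_sec_per_gpu * 1e9
    return bytes_scanned / aggregate_bytes_per_sec


GPUS = 8                      # from the article: eight Tesla K40s in one node
GB_PER_GPU = 12               # from the article: 12 GB of memory each
K40_PEAK_GB_PER_SEC = 288.0   # assumption: vendor-quoted peak, not from the article

# Worst case for this node: every byte of GPU memory is read once.
total_bytes = GPUS * GB_PER_GPU * 1e9
print(f"{min_scan_seconds(total_bytes, GPUS, K40_PEAK_GB_PER_SEC) * 1000:.1f} ms")
```

Under those assumptions the bound comes out at roughly 42 ms for all 96 GB, which is at least consistent with the millisecond-scale latencies claimed above.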

You really need to try the TweetMap example. This rocks!

The details on TweetMap:

You can search tweet text, heatmap results, identify and animate trends, share maps and regress results against census data.

For each click Map-D scans the entire database and visualizes results in real-time. Unlike many other tweetmap demos, nothing is canned or pre-rendered. Recent tweets also stream live onto the system and are available for view within seconds of broadcast.

TweetMap is powered by 8 NVIDIA Tesla K40 GPUs with a total of 96GB of GPU memory in a single node. While we sometimes switch between interesting datasets of various size, for the most part TweetMap houses over 1 billion tweets from 2010 to the present.
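
To make "scans the entire database" on each click concrete, here is a minimal CPU-only sketch of a full scan over a column-oriented layout that filters on tweet text and bins the matches into a coarse grid for a heat map. It is not Map-D's code: the function name, the column names, the grid size, and the sample rows are all invented for illustration, and a GPU implementation would run the same loop body across many threads in parallel.

```python
from collections import Counter


def heatmap_counts(lons, lats, texts, term, cell_deg=10.0):
    """Scan every row and count tweets containing term per (lon, lat) grid cell."""
    term = term.lower()
    counts = Counter()
    # Nothing is precomputed or indexed: each query touches every row.
    for lon, lat, text in zip(lons, lats, texts):
        if term in text.lower():
            cell = (int(lon // cell_deg), int(lat // cell_deg))
            counts[cell] += 1
    return counts


# Column-oriented layout: one list per attribute, rows aligned by index.
# Sample data is made up.
lons = [-71.1, -71.0, -0.1, 139.7]
lats = [42.4, 42.3, 51.5, 35.7]
texts = ["GPU databases rock", "snow again", "Trying a GPU demo", "gpu everywhere"]

print(heatmap_counts(lons, lats, texts, "gpu"))
```

Because each attribute lives in its own array, a query that needs only text and coordinates never reads the other columns, which is part of why a column store suits this kind of bandwidth-bound scan.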

Imagine interactive “merging” of subjects based on their properties.

Come to think of it, don’t GPUs handle edges between nodes? As in graphs? 😉

A couple of links for more information, although I suspect the list of resources on Map-D is going to grow by leaps and bounds:

Resources page (includes videos of demonstrations).

An Overview of MapD (Massively Parallel Database) by Todd Mostak. (whitepaper)
