Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

October 1, 2012

Handsome Atlas: Beautiful Data Visualizations from the 19th Century

Filed under: Graphics,Visualization — Patrick Durusau @ 6:35 pm

Handsome Atlas: Beautiful Data Visualizations from the 19th Century

Information Aesthetics reports:

Do you want some inspiration to create a visually stunning – yet fully optimized – data graphic? Well, let’s go back about 140 years… Handsome Atlas [handsomeatlas.com], developed by Jonathan Soma of Brooklyn Brainery, provides a stunning new online interface to a large collection of beautiful data visualizations from the 19th century. While all the visualizations shown can already be found in some long list at the US Census website, this website is specifically designed to encourage you to explore, investigate and enjoy.

Which visualizations can you imagine reusing?

For what data sets?

Pig Macro for TF-IDF Makes Topic Summarization 2 Lines of Pig

Filed under: Pig,TF-IDF — Patrick Durusau @ 6:20 pm

Pig Macro for TF-IDF Makes Topic Summarization 2 Lines of Pig by Russell Jurney.

From the post:

In a recent post we used Pig to summarize documents via the Term-Frequency, Inverse Document Frequency (TF-IDF) algorithm.

In this post, we’re going to turn that code into a Pig macro that can be called in one line of code:

Any Pig macros in your trick bag?
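For readers who want to see what the TF-IDF summarization computes, here is a minimal sketch in plain Python rather than Pig. It is not Russell's macro: the function names (`tf_idf`, `top_terms`), the toy documents, and the particular weighting (term count over document length, times the log of total documents over document frequency) are assumptions chosen for illustration, and other TF-IDF variants exist.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    docs is a list of token lists. Assumed weighting:
    tf = count / document length, idf = log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


def top_terms(doc_scores, k=2):
    """Summarize one document by its k highest-scoring terms (ties broken alphabetically)."""
    ranked = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy corpus, already tokenized.
docs = [
    "pig macro for topic summarization".split(),
    "pig on hadoop".split(),
    "topic maps and semantic diversity".split(),
]
summaries = [top_terms(s) for s in tf_idf(docs)]
```

Terms that appear in every document score zero under this weighting, which is why common words drop out of the summaries. A real pipeline would also remove stop words before scoring.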

Visualization for Scientific Discovery

Filed under: Graphics,Visualization — Patrick Durusau @ 5:58 pm

Visualization for Scientific Discovery by Nathan Yau.

Nathan points to a recent Scientific American piece by Jeffrey Heer on visualization. Sounds worthwhile.

My only puzzle over visualization is why we are surprised that it works so well.

We have all had the experience of drawing diagrams to explain things, either to ourselves or others.

Crude, but it is a form of visualization.

I suppose my question should be:

Why don’t we use visualization more playfully?

Just try it out.

May lead to something. May lead to nothing.

Won’t know unless we try.

Clojure Koans

Filed under: Clojure — Patrick Durusau @ 4:39 pm

Clojure Koans

You may find this useful.

Note that you get a direct link to the site. I don’t try to trap you in a frame for my site.

I first saw this at DZone.

PDS – Planetary Data System [The Mother Lode]

Filed under: Astroinformatics,Data — Patrick Durusau @ 4:35 pm

PDS – Planetary Data System

From the webpage:

The PDS archives and distributes scientific data from NASA planetary missions, astronomical observations, and laboratory measurements. The PDS is sponsored by NASA’s Science Mission Directorate. Its purpose is to ensure the long-term usability of NASA data and to stimulate advanced research

Tools, data, guides, etc.

Quick searches include:

  • Mercury
  • Venus
  • Mars
  • Jupiter
  • Saturn
  • Uranus, Neptune, Pluto
  • Rings
  • Asteroids
  • Comets
  • Planetary Dust
  • Earth’s Moon
  • Solar Wind

The ordering here makes a little more sense to me. What about you?

A nice way to teach scientific, mathematical and computer literacy without making it seem like work. 😉

Planetary Data System – Geosciences Node

Filed under: Astroinformatics,Data,Geographic Data — Patrick Durusau @ 3:22 pm

Sounds like SciFi, yes? SciFi? No!

After seeing Google add some sea bed material to Google Maps, I started to wonder about radar based maps of other places. Like the Moon.

I remember the excitement Ranger 7 images generated. And that in grainy newspaper reproductions.

With just a little searching, I came across the PDS (Planetary Data System) Geosciences Node (Washington University in St. Louis).

From the web page:

The Geosciences Node of NASA’s Planetary Data System (PDS) archives and distributes digital data related to the study of the surfaces and interiors of terrestrial planetary bodies. We work directly with NASA missions to help them generate well-documented, permanent data archives. We provide data to NASA-sponsored researchers along with expert assistance in using the data. All our archives are online and available to the public to download free of charge.

Which includes:

  • Mars
  • Venus
  • Mercury
  • Moon
  • Earth (test data for other planetary surfaces)
  • Asteroids
  • Gravity Models

Even after checking the FAQ, I can’t explain the ordering of these entries. Order from the Sun doesn’t work. Neither does distance from Earth. Nor alphabetical sort order. Suggestions?
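A quick way to double-check two of those guesses is a few lines of Python. This is only a sketch: the distances are rounded mean distances from the Sun in astronomical units, the asteroid figure is a rough value for the main belt, the Moon is given Earth's distance, and "Gravity Models" is left out of the distance test because it has no position. The names `listed`, `sun_au` and `is_sorted` are made up for the example.

```python
# Order in which the Geosciences Node lists its holdings (copied from the page).
listed = ["Mars", "Venus", "Mercury", "Moon", "Earth", "Asteroids", "Gravity Models"]

# Approximate mean distance from the Sun in AU (rounded; asteroid value is a
# rough main-belt figure and the Moon is assigned Earth's distance).
sun_au = {
    "Mercury": 0.39,
    "Venus": 0.72,
    "Earth": 1.0,
    "Moon": 1.0,
    "Mars": 1.52,
    "Asteroids": 2.7,
}


def is_sorted(values):
    """True if every value is <= the one that follows it."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Only entries with a position take part in the distance test.
bodies = [name for name in listed if name in sun_au]

by_sun = is_sorted([sun_au[name] for name in bodies])
by_sun_reversed = is_sorted([sun_au[name] for name in reversed(bodies)])
alphabetical = is_sorted(listed)
```

All three flags come out false, so neither direction of Sun order nor alphabetical order explains the list.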

In any event, enjoy the data set!

Google Maps Goes Deep-Sea Diving to Chart the World’s Ocean Floors

Filed under: Mapping,Maps — Patrick Durusau @ 2:47 pm

Google Maps Goes Deep-Sea Diving to Chart the World’s Ocean Floors by David Gianatasio.

A quick blurb about Google Maps adding select sea beds to its map collection.

Suggestions on what other ocean floor data is commonly available?

And with data in hand, what other data would you merge it with?

Apple Maps: By the “big data” short hairs

Filed under: BigData,Mapping,Maps — Patrick Durusau @ 10:48 am

Mike Loukides, in “Apple’s maps: Apple’s maps problem isn’t about software or design. It’s about data,” nails the problem with Apple Maps. It’s the data, stupid!

Here’s the difficulty. As Stephen O’Grady has pointed out, the problem with maps is really a data problem, not a software or design problem. If Apple’s maps app was ugly or had a poor user interface, it would be fixed within a month. But Apple is really looking at a data problem: bad data, incomplete data, conflicting data, poor quality data, incorrectly formatted data. Anyone who works with data understands that 80% of the work in any data product is getting your data into good enough shape so that it’s useable. Google is a data company, and they understand this; hence the reports of more than 7,000 people working on Google Maps. And even Google Maps has its errors; I just reported a “road” that is really just a poorly maintained trail.

Mike’s post is amusing and informative so be sure to read it.

But remember these two points:

  1. Data is always dirty, syntactically and/or semantically. “Big data” is “big dirty data.”
  2. Google has 7,000 people, not servers, clusters, algorithms, etc., working on Google Maps. (Is that evidence that “big dirty data” requires human correction?)

The bigger the data, the more dirt you will encounter.

Is your data application going to be the next “Apple Maps?”

Scikit-learn 0.12 released

Filed under: Machine Learning,Python,Scikit-Learn — Patrick Durusau @ 10:31 am

Scikit-learn 0.12 released by Andreas Mueller.

From the post:

Last night I uploaded the new version 0.12 of scikit-learn to pypi. Also the updated website is up and running and development now starts towards 0.13.

The new release has some nifty new features (see whatsnew):

  • Multidimensional scaling
  • Multi-Output random forests (like these)
  • Multi-task Lasso
  • More loss functions for ensemble methods and SGD
  • Better text feature extraction

Even though, the majority of changes in this release are somewhat “under the hood”.

Vlad developed and set up a continuous performance benchmark for the main algorithms during his google summer of code. I am sure this will help improve performance.

There already has been a lot of work in improving performance, by Vlad, Immanuel, Gilles and others for this release.

Just in case you haven’t been keeping up with Scikit-learn.

Troll Detection with Scikit-Learn

Filed under: Machine Learning,Python,Scikit-Learn — Patrick Durusau @ 9:52 am

Troll Detection with Scikit-Learn by Andreas Mueller.

I had thought that troll detection was one of those “field guide” sort of things:

[Image: troll dolls]

After reading Andreas’ post, apparently not. 😉

From the post:

Cross-post from Peekaboo, Andreas Mueller‘s computer vision and machine learning blog. This post documents his experience in the Impermium Detecting Insults in Social Commentary competition, but rest of the blog is well worth a read, especially for those interested in computer vision and Python scikit-learn and -image.

Recently I entered my first kaggle competition – for those who don’t know it, it is a site running machine learning competitions. A data set and time frame is provided and the best submission gets a money prize, often something between 5000$ and 50000$.

I found the approach quite interesting and could definitely use a new laptop, so I entered Detecting Insults in Social Commentary.

My weapon of choice was Python with scikit-learn – for those who haven’t read my blog before: I am one of the core devs of the project and never shut up about it.

During the competition I was visiting Microsoft Research, so this is where most of my time and energy went, in particular in the end of the competition, as it was also the end of my internship. And there was also the scikit-learn release in between. Maybe I can spend a bit more time on the next competition.

Disco [Erlang/Python – MapReduce]

Filed under: Disco,Erlang,MapReduce,Python — Patrick Durusau @ 9:16 am

Disco

From the webpage:

Disco is a distributed computing framework based on the MapReduce paradigm. Disco is open-source; developed by Nokia Research Center to solve real problems in handling massive amounts of data.

Disco is powerful and easy to use, thanks to Python. Disco distributes and replicates your data, and schedules your jobs efficiently. Disco even includes the tools you need to index billions of data points and query them in real-time.

Install Disco on your laptop, cluster or cloud of choice and become a part of the Disco community!

I rather like the MapReduce graphic you will see at About.

I first saw this in Guido Kollerie’s post on the recent Python users meeting in the Netherlands. Guido details his 5 minute presentation on Disco.
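If the MapReduce paradigm itself is new to you, the following single-process Python sketch shows the three steps a framework like Disco distributes across machines: map, group by key, reduce. It deliberately does not use the Disco API. The function names (`map_words`, `shuffle`, `reduce_counts`, `word_count`) and the sample lines are invented for illustration.

```python
from collections import defaultdict


def map_words(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework does between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce phase: collapse all values for one key into a single total."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_words(line))
    return dict(reduce_counts(word, counts) for word, counts in shuffle(pairs).items())


# Hypothetical input; on a cluster each line could live on a different node.
result = word_count(["Disco distributes data", "Disco schedules jobs", "data data"])
```

Because each call to `map_words` and each call to `reduce_counts` depends only on its own input, a framework can run them on different machines and only needs to coordinate the grouping step in between.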
