Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

July 30, 2014

Graphs, Databases and Graphlab

Filed under: GraphLab,Graphs,IMDb,Python — Patrick Durusau @ 2:40 pm

Graphs, Databases and Graphlab by Bugra Akyildiz.

From the post:

I will talk about graphs, graph databases and mainly the paper that powers Graphlab. At the end of the post, I will also briefly go over the basic capabilities of Graphlab.

Background coverage of graphs and graph databases, followed by a discussion of GraphLab.

The high points of the post are the graphs generated from Bugra’s prior work on the Internet Movie Database (IMDB Top 100K Movies Analysis in Depth (Parts 1-4)).

Enjoy!

March 9, 2014

IMDB Top 100K Movies Analysis in Depth (Parts 1-4)

Filed under: Graphics,IMDb,Visualization — Patrick Durusau @ 2:27 pm

IMDB Top 100K Movies Analysis in Depth Part 1 by Bugra Akyildiz.

IMDB Top 100K Movies Analysis in Depth Part 2

IMDB Top 100K Movies Analysis in Depth Part 3

IMDB Top 100K Movies Analysis in Depth Part 4

From part 1:

Data is from IMDB and it includes all of the popularly voted 100042 movies from 1950 to 2013. (I know why 100000 is there but have no idea how the extra 42 movies got squeezed in. Instead of blaming my web scraping skills, I blame the universe, though.)

I chose the number of votes as the metric for ordering the movies because, generally, the information about a movie (title, certificate, outline, director and so on) is more likely to be complete for movies with a high number of votes. Moreover, IMDB uses the number of votes as a factor in determining its rankings, so the number of votes also correlates with the rating. Further, everybody has at least some idea of the IMDB Top 250 or IMDB Top 1000, which are ordered by the ratings computed by IMDB.

Although the data is quite rich in terms of basic information, only year, rating and votes are complete for all of the movies. Only ~80% of the movies have runtime information (minutes). The categories are about 90% complete, which could be considered good, but the certificate information is the sparsest (only ~25% of the movies have it).
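The completeness percentages above are simple per-field ratios. Here is a minimal sketch in Python (one of this blog’s tags), assuming each scraped movie is stored as a dict and a missing field is recorded as None; the field names and sample records are hypothetical, not the author’s actual data or scraper output.

```python
from typing import Any, Dict, List

# A movie record maps field names to values; None marks a field the scrape missed.
Movie = Dict[str, Any]


def completeness(movies: List[Movie], fields: List[str]) -> Dict[str, float]:
    """Return, for each field, the fraction of movies with a non-None value."""
    if not movies:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for movie in movies if movie.get(field) is not None) / len(movies)
        for field in fields
    }


if __name__ == "__main__":
    # Made-up records for illustration only, not real IMDB data.
    sample = [
        {"year": 1999, "rating": 7.1, "votes": 52000, "runtime": 101, "certificate": "PG"},
        {"year": 2004, "rating": 6.4, "votes": 31000, "runtime": 95, "certificate": None},
        {"year": 2011, "rating": 8.0, "votes": 87000, "runtime": None, "certificate": None},
        {"year": 1968, "rating": 7.7, "votes": 12000, "runtime": 118, "certificate": None},
    ]
    fields = ["year", "rating", "votes", "runtime", "certificate"]
    for field, share in completeness(sample, fields).items():
        print(f"{field}: {share:.0%}")
```

For the four made-up records, this reports 100% for year, rating and votes, 75% for runtime and 25% for certificate.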

This post aims to explore different aspects of the data (categories, rating and categories) and also useful information (the best movie in terms of rating or votes for each year).
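Finding the best movie for each year is a group-by-year maximum. A minimal sketch, assuming the same hypothetical dict-per-movie layout with "year", "rating" and "votes" keys; the titles and numbers are invented for illustration and are not taken from the post.

```python
from typing import Any, Dict, List

# A movie record maps field names to values; None marks a missing value.
Movie = Dict[str, Any]


def best_per_year(movies: List[Movie], key: str) -> Dict[int, Movie]:
    """For each year, keep the movie with the highest value of `key` (e.g. "rating" or "votes")."""
    best: Dict[int, Movie] = {}
    for movie in movies:
        value = movie.get(key)
        if value is None:
            continue  # skip movies that lack the chosen metric
        year = movie["year"]
        current = best.get(year)
        # Strict comparison: on a tie, the first movie seen for that year is kept.
        if current is None or value > current[key]:
            best[year] = movie
    return best


if __name__ == "__main__":
    # Hypothetical records for illustration only.
    movies = [
        {"title": "Movie A", "year": 2012, "rating": 7.9, "votes": 50000},
        {"title": "Movie B", "year": 2012, "rating": 8.4, "votes": 20000},
        {"title": "Movie C", "year": 2013, "rating": 6.5, "votes": 90000},
    ]
    for year, movie in sorted(best_per_year(movies, "rating").items()):
        print(year, "best by rating:", movie["title"])
    for year, movie in sorted(best_per_year(movies, "votes").items()):
        print(year, "best by votes:", movie["title"])
```

Ranking by rating or by votes can pick different movies for the same year (here, Movie B versus Movie A for 2012), which is why it is worth looking at both.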

An interesting analysis of the Internet Movie Database (IMDB) that also incorporates other sources, such as revenue figures and the ages and heights of actors and actresses.

Do you have suggestions for other data to include, or for representation techniques?

I first saw this in a tweet by Gregory Piatetsky.

February 2, 2012

IMDb Alternative Interfaces

Filed under: Data,Dataset,IMDb — Patrick Durusau @ 3:39 pm

IMDb Alternative Interfaces.

From the webpage:

This page describes various alternate ways to access The Internet Movie Database locally by holding copies of the data directly on your system. See more about using our data on the Non-Commercial Licensing page.

It’s an interesting data set, and I am sure its owners would not mind if you sent them a screencast of an improved interface you have created for their data.

That might actually be an interesting model for developing better interfaces to data served up to the public anyway. Release the data for strictly personal use and see who does the best job with it. A screencast would not disclose any of your source code or processes, protecting the interests of the software author.

Just a thought.

First noticed this on PeteSearch.
