by Thomas Cabrol.
From part 1:
Graph analysis becomes a key component of data science. A lot of things can be modeled as graphs, but social networks are really one of the most obvious examples.
In this post, I am going to show how one could visualize its own LinkedIn graph, using the LinkedIn API and Gephi, a very nice software for working on this type of data. If you don’t have it yet, just go to http://gephi.org/ and download it now !
My objective is to simply look at my connections (the “nodes” or “vertices” of the graph), see how they relate to each other (the “edges”) and find clusters of strongly connected users (“communities”). This is somewhat emulating what is available already in the InMaps data product, but, hey, this is cool to do it by ourselves, no ?
The first thing to do for running this graph analysis is to be able to query LinkedIn via its API. You really don’t want to get the data by hand… The API uses the oauth authentification protocol, which will let an application make queries on behalf of a user. So go to https://www.linkedin.com/secure/developer and register a new application. Fill the form as required, and in the OAuth part, use this redirect URL for instance:
Great introduction to Gephi!
As a bonus, reinforces the lesson that ETL isn’t required to re-use data.
ETL may be required in some cases but in a world of data APIs those are getting fewer and fewer.
Think of it this way: Non-ETL data access means someone else is paying for maintenance, backups, hardware, etc.
How much of your IT budget is supporting duplicated data?