Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

December 20, 2012

Node.js, Neo4j, and usefulness of hacking ugly code [Normalization as Presentation?]

Filed under: Graphs,Neo4j,Networks,node-js — Patrick Durusau @ 8:16 pm

Node.js, Neo4j, and usefulness of hacking ugly code by Justin Mandzik.

From the post:

My primary application has a ton of data, even in its infancy. Hundreds of millions of distinct entities (and growing fast), each with many properties, and many relationships. Numbers in the billions start to be really easy to hit, and then thats still not accounting for organic growth. Most of the data is hierarchical for now, but theres a need in the near term for arbitrary relationships and the quick traversing thereof. Vanilla MySQL in particular is annoying to work when it comes to hierarchical data. Moving to Oracle gets us some nicer toys to play with (CONNECT_BY_ROOT and such), but ultimately, the need for a complimentary database solution emerges.

NOSQL bake-off

While my non-relational db experience is limited to MongoDB (which I love dearly), a graph db seemed to be the better theoretical fit. Requirements: Manage dense, interconnected data that has to be traversed fast, a query language that supports a root cause analysis use case, and some kind of H.A. plan of attack. Signals of Neo4j, OrientDB, and Titan started emerging from the noise. Randomly, I started in with Neo4j with the intent of repeating the test cases on the other contenders assuming any of the 3 met the requirements (in theory, at least). Neo4j has a GREAT “2 minutes to get up and running” experience. Untar, bin/neo4j start, and go to localhost:7474 and you’re off and running. A decent interface waits for you and you can dive right in.

Proof of concept code for testing Neo4j with project data.

The presumption of normalization in Neo4j continues to nag at me.

The broader the reach for data, the less likely normalization is going to be possible, or affordable if possible in some theoretical sense.

It may be that normalization is a presentation aspect of results. Will have to think about that over the holidays.

December 17, 2012

Visualizing Facebook Friends With D3.js…

Filed under: D3,Graphs,Networks,Visualization — Patrick Durusau @ 5:30 am

Visualizing Facebook Friends With D3.js or “How Wolfram|Alpha Does That Cool Friend Network Graph” by Tony Young.

From the post:

A while ago, Wolfram|Alpha got the ability to generate personal analytics based on your Facebook profile. It made some cool numbers and stuff, but the friend network graph was the most impressive:

clustering of friends

Wolfram|Alpha neatly separates your various social circles into clusters, based on proximity — with freaky accuracy.

With the awesome D3.js library, along with some gratuitous abuse of the Facebook API, we can make our own!

If you’re impatient, skip through all this text and check out the example or the screenshot!

A good example of the ease of deduplication (read merging) where the source of ids is uniform.

Possible classroom exercise to create additional Facebook accounts for students, so that each student has at least two (2) Facebook accounts. Each with friend lists.

Any overlapping friends will “merge” but the different accounts won’t, even though they belong to the same person.

Walk through solving the merging problem where there are different accounts.
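A minimal sketch of that exercise in Python, using invented data. It assumes friend ids are uniform across friend lists, so overlapping friends merge with a plain set union. The two accounts of one student merge only once someone supplies an explicit same-person mapping (the hypothetical `same_person` dict below); nothing here touches the Facebook API.

```python
# Hypothetical friend lists keyed by account id; every id is invented for illustration.
friends = {
    "acct_a1": {"f1", "f2", "f3"},  # student A, first account
    "acct_a2": {"f3", "f4"},        # student A, second account
}

# Uniform ids: overlapping friends merge for free under set union.
all_friends = set().union(*friends.values())

# Different accounts for the same person do not merge on their own.
# This mapping is an assumption: it has to come from a person or a matching rule.
same_person = {"acct_a1": "student_a", "acct_a2": "student_a"}


def merge_accounts(friends, same_person):
    """Union the friend sets of all accounts that map to the same person."""
    merged = {}
    for account, friend_ids in friends.items():
        person = same_person.get(account, account)  # unmapped accounts stay separate
        merged.setdefault(person, set()).update(friend_ids)
    return merged


merged = merge_accounts(friends, same_person)
print(sorted(all_friends))
print(sorted(merged["student_a"]))
```

The interesting part of the classroom discussion is where `same_person` comes from, since that is the step the uniform ids do not give you.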

I first saw this in a tweet by Christophe Viau.

December 16, 2012

OrgOrgChart: The Dynamic Organization of an Organization

Filed under: Graphs,Networks,Visualization — Patrick Durusau @ 9:14 pm

OrgOrgChart: The Dynamic Organization of an Organization by Andrew Vande Moere.

From the post:

The Organic Organization Chart [autodeskresearch.com], developed by Justin Matejka at Autodesk Research, shows what a Human Resources manager dreams at night.

The animated force-directed network diagram shows how a company’s structure evolves over time, here the daily organizational changes within the company Autodesk over the last 4 years.

The entire hierarchy of AutoDesk is constructed as a single tree with each employee represented by a circle, and a line connecting each employee with his or her manager. Larger circles represent managers with more employees working under them.

Occurs to me that a similar diagram could be useful in tracking the flow of information from one person to another. By adding in email, phone and observed personal contacts.

Say from an internal briefing to a leak, for example.

Impressive demonstration of changes over time. Very impressive.

December 14, 2012

Structure and Dynamics of Information Pathways in Online Media

Filed under: Information Flow,Information Theory,Networks,News,Social Networks — Patrick Durusau @ 6:16 am

Structure and Dynamics of Information Pathways in Online Media by Manuel Gomez Rodriguez, Jure Leskovec, Bernhard Schölkopf.

Abstract:

Diffusion of information, spread of rumors and infectious diseases are all instances of stochastic processes that occur over the edges of an underlying network. Many times networks over which contagions spread are unobserved, and such networks are often dynamic and change over time. In this paper, we investigate the problem of inferring dynamic networks based on information diffusion data. We assume there is an unobserved dynamic network that changes over time, while we observe the results of a dynamic process spreading over the edges of the network. The task then is to infer the edges and the dynamics of the underlying network.

We develop an on-line algorithm that relies on stochastic convex optimization to efficiently solve the dynamic network inference problem. We apply our algorithm to information diffusion among 3.3 million mainstream media and blog sites and experiment with more than 179 million different pieces of information spreading over the network in a one year period. We study the evolution of information pathways in the online media space and find interesting insights. Information pathways for general recurrent topics are more stable across time than for on-going news events. Clusters of news media sites and blogs often emerge and vanish in matter of days for on-going news events. Major social movements and events involving civil population, such as the Libyan’s civil war or Syria’s uprise, lead to an increased amount of information pathways among blogs as well as in the overall increase in the network centrality of blogs and social media sites.

A close reading of this paper will have to wait for the holidays but it will be very near the top of the stack!

Transient subjects anyone?

December 11, 2012

Music Network Visualization

Filed under: Graphs,Music,Networks,Similarity,Subject Identity,Visualization — Patrick Durusau @ 7:23 pm

Music Network Visualization by Dimiter Toshkov.

From the post:

My music interests have always been rather, hmm…, eclectic. Somehow IDM, ambient, darkwave, triphop, acid jazz, bossa nova, qawali, Mali blues and other more or less obscure genres have managed to happily co-exist in my music collection. The sheer diversity always invited the question whether there is some structure to the collection, or each genre is an island of its own. Sounds like a job for network visualization!

Now, there are plenty of music network viz applications on the web. But they don’t show my collection, and just seem unsatisfactory for various reasons. So I decided to craft my own visualization using R and igraph.

Interesting for the visualization but also the use of similarity measures.

The test for identity of a subject, particularly collective subjects, artists “similar” to X, is as unlimited as your imagination.

December 9, 2012

Neo4j 1.9 M02 – Under the Hood

Filed under: Graphs,Neo4j,Networks — Patrick Durusau @ 8:28 pm

Neo4j 1.9 M02 – Under the Hood by Peter Neubauer.

From the post:

We have been working hard over the last weeks to tune and improve many aspects in the Neo4j internals, to deliver an even faster, more stable and less resource intensive graph database in this 1.9.M02 milestone release. Those efforts span a lot of areas that benefit everyone from the typical developer to sysops and to most other Neo4j users.

We are thrilled about the feedback we got from customers, and our community via Google Group, Stack Overflow and Twitter. Thanks for helping us improve.

While the new changes might not be visible at the first glance, let’s look into Neo4j’s engine room to see what has changed.

Everyone’s most beloved query language, Cypher, has matured a lot thanks to Jake and Andres’ incredible work. They have made query execution much faster, for most use-cases, while utilizing less memory. The lazy execution of queries has sneaked away lately, so Andres caught it and put it back in. That means you can run queries with potentially infinitely large result sets without exhausting memory. Especially when streaming results (no aggregation and ordering) it will use only a tiny fraction of your memory. The very frequent construct ORDER BY … LIMIT … now benefits from a better top-n-select algorithm. These latest improvements are closing the performance gap to the core-API even more. We’ve also glimpsed a new internal SPI, that will allow Cypher to run even faster in the future.

Peter gives a quick tour of improvements in the latest milestone release of Neo4j.

Suggest you download the latest version to experiment with while you read Peter’s post.
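To see why a top-n select helps with ORDER BY … LIMIT …, here is a small Python sketch of the general idea. It is not Neo4j's implementation: it uses the standard library's `heapq.nsmallest` as a stand-in, and the rows are invented.

```python
import heapq
from operator import itemgetter


def top_n_full_sort(rows, n, key):
    """Sort every row, then keep the first n."""
    return sorted(rows, key=key)[:n]


def top_n_select(rows, n, key):
    """Consume rows one at a time, keeping only the best n seen so far."""
    return heapq.nsmallest(n, rows, key=key)


# Invented rows standing in for a query result stream (scores are all distinct).
rows = [{"name": f"node{i}", "score": (i * 37) % 101} for i in range(100)]
by_score = itemgetter("score")

# Both strategies agree; the second also accepts a lazy iterator,
# so the full result never has to be held in sorted form.
assert top_n_select(iter(rows), 3, by_score) == top_n_full_sort(rows, 3, by_score)
```

The selection version pairs naturally with lazy, streaming execution, because it never needs the whole result set at once.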

December 8, 2012

GraphLab vs. Piccolo vs. Spark

Filed under: GraphLab,Graphs,Networks,Piccolo,Spark — Patrick Durusau @ 7:26 pm

GraphLab vs. Piccolo vs. Spark by Danny Bickson.

From the post:

I got an interesting case study from Cui Henggang, a first year graduate student at CMU Parallel Data Lab. Cui implemented GMM on GraphLab, for comparing its performance to Piccolo and Spark. His collaborators on this projects where Jinliang Wei and Wei Dai. The algorithm is described on Chris Bishop, Pattern Recognition and Machine Learning, Chapter 9.2, page 438.

Danny reports Cui will be releasing his report and posting his GMM code to the graphical models toolkit (GraphLab).

I will post a pointer when the report appears, here and probably in a new post as well.

December 7, 2012

Building graphs with Hadoop

Filed under: GraphBuilder,Graphs,Hadoop,Networks — Patrick Durusau @ 8:00 pm

Building graphs with Hadoop

From the post:

Faced with a mass of unstructured data, the first step of analysing it should be to organise it, and the first step of that process should be working out in what way it should be organised. But then that mass of data has to be fed into the graph which can take a long time and may be inefficient. That’s why Intel has announced the release of the open source GraphBuilder library, a tool that is meant to help scientists and developers working with large amounts of data build applications that make sense of this data.

The library plugs into Apache Hadoop and is designed to create graphs from big data sets which can then be used in applications. GraphBuilder is written in Java using the MapReduce parallel programming model and takes care of many of the complexities of graph construction. According to the developers, this makes it easier for scientists and developers who do not necessarily have skills in distributed systems engineering to make use of large data sets in their Hadoop applications. They can focus on writing the code that breaks the data up into meaningful nodes and useful edge information which can be run across the distributed architecture where the library also performs a wide range of other useful processes to optimise the data for later analysis.

A nice way to re-use those Hadoop skills you have been busy acquiring!

Definitely on the weekend schedule!

Wikiweb [Clue to a topic map delivery interface]

Filed under: Graphs,Interface Research/Design,Networks,Visualization — Patrick Durusau @ 7:32 pm

Wikiweb

I don’t have an iPhone or iPad so I have to take the video at face value. 🙁

But, what it shows was quite impressive!

Still not convinced about graph layouts that move about but obviously some users really like them.

Imagine this display adapted to merged subject representatives. With configurable display of other subjects/connections.

Now that would rock!

Reconstruct Gene Networks Using Shiny

Filed under: Graphs,Networks,R — Patrick Durusau @ 6:21 pm

Reconstruct Gene Networks Using Shiny by Jeff Allen

From the post:

We’ve been experimenting with RStudio’s new Shiny software as a way to quickly and easily create interactive, responsive web applications which are able to leverage complicated analytics back-ends built in the R programming language.

(graphic omitted)

We created a simple interface which can infer the structure of an underlying Gene Regulatory information based on gene expression patterns; the application is available at http://glimmer.rstudio.com/qbrc/grn/. (At the time of writing, Shiny’s file upload functionality is highly unstable and may not work from your machine — hopefully improvements to the project will resolve the issues shortly.)

Inferring networks. Sounds like inferring associations. (Being mindful of Eric Freese’s demo years ago of the family tree topic map application.)

When looking for connections, consider dropping an association in a map to see what may/may not realign. Or change the basis for the association.

For example, tracking lobbyists in the coming frenzy of tax and fiscal reform.

November 29, 2012

Social Network Analysis (Mathematica 9)

Filed under: Mathematica,Networks,Social Networks — Patrick Durusau @ 7:05 pm

Social Network Analysis (Mathematica 9)

From the webpage:

Drawing on Mathematica’s strong graph and network capabilities, Mathematica 9 introduces a complete and rich set of state-of-the-art social network analysis functions. Access to social networks from a variety of sources, including directly from social media sites, and high level functions for community detection, cohesive groups, centrality, and similarity measures make performing network analysis tasks easier and more flexible than ever before.

Too many features on networks to list.

I now have one item on my Christmas wish list. 😉

How about you?

I first saw this in a tweet by Julian Bilcke.

November 26, 2012

More on #sandy social interactions [100K Tweets/GraphInsight]

Filed under: Graphs,Networks,Visualization — Patrick Durusau @ 7:32 pm

More on #sandy social interactions

From the post:

We collected #sandy tweets for a few hours on Tuesday. Dots are Twitter users and connections are retweets. This network connects more than 100,000 users. There are many small disconnected components. The main cluster contains interesting patterns to explore..

Another highly visual post. You need to see the images to get a sense of the exploration of the data.

November 25, 2012

FluxMap: visual exploration of flux distributions in biological networks [Import/Graphs]

Filed under: Bioinformatics,Biomedical,Graphs,Networks,Visualization — Patrick Durusau @ 2:29 pm

FluxMap: visual exploration of flux distributions in biological networks.

From the webpage:

FluxMap is an easy to use tool for the advanced visualisation of simulated or measured flux data in biological networks. Flux data import is achieved via a structured template basing on intuitive reaction equations. Flux data is mapped onto any network and visualised using edge thickness. Various visualisation options and interaction possibilities enable comparison and visual analysis of complex experimental setups in an interactive way.

Manuals and tutorials here.

Another application that makes it easy to create graphs from data, this one importing spreadsheet-based data.

Wonder why some highly touted commercial graph databases don’t offer the same ease of use?

November 21, 2012

Large Steam network visualization with Google Maps + Gephi

Filed under: Gephi,Graphs,Networks,Visualization — Patrick Durusau @ 7:33 am

Large Steam network visualization with Google Maps + Gephi

From the post:

I’ve used Google Maps API to visualize a relatively large network collected from Steam Community members. The data is collected from public player profiles that Valve reveals through their Steam Web API. For each player their links to friends and links to Steam Groups they belong are collected. This creates a social network which can be visualized using Gephi.

Graph consists of 212600 nodes and 4045203 edges. Before filtering outliers and low/high degree nodes there are approximately 800 000 groups and over 11 million users.

Very impressive visualization.

Enjoy!

November 10, 2012

Algorithmic Economics

Filed under: Algorithms,Game Theory,Networks,Social Networks — Patrick Durusau @ 1:28 pm

Algorithmic Economics, August 6-10, 2012, Carnegie Mellon University.

You will find slides and videos for:

Another view of social dynamics. Which is everywhere when you think about it. Not just consumers but sellers, manufacturers, R&D.

There isn’t any human activity separate and apart from social dynamics or uninfluenced by them.

That includes the design, authoring and marketing of topic maps.

I first saw this in a tweet from Stefano Bertolo, mentioning the general link and also the lecture on game theory.

November 9, 2012

“Drug Deal” Network Analysis with Gephi (Tutorial)

Filed under: Gephi,Graphs,Networks,Visualization — Patrick Durusau @ 3:12 pm

“Drug Deal” Network Analysis with Gephi (Tutorial) by A. J. Hirst.

A.J. reviews Even Wholesale Drug Dealers Can Use a Little Retargeting: Graphing, Clustering & Community Detection in Excel and Gephi, suggests that you read it before continuing, and then reviews how to use Gephi to converse with the drug dealer data set.

Good tutorial on Gephi and just as good on “conversing” with the data.

November 6, 2012

Network visualization in R with the igraph package

Filed under: Graphs,igraph,Networks,R — Patrick Durusau @ 11:56 am

Network visualization in R with the igraph package by Dimiter Toshkov.

From the post:

In this post I showed a visualization of the organizational network of my department. Since several people asked for details how the plot has been produced, I will provide the code and some extensions below. The plot has been done entirely in R (2.14.01) with the help of the igraph package. It is a great package but I found the documentation somewhat difficult to use, so hopefully this post can be a helpful introduction to network visualization with R.

If you find the igraph package documentation suboptimal, this will give you a leg up on using the package.

Impressive results await you.

Two of the more important reasons to use network visualization as an exploration tool for data:

  1. To construct a useful network visualization you have to slow down and carefully consider the data and the relationships you want to represent. Simply “knowing” the data better, whatever technique helps you slow down to do that, is a good one.
  2. The visualization itself may help you see relationships that are missing or relationships that were unexpected. Either clearing out your assumptions about the data or reducing the noise level in the data.

Others?

Examples?

Full power to the Neo4j engines, Mr. Scott!

Filed under: Cypher,Graphs,Java,Neo4j,Networks — Patrick Durusau @ 11:42 am

René’s title: “Get the full neo4j power by using the Core Java API for traversing your Graph data base instead of Cypher Query Language”, makes you appreciate why René’s day job is “computer scientist” and not “ad copy writer.” 😉

René compares working with Neo4j via:

  • Java Core API
  • Traverser Framework
  • Cypher Query Language

And that is the order of their performance, from fastest to slowest:

  • Java Core API – Order of magnitude faster than Cypher
  • Traverser Framework – 25% slower than Java Core
  • Cypher Query Language – Slowest

Order of magnitude improvements tend to attract the attention of commercial customers and those with non-trivial data sets.

That is if you need performance today, not someday.

November 5, 2012

Ubigraph

Filed under: Graphs,Networks,Ubigraph,Visualization — Patrick Durusau @ 5:37 pm

Ubigraph

Play the 36 second video on the home page. If you aren’t interested by the end of the video, you aren’t interested in graph processing.

Alpha software but impressive alpha software!

I grabbed the UbiGraph-alpha-0.2.4-Linux64-Ubuntu-8.04.tgz version, followed the directions you will find at Downloads and got:

bin/ubigraph_server: error while loading shared libraries: libglut.so.3: cannot open shared object file: No such file or directory

I am running Ubuntu 12.04.

This thread from foldit was helpful. About the seventh post down says:

2) the library I needed (libglut.so.3) was gotten using “sudo apt-get install freeglut3”

I knew I needed libglut.so.3 but here is where I found the name of the package that will show up in a search.

Loaded up aptitude and installed Freeglut.

Back to the command line in a fresh terminal window, and it works! (At least the Python demo anyway.)

Now to look at the manual, now that it’s running. 😉

November 2, 2012

G-Store: A Storage Manager for Graph Data

Filed under: G-Store (graphs),Graphs,Networks — Patrick Durusau @ 6:33 pm

G-Store: A Storage Manager for Graph Data by Dan Olteanu, Robin Steinhaus, Tim Furche and Emanuel Ferm.

From the webpage:

Many modern applications are based on graph data. Social networks, for instance, are based on graphs that describe relationships among people. Paths of disease outbreaks form a graph, as do airline routes, and citations among academic papers. These graphs often contain a massive amount of data.

Relational databases are today’s system of choice for storing and querying large amounts of data. Their use has been backed up by decades of research and commercially successful database management systems such as Oracle, IBM DB2, and PostgreSQL. Relational databases are extremely fast in finding, filtering, inserting, deleting, and updating information in a database table. Joining information from several tables, on the other hand, often takes orders of magnitude longer.

From a theoretical point of view, the relational database model is not a good fit for representing highly interconnected data. Regardless, as businesses and processes around the world get more interconnected, relational databases are increasingly used for storing such data. Perhaps one reason is the lack of a suitable, stable, and well-supported alternative.

G-Store is a prototype of a storage manager for large vertex-labeled graphs. G-Store exploits the structure of the graph to derive a data placement on disk that is optimized for access patterns found in graph queries. The placement strategy is based on a multilevel algorithm that partitions the graph into pages and arranges these pages on disk to minimize the distance on disk between adjacent vertices. G-Store has a built-in query engine that supports depth-first traversal, reachability testing, shortest path search, and shortest path tree search.

Reported to be approximately 12,000 lines of C/C++ code and runs on Windows.

I have not installed Windows on my Ubuntu box so I haven’t tried the software.

Interested if you do, what comments you have. (Particularly on the query language.) Thanks!
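For readers who want a feel for two of the query types the webpage lists (reachability testing and shortest path search), here is an in-memory Python sketch using breadth-first search. It says nothing about G-Store's disk placement or its query language; the graph and the function names are invented for illustration.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Return one path with the fewest edges from source to target, or None."""
    parents = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        vertex = queue.popleft()
        if vertex == target:
            path = []
            while vertex is not None:  # walk parent links back to the source
                path.append(vertex)
                vertex = parents[vertex]
            return path[::-1]
        for neighbor in graph.get(vertex, ()):
            if neighbor not in parents:
                parents[neighbor] = vertex
                queue.append(neighbor)
    return None


def reachable(graph, source, target):
    """Reachability test: is there any directed path from source to target?"""
    return shortest_path(graph, source, target) is not None


# Invented directed graph as adjacency lists.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["a"]}
print(shortest_path(graph, "a", "d"))
print(reachable(graph, "a", "e"))
```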

October 28, 2012

Visualizing Networks: Beyond the Hairball

Filed under: Graphics,Networks,Visualization — Patrick Durusau @ 9:13 am

Visualizing Networks: Beyond the Hairball by Lynn Cherny.

Impressive slide set on visualizing networks that concludes with a good set of additional resources.

The sort of slide set that makes you regret not seeing the presentation live.

Mentions J. Bertin’s Semiology of Graphics. It is back in print if you have a serious interest in using graphics for communication.

I first saw this in a tweet by Peter Neubauer.

October 26, 2012

Information Diffusion on Twitter by @snikolov

Filed under: Gephi,Graphs,Networks,Pig,Tweets — Patrick Durusau @ 6:33 pm

Information Diffusion on Twitter by @snikolov by Marti Hearst.

From the post:

Today Stan Nikolov, who just finished his masters at MIT in studying information diffusion networks, walked us through one particular theoretical model of information diffusion which tries to predict under what conditions an idea stops spreading based on a network’s structure (from the popular Easley and Kleinberg Network book). Stan also gathered a huge amount of Twitter data, processed it using Pig scripts, and graphed the results using Gephi. The video lecture below shows you some great visualizations of the spreading behavior of the data!

(video omitted)

The slides in his Lecture Notes let you see the Pig scripts in more detail.

Another deeply awesome lecture from Marti’s class on Twitter and big data.

Also an example of the level of analysis that a Twitter stream will need to withstand to avoid “imperial entanglements.”

October 24, 2012

The Data Science Community on Twitter

Filed under: Data Science,Graphs,Networks,Social Networks,Tweets,Visualization — Patrick Durusau @ 2:07 pm

The Data Science Community on Twitter

From the webpage:

659 Twitter accounts linked to data science, May 2012.

Linkage of Twitter accounts to display followers and following nodes.

That sounds so inadequate (and is).

You need to go see the page, play with it and then come back.

How was that? Impressive yes?

OK, how would that experience be different if you were using a topic map?

More/less information? Other display options?

It is an impressive piece of eye candy but I have a sense it could be so much more.

You?

October 16, 2012

Core & Peel Algorithm

Filed under: Graphs,Networks,Subgraphs — Patrick Durusau @ 5:04 am

Detecting dense communities in large social and information networks with the Core & Peel algorithm by Marco Pellegrini, Filippo Geraci, Miriam Baglioni.

Abstract:

Detecting and characterizing dense subgraphs (tight communities) in social and information networks is an important exploratory tool in social network analysis. Several approaches have been proposed that either (i) partition the whole network into clusters, even in low density region, or (ii) are aimed at finding a single densest community (and need to be iterated to find the next one). As social networks grow larger both approaches (i) and (ii) result in algorithms too slow to be practical, in particular when speed in analyzing the data is required. In this paper we propose an approach that aims at balancing efficiency of computation and expressiveness and manageability of the output community representation. We define the notion of a partial dense cover (PDC) of a graph. Intuitively a PDC of a graph is a collection of sets of nodes that (a) each set forms a disjoint dense induced subgraphs and (b) its removal leaves the residual graph without dense regions. Exact computation of PDC is an NP-complete problem, thus, we propose an efficient heuristic algorithms for computing a PDC which we christen Core and Peel. Moreover we propose a novel benchmarking technique that allows us to evaluate algorithms for computing PDC using the classical IR concepts of precision and recall even without a golden standard. Tests on 25 social and technological networks from the Stanford Large Network Dataset Collection confirm that Core and Peel is efficient and attains very high precison and recall.

Great name for an algorithm, marred somewhat by the long paper title.

If subgraphs or small groups in a network are among your subjects, take the time to review this new graph exploration technique.

Programming Languages Influence Network

Filed under: Graphics,Graphs,Networks,Visualization — Patrick Durusau @ 4:41 am

Programming Languages Influence Network by Ramiro Gómez.

From the about tab:

This interactive visualization shows a network graph of programming language influences. The graph consists of 1169 programming language nodes and 908 edges that signify an influence relation.

The size of a node is determined by its out degree. The more influential a language is across all languages in the network, the bigger is the corresponding node in the network.

There are several ways of interaction: you can restrict the network to languages within a programming paradigm; you can choose between a Force Atlas 2 and a random graph layout; and you can highlight the connections of a language by moving the mouse over its node. For more details click on the help link in the top menu.

Impressive graphics/visualization!

Suggestive of techniques for other networks of “influence.”
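The sizing rule from the about tab (node size determined by out degree) carries over to any influence network. A minimal Python sketch, assuming edges point from the influencer to the influenced; the node names, `base`, and `scale` values are invented for illustration and are not taken from the visualization.

```python
from collections import Counter


def out_degrees(edges):
    """Count outgoing edges per node; nodes with no outgoing edges get 0."""
    degrees = Counter()
    for source, target in edges:
        degrees[source] += 1
        degrees.setdefault(target, 0)  # make sure pure "sinks" still appear
    return degrees


def node_sizes(degrees, base=4.0, scale=2.0):
    """Map each node to a display size that grows linearly with its out degree."""
    return {node: base + scale * degree for node, degree in degrees.items()}


# Invented influence edges: (influencer, influenced).
edges = [("A", "B"), ("A", "C"), ("B", "C")]
sizes = node_sizes(out_degrees(edges))
print(sizes)
```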

October 12, 2012

PathNet: A tool for pathway analysis using topological information

Filed under: Bioinformatics,Biomedical,Genome,Graphs,Networks — Patrick Durusau @ 3:12 pm

PathNet: A tool for pathway analysis using topological information by Bhaskar Dutta, Anders Wallqvist and Jaques Reifman. (Source Code for Biology and Medicine 2012, 7:10 doi:10.1186/1751-0473-7-10)

Abstract:

Background

Identification of canonical pathways through enrichment of differentially expressed genes in a given pathway is a widely used method for interpreting gene lists generated from highthroughput experimental studies. However, most algorithms treat pathways as sets of genes, disregarding any inter- and intra-pathway connectivity information, and do not provide insights beyond identifying lists of pathways.

Results

We developed an algorithm (PathNet) that utilizes the connectivity information in canonical pathway descriptions to help identify study-relevant pathways and characterize non-obvious dependencies and connections among pathways using gene expression data. PathNet considers both the differential expression of genes and their pathway neighbors to strengthen the evidence that a pathway is implicated in the biological conditions characterizing the experiment. As an adjunct to this analysis, PathNet uses the connectivity of the differentially expressed genes among all pathways to score pathway contextual associations and statistically identify biological relations among pathways. In this study, we used PathNet to identify biologically relevant results in two Alzheimers disease microarray datasets, and compared its performance with existing methods. Importantly, PathNet identified deregulation of the ubiquitin-mediated proteolysis pathway as an important component in Alzheimers disease progression, despite the absence of this pathway in the standard enrichment analyses.

Conclusions

PathNet is a novel method for identifying enrichment and association between canonical pathways in the context of gene expression data. It takes into account topological information present in pathways to reveal biological information. PathNet is available as an R workspace image from http://www.bhsai.org/downloads/pathnet/.

Important work for genomics but also a reminder that a list of paths is just that, a list of paths.

The value-add and creative aspect of data analysis is in the scoring of those paths in order to wring more information from them.

How is it for you? Just lists of paths or something a bit more clever?

Estimating Subject Sameness?

Filed under: Graphs,Networks,Similarity — Patrick Durusau @ 5:26 am

If you think about it, graph isomorphism is a type of subject sameness problem.

Sadly, graph isomorphism remains a research problem, so it is not immediately applicable to problems you encounter with topic maps.

However, Alex Smola in The Weisfeiler-Lehman algorithm and estimation on graphs covers some “cheats” that you may find useful.

Imagine you have two graphs \(G\) and \(G'\) and you’d like to check how similar they are. If all vertices have unique attributes this is quite easy:

FOR ALL vertices \(v \in G \cup G'\) DO

  • Check that \(v \in G\) and that \(v \in G'\)
  • Check that the neighbors of \(v\) are the same in \(G\) and \(G'\)

This algorithm can be carried out in linear time in the size of the graph, alas many graphs do not have vertex attributes, let alone unique vertex attributes. In fact, graph isomorphism, i.e. the task of checking whether two graphs are identical, is a hard problem (it is still an open research question how hard it really is). In this case the above algorithm cannot be used since we have no idea which vertices we should match up.

The Weisfeiler-Lehman algorithm is a mechanism for assigning fairly unique attributes efficiently. Note that it isn’t guaranteed to work, as discussed in this paper by Douglas – this would solve the graph isomorphism problem after all. The idea is to assign fingerprints to vertices and their neighborhoods repeatedly. We assume that vertices have an attribute to begin with. If they don’t then simply assign all of them the attribute 1. Each iteration proceeds as follows:
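The excerpt stops before the iteration steps, so what follows is not Smola's listing. It is a minimal sketch of the commonly described one-dimensional Weisfeiler-Lehman refinement: each vertex's new attribute is a compressed fingerprint of its own attribute plus the sorted attributes of its neighbors. The function names, the adjacency-dictionary layout, and the use of a truncated SHA-1 digest as the compression step are all my assumptions.

```python
import hashlib
from collections import Counter


def _compress(signature):
    # Assumption: a truncated SHA-1 digest stands in for the
    # "fingerprint" of a vertex and its neighborhood.
    return hashlib.sha1(signature.encode("utf-8")).hexdigest()[:12]


def wl_fingerprint(adj, iterations=3):
    """Histogram of vertex attributes collected over all WL rounds.

    adj is assumed to map each vertex to an iterable of its neighbors.
    Vertices have no attributes here, so all of them start with "1".
    """
    labels = {v: "1" for v in adj}
    histogram = Counter(labels.values())
    for _ in range(iterations):
        new_labels = {}
        for v in adj:
            # Own attribute plus the sorted multiset of neighbor attributes.
            neighbors = ",".join(sorted(labels[u] for u in adj[v]))
            new_labels[v] = _compress(labels[v] + "|" + neighbors)
        labels = new_labels
        histogram.update(labels.values())
    return histogram


def maybe_isomorphic(g, h, iterations=3):
    # False means "certainly not isomorphic"; True only means "not ruled out".
    return wl_fingerprint(g, iterations) == wl_fingerprint(h, iterations)


triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
relabeled_path = {"x": ["y"], "y": ["x", "z"], "z": ["y"]}
six_cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}

print(maybe_isomorphic(triangle, path))
print(maybe_isomorphic(path, relabeled_path))
print(maybe_isomorphic(six_cycle, two_triangles))
```

The last pair shows why the method is not guaranteed to work: a six-cycle and two disjoint triangles are not isomorphic, but every vertex in both has exactly two neighbors, so the refinement never tells them apart.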

Curious whether you find the negative result, “these two graphs are not isomorphic,” as useful as a positive one (where it works), “these two graphs are isomorphic”?

Or is it sufficient to reliably know that graphs are different?

October 11, 2012

Using (Spring Data) Neo4j for the Hubway Data Challenge [Boston Biking]

Filed under: Challenges,Data,Dataset,Graphs,Neo4j,Networks,Spring — Patrick Durusau @ 12:33 pm

Using (Spring Data) Neo4j for the Hubway Data Challenge by Michael Hunger.

From the post:

Using Spring Data Neo4j it was incredibly easy to model and import the Hubway Challenge dataset into a Neo4j graph database, to make it available for advanced querying and visualization.

The Challenge and Data

Tonight @graphmaven pointed me to the boston.com article about the Hubway Data Challenge.

(graphics omitted)

Hubway is a bike sharing service which is currently expanding worldwide. In the Data challenge they offer the CSV-data of their 95 Boston stations and about half a million bike rides up until the end of September. The challenge is to provide answers to some posted questions and develop great visualizations (or UI’s) for the Hubway data set. The challenge is also supported by MAPC (Metropolitan Area Planning Council).

Useful tips on importing data into Neo4j and on modeling this particular dataset.

Not to mention the resulting database as well!

PS: From the challenge site:

Submission will open here on Friday, October 12, 2012.

Deadline

MIDNIGHT (11:59 p.m.) on Halloween,
Wednesday, October 31, 2012.

Winners will be announced on Wednesday, November 7, 2012.

Prizes:

  • A one-year Hubway membership
  • Hubway T-shirt
  • Bern helmet
  • A limited edition Hubway System Map—one of only 61 installed in the original Hubway stations.

For other details, see the challenge site.

October 10, 2012

Interesting large scale dataset: D4D mobile data [Deadline: October 31, 2012]

Filed under: Data,Data Mining,Dataset,Graphs,Networks — Patrick Durusau @ 4:19 pm

Interesting large scale dataset: D4D mobile data by Danny Bickson.

From the post:

I got the following from Prof. Scott Kirkpatrick.

Write a 250-word research project and get access within a week to the largest ever released mobile phone datasets: datasets based on 2.5 billion records, calls and text messages exchanged between 5 million anonymous users over 5 months.

Participation rules: http://www.d4d.orange.com/

Description of the datasets: http://arxiv.org/abs/1210.0137

The “Terms and Conditions” by Orange allows the publication of results obtained from the datasets even if they do not directly relate to the challenge.

Cash prizes for winning participants and an invitation to present the results at the NetMob conference to be held on May 2-3, 2013 at the Medialab at MIT (www.netmob.org).

Deadline: October 31, 2012

Looking to exercise your graph software? Compare to other graph software? Do interesting things with cell phone data?

This could be your chance!

October 2, 2012

Graph Drawing talks are online

Filed under: Graphs,Networks,Visualization — Patrick Durusau @ 3:40 pm

Graph Drawing talks are online

From the post:

This year’s graph drawing symposium was located at Microsoft, and thanks to Microsoft the talks are now all online. So if you wanted to go but couldn’t, you can still see what you missed.

Turns out anyone not there missed a lot!

One of the jewels I am watching right now is the talk by Ben Shneiderman and Cody Dunne.

They suggest “graph drawing” should be renamed “graph discovery.”

Paraphrase: “the purpose of graph drawing is not pictures but discovery.”
