Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

April 20, 2013

DELTACON: A Principled Massive-Graph Similarity Function

Filed under: Graphs,Networks,Similarity — Patrick Durusau @ 10:24 am

DELTACON: A Principled Massive-Graph Similarity Function by Danai Koutra, Joshua T. Vogelstein, Christos Faloutsos.

Abstract:

How much did a network change since yesterday? How different is the wiring between Bob’s brain (a left-handed male) and Alice’s brain (a right-handed female)? Graph similarity with known node correspondence, i.e. the detection of changes in the connectivity of graphs, arises in numerous settings. In this work, we formally state the axioms and desired properties of the graph similarity functions, and evaluate when state-of-the-art methods fail to detect crucial connectivity changes in graphs. We propose DeltaCon, a principled, intuitive, and scalable algorithm that assesses the similarity between two graphs on the same nodes (e.g. employees of a company, customers of a mobile carrier). Experiments on various synthetic and real graphs showcase the advantages of our method over existing similarity measures. Finally, we employ DeltaCon to real applications: (a) we classify people to groups of high and low creativity based on their brain connectivity graphs, and (b) do temporal anomaly detection in the who-emails-whom Enron graph.

How different is your current topic map from a prior version?

Could be an interesting marketing ploy to colorize the distinct portions of the graph.

Not to mention using “similarity” to mean the same subject for some purposes. Group subjects come to mind.

And for other types of analysis.

April 18, 2013

Fast Collaborative Graph Exploration

Filed under: Collaboration,Graph Analytics,Graphs,Networks — Patrick Durusau @ 2:26 pm

Fast Collaborative Graph Exploration by Dariusz Dereniowski, Yann Disser, Adrian Kosowski, Dominik Pajak, Przemyslaw Uznanski.

Abstract:

We study the following scenario of online graph exploration. A team of $k$ agents is initially located at a distinguished vertex $r$ of an undirected graph. At every time step, each agent can traverse an edge of the graph. All vertices have unique identifiers, and upon entering a vertex, an agent obtains the list of identifiers of all its neighbors. We ask how many time steps are required to complete exploration, i.e., to make sure that every vertex has been visited by some agent. We consider two communication models: one in which all agents have global knowledge of the state of the exploration, and one in which agents may only exchange information when simultaneously located at the same vertex. As our main result, we provide the first strategy which performs exploration of a graph with $n$ vertices at a distance of at most $D$ from $r$ in time $O(D)$, using a team of agents of polynomial size $k = D n^{1+ \epsilon} < n^{2+\epsilon}$, for any $\epsilon > 0$. Our strategy works in the local communication model, without knowledge of global parameters such as $n$ or $D$. We also obtain almost-tight bounds on the asymptotic relation between exploration time and team size, for large $k$. For any constant $c>1$, we show that in the global communication model, a team of $k = D n^c$ agents can always complete exploration in $D(1+ \frac{1}{c-1} +o(1))$ time steps, whereas at least $D(1+ \frac{1}{c} -o(1))$ steps are sometimes required. In the local communication model, $D(1+ \frac{2}{c-1} +o(1))$ steps always suffice to complete exploration, and at least $D(1+ \frac{2}{c} -o(1))$ steps are sometimes required. This shows a clear separation between the global and local communication models.

Heavy going but seems important for graph exploration performance.

See also the special case of exploring trees under related work.

Another possibility for exploring overlapping markup. Each agent has an independent view of one part of the markup trees.

April 17, 2013

A New Perspective on Vertex Connectivity

Filed under: Graphs,Networks — Patrick Durusau @ 3:12 pm

A New Perspective on Vertex Connectivity by Keren Censor-Hillel, Mohsen Ghaffari, Fabian Kuhn.

Abstract:

Edge connectivity and vertex connectivity are two fundamental concepts in graph theory. Although by now there is a good understanding of the structure of graphs based on their edge connectivity, our knowledge in the case of vertex connectivity is much more limited. An essential tool in capturing edge connectivity are edge-disjoint spanning trees. The famous results of Tutte and Nash-Williams show that a graph with edge connectivity $\lambda$ contains $\floor{\lambda/2}$ edge-disjoint spanning trees. We present connected dominating set (CDS) partition and packing as tools that are analogous to edge-disjoint spanning trees and that help us to better grasp the structure of graphs based on their vertex connectivity. The objective of the CDS partition problem is to partition the nodes of a graph into as many connected dominating sets as possible. The CDS packing problem is the corresponding fractional relaxation, where CDSs are allowed to overlap as long as this is compensated by assigning appropriate weights. CDS partition and CDS packing can be viewed as the counterparts of the well-studied edge-disjoint spanning trees, focusing on vertex disjointedness rather than edge disjointness.

We constructively show that every $k$-vertex-connected graph with $n$ nodes has a CDS packing of size $\Omega(k/\log n)$ and a CDS partition of size $\Omega(k/\log^5 n)$. We prove that the $\Omega(k/\log n)$ CDS packing bound is existentially optimal.

Using CDS packing, we show that if vertices of a $k$-vertex-connected graph are independently sampled with probability $p$, then the graph induced by the sampled vertices has vertex connectivity $\tilde{\Omega}(kp^2)$. Moreover, using our $\Omega(k/\log n)$ CDS packing, we get a store-and-forward broadcast algorithm with optimal throughput in the networking model where in each round, each node can send one bounded-size message to all its neighbors.

Just in case you are interested in cutting edge (sorry) graph research.

Users can assure each other they are using the most popular graph software or they can be using the most powerful graph software.

I know which one I would choose.

How about you?

April 7, 2013

Deploying Graph Algorithms on GPUs: an Adaptive Solution

Filed under: Algorithms,GPU,Graphs,Networks — Patrick Durusau @ 5:46 am

Deploying Graph Algorithms on GPUs: an Adaptive Solution by Da Li and Michela Becchi. (27th IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2013)

From the post:

Thanks to their massive computational power and their SIMT computational model, Graphics Processing Units (GPUs) have been successfully used to accelerate a wide variety of regular applications (linear algebra, stencil computations, image processing and bioinformatics algorithms, among others). However, many established and emerging problems are based on irregular data structures, such as graphs. Examples can be drawn from different application domains: networking, social networking, machine learning, electrical circuit modeling, discrete event simulation, compilers, and computational sciences. It has been shown that irregular applications based on large graphs do exhibit runtime parallelism; moreover, the amount of available parallelism tends to increase with the size of the datasets. In this work, we explore an implementation space for deploying a variety of graph algorithms on GPUs. We show that the dynamic nature of the parallelism that can be extracted from graph algorithms makes it impossible to find an optimal solution. We propose a runtime system able to dynamically transition between different implementations with minimal overhead, and investigate heuristic decisions applicable across algorithms and datasets. Our evaluation is performed on two graph algorithms: breadth-first search and single-source shortest paths. We believe that our proposed mechanisms can be extended and applied to other graph algorithms that exhibit similar computational patterns.

A development that may surprise some graph software vendors, there are “no optimal solution[s] across graph problems and datasets” for graph algorithms on GPU.

This paper points towards an adaptive technique that may prove to be “resilient to the irregularity and heterogeneity of real world graphs.”

I first saw this in a tweet by Stefano Bertolo.

April 3, 2013

2nd GraphLab workshop [Early Bird Registration]

Filed under: Conferences,GraphLab,Graphs,Networks — Patrick Durusau @ 10:14 am

The 2nd GraphLab workshop is coming up! by Danny Bickson.

Danny also says there is a 30% discount if you email him: danny.bickson@gmail.com. Don’t know when that runs out but worth a try.

From the post:

Following the great success of the first GraphLab workshop, we have started to organize this year event, in July at the bay area. To remind you, last year we wanted to organize a 15-20 people event, which eventually got a participation of 300+ researchers from 100+ companies.

The main aim of this year workshop is to bring together top researchers from academia, as well as top data scientists from industry with the special focus of large scale machine learning on sparse graphs.

The event will take place Monday July 1st, 2013 in San Francisco. Early bird registration is now open!

Preliminary agenda.

Definitely one to have on your calendar!

March 31, 2013

Embedding Pubmed, Graphviz and a remote image in #LaTeX

Filed under: Graphs,Networks,TeX/LaTeX — Patrick Durusau @ 1:08 pm

Embedding Pubmed, Graphviz and a remote image in #LaTeX by Pierre Lindenbaum.

Pierre demonstrates how to use:

\newcommand{name}[num]{definition}

to load a remote picture, a graphviz result and retrieving a PubMed record for embedding in a LaTeX document.

From the LaTeX Macro page Pierre cites:

The num argument in square brackets is optional and specifies the number of arguments the new command takes (up to 9 are possible). If missing it defaults to 0, i.e. no argument allowed.

I caught myself wondering about that argument.

The graphviz command looks particularly interesting for topic map illustrations.

March 30, 2013

Permission Resolution with Neo4j – Part 1

Filed under: Cybersecurity,Graphs,Neo4j,Networks,Security — Patrick Durusau @ 2:17 pm

Permission Resolution with Neo4j – Part 1 by Max De Marzi.

From the post:

People produce a lot of content. Messages, text files, spreadsheets, presentations, reports, financials, etc, the list goes on. Usually organizations want to have a repository of all this content centralized somewhere (just in case a laptop breaks, gets lost or stolen for example). This leads to some kind of grouping and permission structure. You don’t want employees seeing each other’s HR records, unless they work for HR, same for Payroll, or unreleased quarterly numbers, etc. As this data grows it no longer becomes easy to simply navigate and a search engine is required to make sense of it all.

But what if your search engine returns 1000 results for a query and the user doing the search is supposed to only have access to see 4 things? How do you handle this? Check the user permissions on each file realtime? Slow. Pre-calculate all document permissions for a user on login? Slow and what if new documents are created or permissions change between logins? Does the system scale at 1M documents, 10M documents, 100M documents?

Search is one example of a need to restrict viewing results but browsing raises the same issues. Or display of information along side other information.

As I recall, Netware 4.1 (other versions as well no doubt) had the capability for a sysadmin to create sub-sysadmins, say for accounting or HR, that could hide information from the sysadmin. That was prior to search being commonly available.

What other security for search result schemes are out there?

March 29, 2013

Titan 0.3.0 Released

Filed under: Graphs,Networks,Titan — Patrick Durusau @ 3:38 pm

Titan 0.3.0 Released

From the webpage:

Titan 0.3.0 has been released and is ready for download. This release provides a complete performance-driven redesign of many core components. Furthermore, the primary outward facing feature is advanced indexing. The new indexing features are itemized below:

  • Geo: Search for elements using shape primitives within a 2D plane.
  • Full-text: Search elements for matching string and text properties.
  • Numeric range: Search for elements with numeric property values using intervals.
  • Edge: Edges can be indexed as well as vertices.

The Titan tutorial demonstrates the new capabilities.

This should keep you busy over the weekend!

March 26, 2013

MapEquation.org

Filed under: Graphics,Graphs,Mapping,Mathematics,Networks,Visualization — Patrick Durusau @ 1:30 pm

MapEquation.org by Daniel Edler and Martin Rosvall.

From the “about” page:

What do we do?

We develop mathematics, algorithms and software to simplify and highlight important structures in complex systems.

What are our goals?

To navigate and understand big data like we navigate and understand the real world by maps.

Suggest you start with the Apps.

Very impressive and has data available for loading.

You can also upload your own data.

Spend some time with Code and Publications as well.

I first saw this in a tweet by Chris@SocialTexture.

March 25, 2013

Under the Hood: Building out the infrastructure for Graph Search

Filed under: Facebook,Graphs,Networks — Patrick Durusau @ 10:32 am

Under the Hood: Building out the infrastructure for Graph Search by Sriram Sankar, Soren Lassen, and Mike Curtiss.

From the post:

In the early days, Facebook was as much about meeting new people as keeping in touch with people you already knew at your college. Over time, Facebook became more about maintaining connections. Graph Search takes us back to our roots and helps people make new connections–this time with people, places, and interests.

With this history comes several old search systems that we had to unify in order to build Graph Search. At first, the old search on Facebook (called PPS) was keyword based–the searcher entered keywords and the search engine produced a results page that was personalized and could be filtered to focus on specific kinds of entities such as people, pages, places, groups, etc.

Entertaining overview of the development of the graph solution for Facebook.

Moreover, reassurance if you are worried about “scaling” for your graph application. 😉

I first saw this at: This Week’s Links by Trevor Landau.

March 23, 2013

Graph Processing DevRoom 2013 edition

Filed under: Graphs,Networks — Patrick Durusau @ 12:56 pm

Graph Processing DevRoom 2013 edition

Twelve slide decks from the Graph Processing workshop within FOSDEM.

Enjoy!

March 19, 2013

BitcoinVisualizer

Filed under: Graphics,Networks,Visualization — Patrick Durusau @ 5:55 am

BitcoinVisualizer by John Russell.

From the webpage:

Block Viewer visualizes the Bitcoin block chain by building an ownership network on top of the underlying transaction network and presents a web-enabled user interface to display the visualization results.

Great mapping exercise!

Imagine what could be done tracking all banking transfers.

Before you object that banking transfer monitoring would require a search warrant, remember that Richard Nixon could not be prosecuted for treason because the evidence was the result of an illegal wiretap.

Take this mapping as a reminder to use cash whenever possible.

Demo: http://www.blockviewer.com/#30203900

I first saw this in a tweet by Max De Marzi.

March 18, 2013

Permission Resolution With Neo4j – Part 1

Filed under: Graphs,Neo4j,Networks,Security — Patrick Durusau @ 4:32 pm

Permission Resolution With Neo4j – Part 1 by Max De Marzi.

From the post:

People produce a lot of content. Messages, text files, spreadsheets, presentations, reports, financials, etc, the list goes on. Usually organizations want to have a repository of all this content centralized somewhere (just in case a laptop breaks, gets lost or stolen for example). This leads to some kind of grouping and permission structure. You don’t want employees seeing each other’s HR records, unless they work for HR, same for Payroll, or unreleased quarterly numbers, etc. As this data grows it no longer becomes easy to simply navigate and a search engine is required to make sense of it all.

But what if your search engine returns 1000 results for a query and the user doing the search is supposed to only have access to see 4 things? How do you handle this? Check the user permissions on each file realtime? Slow. Pre-calculate all document permissions for a user on login? Slow and what if new documents are created or permissions change between logins? Does the system scale at 1M documents, 10M documents, 100M documents?

Max addresses the scaling issue by checking only the results from a search. So to that extent, the side of the document store becomes irrelevant.

At least if you have a smallish number of results from the search.

I haven’t seen part 2 but another scale tactic would be to limit access to indexes by permissions. Segregating human resources, accounting, etc.

Looking forward to where Max takes this one.

March 17, 2013

Linkurious [free beta]

Filed under: Graphics,Graphs,Networks,Visualization — Patrick Durusau @ 5:36 am

Linkurious

From the homepage:

CONNECT

Our Open Source backend indexes your graph so you can connect with it on Linkurious and get started in minutes. When it is done, launch the web application of Linkurious.

SEARCH

Typing any keyword in the search bar brings up all the related data in one step. We provide a console for advanced queries so you can be as broad or as specific as you want.

EXPLORE

By focusing on the items related to your search, visualizing and exploring your graph has never been easier. Dig further in any direction using the connected nodes and make sense of your data.

A couple of other resources:

How it works, and

Graph Visualization options and latest developments

will be of interest.

I haven’t signed up, yet, but the slides make a good point that what graph visualization you need depends, unsurprisingly, on your use case.

I first saw this in a tweet by David W. Allen.

March 13, 2013

JSNetworkX

Filed under: D3,Graphs,Javascript,Networks,NetworkX — Patrick Durusau @ 12:39 pm

JSNetworkX

A port of the NetworkX graph library to JavaScript

SNetworkX is a port of the popular Python graph library NetworkX. Lets describe it with their words:

NetworkX is a Python language software package for the creation, manipulation, and study of the structure, dynamics, and function of complex networks.

With NetworkX you can load and store networks in standard and nonstandard data formats, generate many types of random and classic networks, analyze network structure, build network models, design new network algorithms, draw networks, and much more.

Github.

Wiki.

Looks like an easy way to include graph representations of topic maps in a web page.

I suspect you will be seeing more of this in the not too distant future.

I first saw this in a tweet by Christophe Viau.

Inferring Social Rank in…

Filed under: Networks,Probalistic Models,Social Networks — Patrick Durusau @ 4:05 am

Inferring Social Rank in an Old Assyrian Trade Network by David Bamman, Adam Anderson, Noah A. Smith.

Abstract:

We present work in jointly inferring the unique individuals as well as their social rank within a collection of letters from an Old Assyrian trade colony in K\”ultepe, Turkey, settled by merchants from the ancient city of Assur for approximately 200 years between 1950-1750 BCE, the height of the Middle Bronze Age. Using a probabilistic latent-variable model, we leverage pairwise social differences between names in cuneiform tablets to infer a single underlying social order that best explains the data we observe. Evaluating our output with published judgments by domain experts suggests that our method may be used for building informed hypotheses that are driven by data, and that may offer promising avenues for directed research by Assyriologists.

An example of how digitization of ancient texts enables research other than text searching.

Inferring identity and social rank may be instructive for creation of topic maps from both ancient and modern data sources.

I first saw this in a tweet by Stefano Bertolo.

March 11, 2013

GraphLab: A Distributed Abstraction…

Filed under: Graph Partitioning,GraphLab,Graphs,Networks — Patrick Durusau @ 7:32 pm

GraphLab: A Distributed Abstraction for Machine Learning in the Cloud by Carlos Guestrin. (video)

Take away line: “How does a vertex think?”

Deeply impressive presentation and performance numbers!

Resources:

GraphLab 2.1: http://graphlab.org

GraphChi 0.1: http://graphchi.org

Slides from the talk.

This needs to be very high on your graph reading/following list.

I first saw this at: SFBayACM talk: GraphLab framework for Machine Learning in the Cloud.

March 9, 2013

NetflixGraph

Filed under: Graphs,NetflixGraph,Networks — Patrick Durusau @ 11:57 am

NetflixGraph: Compact in-memory representation of directed graph data by Drew Koszewnik.

From the post:

Your memory footprint just shrank

NetflixGraph is a compact in-memory data structure used to represent directed graph data. You can use NetflixGraph to vastly reduce the size of your application’s memory footprint, potentially by an order of magnitude or more. If your application is I/O bound, you may be able to remove that bottleneck by holding your entire dataset in RAM. You’ll likely be very surprised by how little memory is actually required to represent your data.

NetflixGraph provides an API to translate your data into a graph format, compress that data in memory, then serialize the compressed in-memory representation of the data so that it may be easily transported across your infrastructure.

Definitely a high priority for the coming weekend!

Graph Partitioning and Expanders (April 2013)

Filed under: Graph Partitioning,Graphs,Networks — Patrick Durusau @ 11:56 am

Graph Partitioning and Expanders by Professor Luca Trevisan.

From the description:

In this research-oriented graduate course, we will study algorithms for graph partitioning and clustering, constructions of expander graphs, and analysis of random walks. These are three topics that build on the same mathematical background and that have several important connections: for example it is possible to find graph clusters via random walks, and it is possible to use the linear programming approach to graph partitioning as a way to study random walks.

We will study spectral graph theory, which explains how certain combinatorial properties of graphs are related to the eigenvalues and eigenvectors of the adjacency matrix, and we will use it describe and analyze spectral algorithms for graph partitioning and clustering. Spectral graph theory will recur as an important tool in the rest of the course. We we will also discuss other approaches to graph partitioning via linear programming and semidefinite programming. Then we will study constructions of expander graphs, which are graphs with very strong pseudorandomness properties, which are useful in many applications, including in cryptography, in complexity theory, in algorithms and data structures, and in coding theory. Finally, we will study the mixing time of random walks, a problem that comes up in several applications, including the analysis of the convergence time of certain randomized algorithms, such as the Metropolis algorithm.

Workload

about 8 hours per week

Prerequisites

linear algebra, discrete probability, and algorithms

The Instructor

Luca Trevisan is a professor of computer science at Stanford University. Before joining Stanford in 2010, Luca taught at Columbia University and at the University of California, Berkeley.

Luca’s research is in theoretical computer science, and he has worked on average-case complexity theory, pseudorandomness and derandomization, hardness of approximation, probabilistically checkable proofs, and approximation algorithms. In the past three years he has been working on spectral graph theory and its applications to graph algorithmns.

Luca received the STOC’97 Danny Lewin award, the 2000 Oberwolfach Prize, and the 2000 Sloan Fellowship. He was an invited speaker at the 2006 International Congress of Mathematicians in Madrid.

Not for the faint of heart!

But on the other hand, if you want to be on the cutting edge of graph development….

March 8, 2013

neo4j: Make properties relationships [Associations As First Class Citizens?]

Filed under: Graphs,Neo4j,Networks — Patrick Durusau @ 2:43 pm

neo4j: Make properties relationships by Mark Needham.

From the post:

I spent some of the weekend working my way through Jim, Ian & Emil‘s book ‘Graph Databases‘ and one of the things that they emphasise is that graphs allow us to make relationships first class citizens in our model.

Looking back on a couple of the graphs that I modelled last year I realise that I didn’t quite get this and although the graphs I modelled had some relationships a lot of the time I was defining things as properties on nodes.

While it’s fine to do this I think we lose some of the power of a graph and it’s not necessarily obvious what we’ve lost until we model a property as a relationship and see what possibilities open up.

For example in my football graph I wanted to record the date of matches and initially stored this as a property on the match before realising that modelling it as a relationship which might open up some interesting queries.

Reading Mark’s post illustrates the power of using associations to model “properties” in topic maps.

In Neo4j, relationships are first class citizens.

Unfortunately, we can’t say the same for associations in topic maps.

You may recall that associations in a topic map are restricted in the information they can carry.

If you want to add a name to an association, for example, you have to reify the association with a topic. Which means you have the association and a topic for the association, representing the same subject.

Not to mention a lot of machinery overhead for something fairly simple.

I am aware that the TMDM and XTM were fashioned to follow the original version of ISO 13250. The origin of reification in topic maps.

However, simply because all buggies had whips at one point is no reason to design cars with whip holders.

The time has come to revisit reification and in my view, revise both the TMDM and XTM to remove it.

And to make associations and occurrences first class citizens in both the TMDM and XTM.

Comments/suggestions?

March 6, 2013

Social Graphs and Applied NoSQL Solutions [Merging Graphs?]

Filed under: Graphs,Networks,Social Graphs,Social Networks — Patrick Durusau @ 11:20 am

Social Graphs and Applied NoSQL Solutions by John L. Myers.

From the post:

Recent postings have been more about the “theory” behind the wonderful world of NoSQL and less about how to implement a solution with a NoSQL platform. Well it’s about time that I changed that. This posting will be about how the graph structure and graph databases in particular can be an excellent “applied solution” of NoSQL technologies.

When Facebook released its Graph Search, the general public finally got a look at what the “backend” of Facebook looked like or its possible uses … For many the consumer to consumer (c2c) version of Facebook’s long available business-to-business and business-to-consumer offerings was a bit more of the “creepy” vs. the “cool” of the social media content. However, I think it will have the impact of opening people’s eyes on how their content can and probably should be used for search and other analytical purposes.

With graph structures, unlike tabular structures such as row and column data schemas, you look at the relationships between the nodes (i.e. customers, products, locations, etc.) as opposed to looking at the attributes of a particular object. For someone like me, who has long advocated that we should look at how people, places and things interact with each other versus how their “demographics” (i.e. size, shape, income, etc.) make us “guess” how they interact with each other. In my opinion, demographics and now firmographics have been used as “substitutes” for how people and organizations behave. While this can be effective in the aggregate, as we move toward a “bucket of one” treatment model for customers or clients, for instance, we need to move away from using demographics/firmographics as a primary analysis tool.

Let’s say that graph databases become as popular as SQL databases. You can’t scratch an enterprise without finding a graph database.

And they are all as different from each other as the typical SQL database is today.

How do you go about merging graph databases?

Can you merge a graph database and retain the characteristics of the graph databases separately?

If graph databases become as popular as they should, those are going to be real questions in the not too distant future.

March 5, 2013

Rebuilding Gephi’s core for the 0.9 version

Filed under: Gephi,Graphs,Networks,Visualization — Patrick Durusau @ 2:07 pm

Rebuilding Gephi’s core for the 0.9 version by Mathieu Bastian.

From the post:

This is the first article about the future Gephi 0.9 version. Our objective is to prepare the ground for a future 1.0 release and focus on solving some of the most difficult problems. It all starts with the core of Gephi and we’re giving today a preview of the upcoming changes in that area. In fact, we’re rewriting the core modules from scratch to improve performance, stability and add new features. The core modules represent and store the graph and attributes in memory so it’s available to the rest of the application. Rewriting Gephi’s core is like replacing the engine of a truck and involves adapting a lot of interconnected pieces. Gephi’s current graph structure engine was designed in 2009 and didn’t change much in multiple releases. Although it’s working, it doesn’t have the level of quality we want for Gephi 1.0 and needs to be overhauled. The aim is to complete the new implementation and integrate it in the 0.9 version.

Deeply interesting work!

To follow, consider subscribing to: gephi-dev — List for core developers.

“Do Bees” / “Don’t Bees” and Neo4j

Filed under: Cypher,Graphs,Neo4j,Networks — Patrick Durusau @ 1:12 pm

According to Michael Hunger in a Neo4j Google Groups message, the Neo4j team is drowning in its own success!

Now there’s a problem to have!

“Do Bees” for Neo4j will:

…ask questions on Stack Overflow that related to:

Please tag your questions with “neo4j” and “cypher”, “gremlin” or “spring data neo4j” accordingly. See the current list:

http://stackoverflow.com/questions/tagged/neo4j

Currently questions on SO are answered quickly by a group of very active people which we hope you will join. We try to chime in as often as possible (especially with unanswered questions).

So PLEASE post your questions there on Stack Overflow, we will start asking individuals to move their questions to that platform and if they don’t manage it, move them ourselves.

We will also monitor this badge: http://stackoverflow.com/badges/1785/neo4j and award cool stuff for people that make it there.

This google group shall return to its initial goals of having broader discussions about graph topics, modeling, architectures, roadmap, announcements, cypher evolution, open source etc. So we would love everyone who has questions or problems in these areas to reach out and start a conversation.

Hope for your understanding to make more breathing room in this group and more interesting discussions in the future while keeping an interactive FAQ around Neo4j going on SO with quick feedback loops and turnaround times.

The Neo4j community will be healthier if we are all “Do Bees” so I won’t cover the alternative.

If you don’t know “Do Bees” / “Don’t Bees,” see: Romper Room.

See you at Stackoverflow!

March 4, 2013

Pattern Based Graph Generator

Filed under: Graph Databases,Graph Generator,Graphs,Networks — Patrick Durusau @ 5:35 pm

Pattern Based Graph Generator by Hong-Han Shuai, De-Nian Yang, Philip S. Yu, Chih-Ya Shen, Ming-Syan Chen.

Abstract:

The importance of graph mining has been widely recognized thanks to a large variety of applications in many areas, while real datasets always play important roles to examine the solution quality and efficiency of a graph mining algorithm. Nevertheless, the size of a real dataset is usually fixed and constrained according to the available resources, such as the efforts to crawl an on-line social network. In this case, employing a synthetic graph generator is a possible way to generate a massive graph (e.g., billions nodes) for evaluating the scalability of an algorithm, and current popular statistical graph generators are properly designed to maintain statistical metrics such as total node degree, degree distribution, diameter, and clustering coefficient of the original social graphs. Nevertheless, in addition to the above metrics, recent studies on graph mining point out that graph frequent patterns are also important to provide useful implications for the corresponding social networking applications, but this crucial criterion has not been noticed in the existing graph generators. This paper first manifests that numerous graph patterns, especially large patterns that are crucial with important domain-specific semantic, unfortunately disappear in the graphs created by popular statistic graph generators, even though those graphs enjoy the same statistical metrics with the original real dataset. To address this important need, we make the first attempt to propose a pattern based graph generator (PBGG) to generate a graph including all patterns and satisfy the user-specified parameters on supports, network size, degree distribution, and clustering coefficient. Experimental results show that PBGG is efficient and able to generate a billion-node graph with about 10 minutes, and PBGG is released for free download.

Download at: http://arbor.ee.ntu.edu.tw/~hhshuai/PBGG.html.

OK, this is not a “moderate size database” program.

GraphBuilder – A Scalable Graph Construction Library for Apache™ Hadoop™

Filed under: GraphBuilder,Graphs,Hadoop,MapReduce,Networks — Patrick Durusau @ 2:56 pm

GraphBuilder – A Scalable Graph Construction Library for Apache™ Hadoop™ by Theodore L. Willke, Nilesh Jain and Haijie Gu. (whitepaper)

Abstract:

The exponential growth in the pursuit of knowledge gleaned from data relationships that are expressed naturally as large and complex graphs is fueling new parallel machine learning algorithms. The nature of these computations is iterative and data-dependent. Recently, frameworks have emerged to perform these computations in a distributed manner at commercial scale. But feeding data to these frameworks is a huge challenge in itself. Since graph construction is a data-parallel problem, Hadoop is well-suited for this task but lacks some elements that would make things easier for data scientists that do not have domain expertise in distributed systems engineering. We developed GraphBuilder, a scalable graph construction software library for Apache Hadoop, to address this gap. GraphBuilder offloads many of the complexities of graph construction, including graph formation, tabulation, compression, transformation, partitioning, output formatting, and serialization. It is written in Java for ease of programming and scales using the MapReduce parallel programming model. We describe the motivation for GraphBuilder, its architecture, and present two case studies that provide a preliminary evaluation.

The “whitepaper” introduction to GraphBuilder.

A New Representation of WordNet® using Graph Databases

Filed under: Graph Databases,Graphs,Neo4j,Networks,WordNet — Patrick Durusau @ 10:46 am

A New Representation of WordNet® using Graph Databases by Khaled Nagi.

Abstract:

WordNet® is one of the most important resources in computation linguistics. The semantically related database of English terms is widely used in text analysis and retrieval domains, which constitute typical features, employed by social networks and other modern Web 2.0 applications. Under the hood, WordNet® can be seen as a sort of read-only social network relating its language terms. In our work, we implement a new storage technique for WordNet® based on graph databases. Graph databases are a major pillar of the NoSQL movement with lots of emerging products, such as Neo4j. In this paper, we present two Neo4j graph storage representations for the WordNet® dictionary. We analyze their performance and compare them to other traditional storage models. With this contribution, we also validate the applicability of modern graph databases in new areas beside the typical large-scale social networks with several hundreds of millions of nodes.

Finally, a paper that covers “moderate size databases!”

Think about the average graph database you see on this blog. Not really in the “moderate” range, even though a majority of users work in the moderate range.

Compare the number of Facebook size enterprises with the number of enterprises generally.

Not dissing super-sized graph databases or research on same. I enjoy both a lot.

But for your average customer, experience with “moderate size databases” may be more immediately relevant.

I first saw this in a tweet from Peter Neubauer.

March 3, 2013

A Fast Parallel Maximum Clique Algorithm…

Filed under: Dynamic Graphs,Graphs,Networks,Sparse Data — Patrick Durusau @ 4:01 pm

A Fast Parallel Maximum Clique Algorithm for Large Sparse Graphs and Temporal Strong Components by Ryan A. Rossi, David F. Gleich, Assefaw H. Gebremedhin, Md. Mostofa Ali Patwary.

Abstract:

We propose a fast, parallel, maximum clique algorithm for large, sparse graphs that is designed to exploit characteristics of social and information networks. We observe roughly linear runtime scaling over graphs between 1000 vertices and 100M vertices. In a test with a 1.8 billion-edge social network, the algorithm finds the largest clique in about 20 minutes. For social networks, in particular, we found that using the core number of a vertex in combination with a good heuristic clique finder efficiently removes the vast majority of the search space. In addition, we parallelize the exploration of the search tree. In the algorithm, processes immediately communicate changes to upper and lower bounds on the size of maximum clique, which occasionally results in a super-linear speedup because vertices with especially large search spaces can be pruned by other processes. We use this clique finder to investigate the size of the largest temporal strong components in dynamic networks, which requires finding the largest clique in a particular temporal reachability graph.

Thirty-two networks are reported in this paper and a promised online appendix as around eighty (80).

The online appendix is live but as of today (March 2, 2013), it has no content.

No matter, the paper should keep you busy for more than a little while. 😉

I am interested in parallel graph processing in general but the concept of communicating “…changes to upper and lower bounds on the size of maximum clique…” seems applicable to “merging” in topic maps.

That is if some set of topics share some common characteristic that exclude them from consideration for merging, why apply the merging test at all?

Will have to think about that.

Graph Based Recommendations using “How-To” Guides Dataset

Filed under: Graphs,Networks,Recommendation — Patrick Durusau @ 1:58 pm

Graph Based Recommendations using “How-To” Guides Dataset by Marcel Caraciolo.

From the post:

In this post I’d like to introduce another approach for recommender engines using graph concepts to recommend novel and interesting items. I will build a graph-based how-to tutorials recommender engine using the data available on the website SnapGuide (By the way I am a huge fan and user of this tutorials website), the graph database Neo4J and the graph traversal language Gremlin.

What is SnapGuide ?

Snapguide is a web service for anyone who wants to create and share step-by-step “how to guides”. It is available on the web and IOS app. There you can find several tutorials with easy visual instructions for a wide array of topics including cooking, gardening, crafts, projects, fashion tips and more. It is free and anyone is invitide to submit guides in order to share their passions and expertise with the community. I have extracted from their website for only research purposes the corpus of tutorials likes. Several users may like the tutorial and this signal can be quite useful to recommend similar tutorials based on what other users liked. Unfortunately I can’t provide the dataset for download but the code you can follow below for your own data set.

An excellent tutorial that walks you through the creation of graph based recommendations, from acquiring the data to posting queries to it.

The SnapGuide site looks like another opportunity for topic map related tutorial material.

GraphBuilder

Filed under: Graphs,Networks — Patrick Durusau @ 1:45 pm

GraphBuilder: Large-Scale Graph Construction using Apache™ Hadoop™

From the project page:

GraphBuilder is a Java library for constructing graphs out of large datasets for data analytics and structured machine learning applications that exploit relationships in data. The library offloads many of the complexities of graph construction, such as graph formation, tabulation, compression, transformation, partitioning, output formatting, and serialization. It scales using the MapReduce parallel programming model. The major components of GraphBuilder library, and its relation to Hadoop MapReduce, are shown below.

You may remember my post about the original release of this library in: Building graphs with Hadoop.

The GraphBuilder mailing list archives don’t show a lot of traffic, yet, so it may be easy to get noticed.

February 23, 2013

Social Network Analysis [Coursera – March 4, 2013]

Filed under: Networks,Social Networks — Patrick Durusau @ 5:40 am

Social Network Analysis by Lada Adamic (University of Michigan)

Description:

Everything is connected: people, information, events and places, all the more so with the advent of online social media. A practical way of making sense of the tangle of connections is to analyze them as networks. In this course you will learn about the structure and evolution of networks, drawing on knowledge from disciplines as diverse as sociology, mathematics, computer science, economics, and physics. Online interactive demonstrations and hands-on analysis of real-world data sets will focus on a range of tasks: from identifying important nodes in the network, to detecting communities, to tracing information diffusion and opinion formation.

The item on the syllabus that caught my eye:

Ahn et al., and Teng et al.: Learning about cooking from ingredient and flavor networks

On which see:

Flavor network and the principles of food pairing, Yong-Yeol Ahn, Sebastian E. Ahnert, James P. Bagrow & Albert-László Barabási.

or

Recipe recommendation using ingredient networks, Chun-Yuen Teng, Yu-Ru Lin, Lada A. Adamic,

Heavier on the technical side than Julia Child reruns but enjoyable none the less.

« Newer PostsOlder Posts »

Powered by WordPress