Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

October 7, 2013

The first GraphGist Challenge completed

Filed under: Graphs,Neo4j — Patrick Durusau @ 9:48 am

The first GraphGist Challenge completed by Anders Nawroth.

From the post:

We’re happy to announce the results of the first GraphGist challenge.

First of all, we want to thank all participants for their great contributions. We were blown away by the high quality of the contributions. Everyone has put in a lot of time and effort, providing thoughtful, interesting and well explained data models and Cypher queries. There was also great use of graphics, including use of the Arrows tool.

We thought we had high expectations, but the contributions still exceeded them by far. In this sense, everyone is a winner, and we look forward to sending out a cool Neo4j t-shirt and Graph Connect ticket or a copy of the Graph Databases book to all participants. And for the same reason, we strongly advise you to go have a look at all submissions.

The winners:

At third place, we find Chess Games and Positions by Wes Freeman. He makes it all sound very simple:

Learning Graph by Johannes Mockenhaupt comes in at second place. Here’s his own introduction to it:

The US Flights & Airports contribution from Nicole White finished first in this challenge. Congrats Nicole!

….

The near future:

If you want to have a look at the GraphGist project, it’s located here: https://github.com/neo4j-contrib/graphgist. It’s a client-side only browser-based application. Meaning, it’s basically a bunch of Javascript files. We’d be happy to see Pull Requests for the project. Please note that you can contribute styling or documentation (as a GraphGist), not only Javascript code!

We already got questions about the next GraphGist challenge. Our plan is to run the next challenge around the time Neo4j 2.0 gets released. Currently we think that will mean a closing date before Christmas. We’ll keep you posted when we know more.

Anders provides great descriptions of the winners but see their entries for full details.

For that matter, see all the entries. The breadth of applications may surprise you.

Even if not, it will be good preparation for the next GraphGist challenge!

October 4, 2013

cypher-mode

Filed under: Cypher,Graphs,Neo4j — Patrick Durusau @ 4:42 pm

cypher-mode

From the webpage:

Emacs major mode for editing cypher scripts (Neo4j).

First *.el upload today. Could be interesting.

DocGraph – Neo4j Version

Filed under: Graphs,Neo4j — Patrick Durusau @ 4:01 pm

DocGraph – Neo4j Version by Max De Marzi.

Max has created a Neo4j version of the DocGraph dataset.

Enjoy!

Towards OLAP in Graph Databases (MSc. Thesis)

Filed under: Graphs,Neo4j — Patrick Durusau @ 1:40 pm

Towards OLAP in Graph Databases (MSc. Thesis) by Michal Bachman.

Abstract:

Graph databases are becoming increasingly popular as an alternative to relational databases for managing complex, densely-connected, semi-structured data. Whilst primarily optimised for online transactional processing, graph databases would greatly benefit from online analytical processing capabilities. Since relational databases were introduced over four decades ago, they have acquired online analytical processing facilities; this is not the case with graph databases, which have only drawn mainstream attention in the past few years.

In this project, we study the problem of online analytical processing in graph databases that use the property graph data model, which is a graph with properties attached to both vertices and edges. We use vertex degree analysis as a simple example problem, create a formal definition of vertex degree in a property graph, and develop a theoretical vertex degree cache with constant space and read time complexity, enabled by a cache compaction operation and a property change frequency heuristic.

We then apply the theory to Neo4j, an open-source property graph database, by developing a Relationship Count Module, which implements the theoretical vertex degree caching. We also design and implement a framework, called GraphAware, which provides supporting functionality for the module and serves as a platform for additional development, particularly of modules that store and maintain graph metadata.

Finally, we show that for certain use cases, for example those in which vertices have relatively high degrees and edges are created in separate transactions, vertex degree analysis can be performed several orders of magnitude faster, whilst sacrificing less than 20% of the write throughput, when using GraphAware Framework with the Relationship Count Module.

By demonstrating the extent of possible performance improvements, exposing the true complexity of a seemingly simple problem, and providing a starting point for future analysis and module development, we take an important step towards online analytical processing in graph databases.
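To make the caching idea in the abstract concrete, here is a minimal sketch of keeping per-vertex relationship counts up to date at write time, so that a degree query becomes a lookup instead of a scan. It is not the GraphAware Relationship Count Module: the class and method names are my own, it ignores edge properties, and it has none of the cache compaction described in the thesis.

```python
from collections import defaultdict


class DegreeCachingGraph:
    """Toy directed multigraph that maintains per-vertex degree counts on every write."""

    def __init__(self):
        self.edges = []  # list of (source, target, rel_type)
        # counts[vertex][(rel_type, direction)] -> number of matching edges
        self.counts = defaultdict(lambda: defaultdict(int))

    def add_edge(self, source, target, rel_type):
        """Store the edge and update the cached counts for both endpoints."""
        self.edges.append((source, target, rel_type))
        self.counts[source][(rel_type, "out")] += 1
        self.counts[target][(rel_type, "in")] += 1

    def degree_cached(self, vertex, rel_type, direction):
        """Answer from the cache without touching the edge list."""
        return self.counts.get(vertex, {}).get((rel_type, direction), 0)

    def degree_scanned(self, vertex, rel_type, direction):
        """Answer by scanning every edge, for comparison."""
        position = 0 if direction == "out" else 1
        return sum(
            1 for edge in self.edges
            if edge[position] == vertex and edge[2] == rel_type
        )


graph = DegreeCachingGraph()
for follower in ("bob", "carol", "dave"):
    graph.add_edge(follower, "alice", "FOLLOWS")
graph.add_edge("alice", "bob", "FOLLOWS")

print(graph.degree_cached("alice", "FOLLOWS", "in"))   # incoming FOLLOWS edges
print(graph.degree_scanned("alice", "FOLLOWS", "in"))  # same answer via a full scan
```

The trade-off is the one the abstract names: every write does a little extra bookkeeping so that reads no longer depend on the number of edges.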

The MSc. thesis: GraphAware: Towards Online Analytical Processing in Graph Databases.

Framework at Github: GraphAware Neo4j Framework.

Michal laments:

It’s not an easy, cover-to-cover read, but there might be some interesting parts, even if you don’t go through all the (over 100) pages.

It’s one hundred and forty-nine pages according to my PDF viewer.

I don’t think Michal needs to worry. If anyone thinks it is too long to read, it’s their loss.

Definitely going on my short list of things to read in detail sooner rather than later.

September 28, 2013

Building Applications with…

Filed under: Graphs,Neo4j — Patrick Durusau @ 6:53 pm

Building Applications with a Graph Database by Tobias Lindaaker.

Slides from JavaOne 2013.

Hard to tell without a video how much is lost by having only the slides.

However, the slides alone should make you curious (if not anxious) about trying to build an application based on a graph database.

Neo4j-centric in places, which is not unexpected, since it is hard to build an application in the abstract.

At least if you want visible results. 😉

Over one hundred (100) slides that merit a close read.

Enjoy!

September 26, 2013

Time-varying social networks in a graph database…

Filed under: AutoComplete,Graphs,Neo4j,Networks,Social Networks,Time — Patrick Durusau @ 4:02 pm

Time-varying social networks in a graph database: a Neo4j use case by Ciro Cattuto, Marco Quaggiotto, André Panisson, and Alex Averbuch.

Abstract:

Representing and efficiently querying time-varying social network data is a central challenge that needs to be addressed in order to support a variety of emerging applications that leverage high-resolution records of human activities and interactions from mobile devices and wearable sensors. In order to support the needs of specific applications, as well as general tasks related to data curation, cleaning, linking, post-processing, and data analysis, data models and data stores are needed that afford efficient and scalable querying of the data. In particular, it is important to design solutions that allow rich queries that simultaneously involve the topology of the social network, temporal information on the presence and interactions of individual nodes, and node metadata. Here we introduce a data model for time-varying social network data that can be represented as a property graph in the Neo4j graph database. We use time-varying social network data collected by using wearable sensors and study the performance of real-world queries, pointing to strengths, weaknesses and challenges of the proposed approach.

A good start on modeling networks that vary based on time.

If the overhead sounds daunting, remember the graph data used here measured the proximity of actors every 20 seconds for three days.
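For a sense of scale, a rough sketch of what "every 20 seconds for three days" implies, plus one simple way to bucket raw contact records into time frames. The record layout (timestamp in seconds, two actor ids) and the function names are assumptions for illustration, not the data model from the paper.

```python
from collections import defaultdict

FRAME_SECONDS = 20  # sampling interval mentioned in the post


def frames_in(days, frame_seconds=FRAME_SECONDS):
    """Number of sampling frames in the given number of whole days."""
    return days * 24 * 60 * 60 // frame_seconds


def group_contacts_by_frame(contacts, frame_seconds=FRAME_SECONDS):
    """Bucket (timestamp_seconds, actor_a, actor_b) records by frame index.

    Each frame maps to a set of unordered actor pairs seen in proximity.
    """
    frames = defaultdict(set)
    for timestamp, actor_a, actor_b in contacts:
        frame_index = timestamp // frame_seconds
        frames[frame_index].add(frozenset((actor_a, actor_b)))
    return dict(frames)


sample = [(0, "a", "b"), (5, "b", "a"), (25, "a", "c"), (41, "b", "c")]
print(frames_in(3))
print(group_contacts_by_frame(sample))
```

Three days at one frame per 20 seconds works out to 12,960 frames, each of which can carry its own set of contacts.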

Imagine if you added social connections between those actors, attended the same schools/conferences, co-authored papers, etc.

We are slowly losing our reliance on simplification of data and models to make them computationally tractable.

September 25, 2013

GraphGist Wiki

Filed under: Graphs,Neo4j — Patrick Durusau @ 6:17 pm

GraphGist Wiki

Quite naturally after I posted Why JIRA should use Neo4j I discovered it is part of a larger mother lode of graph gists!

You will find entries for the GraphGist Challenge, examples, graph design problems, tutorials, fun graph gists and philosopher graph gists.

One of the fun graphs is about Belgian beer, which led me to: Everyone loves beer. Several Neo4j projects about beer.

Late enough in the week that I suspect Lars is thinking about what establishments to visit this weekend. 😉

Why JIRA should use Neo4j

Filed under: Graphs,Neo4j — Patrick Durusau @ 6:00 pm

Why JIRA should use Neo4j by Pieter-Jan Van Aeken.

From the post:

There are few developers in the world that have never used an issue tracker. But there are even fewer developers who have ever used an issue tracker which uses a graph database. This is a shame because issue tracking really maps much better onto a graph database, than it does onto a relational database. Proof of that is the JIRA database schema.

Now obviously, the example below does not have all of the features that a tool like JIRA provides. But it is only a proof of concept, you could map every feature of JIRA into a Neo4J database. What I’ve done below, is take out some of the core functionalities and implement those.

This caught my eye because I have been in discussions about an upgrade from an older version of JIRA to the latest and greatest.

It’s not every feature but enough to convey the flavor of a possible graph mapping.

Given the openness of a graph model, does this suggest a model for mocking up topic map models?

I first saw this in a tweet by Peter Neubauer.

Benchmarking Graph Databases

Filed under: Benchmarks,Giraph,GraphLab,MySQL,Neo4j — Patrick Durusau @ 3:33 pm

Benchmarking Graph Databases by Alekh Jindal.

Speaking of data skepticism.

From the post:

Graph data management has recently received a lot of attention, particularly with the explosion of social media and other complex, inter-dependent datasets. As a result, a number of graph data management systems have been proposed. But this brings us to the question: What happens to the good old relational database systems (RDBMSs) in the context of graph data management?

The article names some of the usual graph database suspects.

But for its comparison, it selects only one (Neo4j) and compares it against three relational databases, MySQL, Vertica and VoltDB.

What’s missing? How about expanding to include GraphLab (GraphLab – Next Generation [Johnny Come Lately VCs]) and Giraph (Scaling Apache Giraph to a trillion edges) or some of the other heavy hitters (insert your favorite) in the graph world?

Nothing against Neo4j. It is making rapid progress on a query language and isn’t hard to learn. But it lacks the raw processing power of an application like Apache Giraph. Giraph, after all, is used to process the entire Facebook data set, not a “4k nodes and 88k edges” Facebook sample as in this comparison.

Not to mention that only two algorithms were used in this comparison: PageRank and Shortest Paths.

Personally I can imagine users being interested in running more than two algorithms. But that’s just me.

Every benchmarking project has to start somewhere but this sort of comparison doesn’t really advance the discussion of competing technologies.

Not that any comparison would be complete without a discussion of typical use cases and user observations on how each candidate did or did not meet their expectations.

September 23, 2013

XML to Cypher Converter/Geoff Converter

Filed under: Cypher,Geoff,Graphs,Neo4j — Patrick Durusau @ 6:27 pm

XML to Cypher Converter

From the webpage:

This service allows conversion of generic XML data into a Cypher CREATE statement, which can then be loaded into Neo4j.

And:

XML to Geoff Converter

From the webpage:

This service allows conversion of generic XML data into a Geoff interchange file, which can then be loaded into Neo4j.

Both services can be used as a web service, in addition to supporting the pasting in of XML in a form.
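If you are curious what such a conversion might look like, here is a minimal sketch that walks an XML tree and emits a single Cypher CREATE statement. It is my own illustration, not the output format of the services above: the `CHILD` relationship type, the `tag` and `text` property names, and the function names are all assumptions, and an XML attribute that is itself named `tag` would overwrite the element name.

```python
import xml.etree.ElementTree as ET


def quote(value):
    """Render a Python string as a single-quoted Cypher string literal."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def xml_to_cypher(xml_text):
    """Turn each XML element into a node and each parent/child link into a relationship."""
    root = ET.fromstring(xml_text)
    parts = []
    counter = 0

    def visit(element, parent_id):
        nonlocal counter
        node_id = f"n{counter}"
        counter += 1
        # Element name, attributes and (non-blank) text become node properties.
        props = {"tag": element.tag, **element.attrib}
        text = (element.text or "").strip()
        if text:
            props["text"] = text
        prop_text = ", ".join(f"`{key}`: {quote(value)}" for key, value in props.items())
        parts.append(f"({node_id} {{{prop_text}}})")
        if parent_id is not None:
            parts.append(f"({parent_id})-[:CHILD]->({node_id})")
        for child in element:
            visit(child, node_id)

    visit(root, None)
    return "CREATE " + ",\n       ".join(parts)


print(xml_to_cypher("<library city='Oslo'><book>Graph Databases</book></library>"))
```

Property keys are wrapped in backticks so that attribute names with hyphens or colons still form valid identifiers.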

You will also want to visit Nigel Small’s Github page and his homepage.

While poking around I also found:

XML to Graph Converter

XML data can easily be converted into a graph. Simply paste the XML data into the left-hand side, convert into both Geoff and a Cypher CREATE statement, then view the results in the Neo4j console.

Definitely worth a deep look later this week with XML schemas.

September 18, 2013

Structr 0.8 Release

Filed under: Graphs,Neo4j,structr — Patrick Durusau @ 4:20 pm

Release 0.8 is out! by Axel Morgner.

From the post:

Yesterday, we released Structr 0.8. It was a really important milestone on the way to 1.0.

Axel answers your immediate question, “Why so Important?” with:

Because it contains a lot of improvements to the UI, and the UI is important for broad adoption. For example, we introduced “Widgets”.

In case you are unfamiliar with Structr:

Structr (pronounced ‘structure’) is a framework for mobile and web applications based on the graph database Neo4j, with a supplemental UI providing CMS functionality to serve pages, files and images.

It was designed to simplify the creation and operation of graph database applications by providing a comprehensive Java API with a built-in feature set common to most use cases, like e.g. authentication, users and groups, constraints and validation, etc..

All custom-built features are automatically exposed through a flexible RESTful API which enables developers to build sophisticated web or mobile apps based on Neo4j within just hours. [From the Structr homepage.]

The latest release is definitely worth a close look.

September 17, 2013

Joy of Holiday Assembly!

Filed under: Graphs,Neo4j — Patrick Durusau @ 4:22 pm

IKEA wardrobes and Graphs: a perfect fit! by Rik Van Bruggen.

Assembly graph

The idea for this blogpost was quite long in the making. We all know IKEA which is, like Neo4j, from Sweden. Most of us have delivered a daring attempt at assembling one of their furnitures. And most recently, even my 8- and 10-year old kids assembled their Swedish bedside tables themselves. Win!

In the past year or so, ever so often does someone approached me to talk about how to use Neo4j in a manufacturing context. And every single time I thought to myself: what a great, wonderful fit! We all know “reality is a graph”, but when you look at manufacturing processes – and the way different process components interact – you quickly see that these wonderful flowchart diagrams, actually represent a network. A graph. And when you then start thinking about all the required parts and components that are required to deliver these processes – then it becomes even more clearer: the “bill of material” of manufactured goods can also, predictably, be represented as a graph.

So there you have it. Manufacturing processes and bills of materials can be represented as a graph. And IKEA cupboards, wardrobes, tables, beds, stools – everywhere. How to make the match?

Rik does a great job of demonstrating the use of Neo4j for the familiar task of assembling furniture. And suggests that a similar graph could be used by the manufacturer of such products.

I think it is implied that creating the graph for components and the assembly process is also a way to delay the onset of assembly itself.

Rik’s mention of Sweden, however, is a tip-off this example is culturally bound. To Sweden that is.

A graph of assembly instructions in the United States, particularly during the holiday season, would be substantially different than Rik’s.

Tracking the assembly instructions, the graph would follow these rules:

  1. Number of nodes must not match number of parts.
  2. Labels on nodes might match part names and/or differ by one letter.
  3. Labels on arcs would be correct no more than 80% of the time.
  4. Assembly arcs would include arcs for other models.

Perhaps a new holiday tradition?

Creating an assembly graph for a randomly chosen set of instructions?

😉

September 13, 2013

An empirical comparison of graph databases

Filed under: Benchmarks,DEX,Graphs,Neo4j,OrientDB,Titan — Patrick Durusau @ 2:39 pm

An empirical comparison of graph databases by Salim Jouili and Valentin Vansteenberghe.

Abstract:

In recent years, more and more companies provide services that can not be anymore achieved efficiently using relational databases. As such, these companies are forced to use alternative database models such as XML databases, object-oriented databases, document-oriented databases and, more recently graph databases. Graph databases only exist for a few years. Although there have been some comparison attempts, they are mostly focused on certain aspects only.

In this paper, we present a distributed graph database comparison framework and the results we obtained by comparing four important players in the graph databases market: Neo4j, OrientDB, Titan and DEX.

(Salim Jouili and Valentin Vansteenberghe, An empirical comparison of graph databases. To appear in Proceedings of the 2013 ASE/IEEE International Conference on Big Data, Washington D.C., USA, September 2013.)

For your convenience:

DEX

Neo4j

OrientDB

Titan

I won’t reproduce the comparison graphs here. The “winner” depends on your requirements.

Looking forward to seeing this graph benchmark develop!

September 11, 2013

Neo4j 2.0.0-M05 released

Filed under: Graphs,Neo4j — Patrick Durusau @ 5:16 pm

Neo4j 2.0.0-M05 released by Peter Neubauer.

From the post:

We are proud to release Neo4j 2.0.0-M05 as a milestone today. The 2.0 project is now in full speed development after summer vacation. We’re getting close to feature completeness now, and we want to get this release out to you so you can give us refined feedback for the final release.

Peter covers the following highlights:

  • Unique Constraints
  • Label store
  • AutoClosable transactions
  • Minimalistic Cypher and JSON
  • Deprecated > /dev/null

I’m not real sure what Peter means by “…summer vacation…,” must be one of those old European traditions. 😉

However, whatever that may mean, Neo4j 2.0.0-M05 does look like a must have release!

September 10, 2013

Graphity Server for social activity streams released (GPLv3)

Filed under: Graphs,Neo4j — Patrick Durusau @ 10:51 am

Graphity Server for social activity streams released (GPLv3) by René Pickhardt.

From the post:

It is almost 2 years over since I published my first ideas and works on graphity which is nowadays a collection of algorithms to support efficient storage and retrieval of more than 10k social activity streams per second. You know the typical application of twitter, facebook and co. Retrieve the most current status updates from your circle of friends.

Today I proudly present the first version of the Graphity News Stream Server. Big thanks to Sebastian Schlicht who worked for me implementing most of the Servlet and did an amazing job! The Graphity Server is a neo4j powered servlet with the following properties:

  • Response times for requests are usually less than 10 milliseconds (+network i/o e.g. TCP round trips coming from HTTP)
  • The Graphity News Stream Server is a free open source software (GPLv3) and hosted in the metalcon git repository. (Please also use the bug tracker there to submit bugs and feature requests)
  • It is running two Graphity algorithms: One is read optimized and the other one is write optimized, if you expect your application to have more write than read requests.
  • The server comes with an REST API which makes it easy to hang in the server in whatever application you have.
  • The server’s response also follows the activitystrea.ms format so out of the box there are a large amount of clients available to render the response of the server.
  • The server ships together with unit tests and extensive documentation especially of the news stream server protocol (NSSP) which specifies how to talk to the server. The server can currently handle about 100 write requests in medium size (about a million nodes) networks. I do not recommend to use this server if you expect your user base to grow beyond 10 Mio. users (though we are working to get the server scaling) This is mostly due to the fact that our data base right now won’t really scale beyond one machine and some internal stuff has to be handled synchronized.

Koding.com is currently thinking to implement Graphity like algorithms to power their activity streams. It was for Richard from their team who pointed out in a very fruitful discussion how to avoid the neo4j limit of 2^15 = 32768 relationship types by using an overlay network. So his ideas of an overlay network have been implemented in the read optimized graphity algorithm. Big thanks to him!

Now I am really excited to see what kind of applications you will build when using Graphity.

If you’ll use graphity

Please tell me if you start using Graphity, that would be awesome to know and I will most certainly include you to a list of testimonials.

By the way if you want to help spreading the server (which is also good for you since more developer using it means higher chance to get newer versions) you can vote up my answer in stack overflow:

http://stackoverflow.com/questions/202198/whats-the-best-manner-of-implementing-a-social-activity-stream/13171306#13171306

This is very cool!

Take Graphity for a spin and let René know what you think.

Perhaps we can all hide in digital chaff? 😉

September 9, 2013

STEFFI…

Filed under: Graphs,Neo4j,STEFFI,Titan — Patrick Durusau @ 6:03 pm

STEFFI – Scalable Traversal Engine For Fast In-memory graphDB

From the webpage:

STEFFI is a distributed graph database fully in-memory and amazingly fast when it comes to querying large datasets.

As a scalable graph database, STEFFI’s performance can directly be compared to Neo4j and Titan. It provides its users with a clear competitive advantage when it comes to complicated traversal operations on large datasets. Speedups of up to 200 have been observed when comparing STEFFI with its alternatives.

More than an alternative to existing solutions, STEFFI opens up new possibilities for high-performance graph storage and manipulation.

Main features

  • in-memory storage for a fast random access
  • distributed parallel computing for high-speed graph queries
  • graph traversal engine for graph processing
  • scalability for a growing data
  • implementing the Blueprints API from tinkerpop for an enhanced accessibility

Recommended for

  • fast recommendation engines (e-commerce, telecommunications, finance, …)
  • large biological networks analysis (biopharma, healthcare, … )
  • security networks management & real-time fraud detection (bank, public institutions, …)
  • complex network & data center management (telecommunications, e-commerce, …)
  • and much more!

Availability

STEFFI is currently in its incubation phase within EURA NOVA. Once the code is mature and stable enough, STEFFI will be provided via this website under the Apache Licence Version 2. If you would like to know more about this project evolution, do not hesitate to subscribe to our mailing list or contact EURA NOVA.

I haven’t run the performance tests personally against Neo4j and Titan but the reported performance gains (200X and 150X, respectively) are impressive.

BTW, you probably want the paper that led to STEFFI, imGraph: A distributed in-memory graph database by Salim Jouili and Aldemar Reynaga.

September 8, 2013

Scaling Writes

Filed under: Graphs,Neo4j,Scalability — Patrick Durusau @ 5:58 pm

Scaling Writes by Max De Marzi.

From the post:

Most of the applications using Neo4j are read heavy and scale by getting more powerful servers or adding additional instances to the HA cluster. Writes however can be a little bit trickier. Before embarking on any of the following strategies it is best that the server is tuned.

Max encountered someone who wants to write data. How weird is that? 😉

Seriously, your numbers may vary from Max’s but you will be on your way to tuning write performance after reading this post.

Don’t depend on the NSA to capture your data. Freedom of Information requests take time and often have omissions. Test your recovery options with any writing solution.

September 2, 2013

Neo4j 1.9.3 now available

Filed under: Graphs,Neo4j — Patrick Durusau @ 7:11 pm

Neo4j 1.9.3 now available by Ian Robinson.

From the post:

Today we’re pleased to release Neo4j 1.9.3. This is a bugfix release in the 1.9.x series and has no new features (though it does restore an old way of registering unmanaged extensions with the server).

If you’re on an earlier 1.9 release then you’re strongly encouraged to upgrade (which can be performed without downtime in an HA cluster). You can download it from the neo4j.org web site.

The Fall season of point releases approaches! 😉

August 29, 2013

Neo4j Cypher Refcard 2.0

Filed under: Cypher,Graphs,Neo4j — Patrick Durusau @ 6:12 pm

Neo4j Cypher Refcard 2.0

This looks very useful.

If nobody else does, I will cast this into a traditional refcard format.

August 26, 2013

NoSQL: Data Grids and Graph Databases

Filed under: Graph Databases,Neo4j — Patrick Durusau @ 2:27 pm

NoSQL: Data Grids and Graph Databases by Al Rubinger.

Chapter Six of Continuous Enterprise Development in Java by Andrew Lee Rubinger and Aslak Knutsen. Accompanying website.

From chapter six:

Until relatively recently, the RDBMS reigned over data in enterprise applications by a wide margin when contrasted with other approaches. Commercial offerings from Oracle and established open-source projects like MySQL (reborn MariaDB) and PostgreSQL became defacto choices when it came to storing, querying, archiving, accessing, and obtaining data. In retrospect, it’s shocking that given the varying requirements from those operations, one solution was so heavily lauded for so long.

In the late 2000s, a trend away from the strict ACID transactional properties could be clearly observed given the emergence of data stores that organized information differently from the traditional table model:

  • Document-oriented
  • Object-oriented
  • Key/Value stores
  • Graph models

In addition, many programmers were beginning to advocate for a release from strict transactions; in many use cases it appeared that this level of isolation wasn’t enough of a priority to warrant the computational expense necessary to provide ACID guarantees.

No, what’s shocking is the degree of historical ignorance among people who criticize RDBMS systems. Either that or they are simply parroting what other ignorant people are saying about RDBMS systems.

Don’t get me wrong, I strongly prefer NoSQL solutions in some cases. But it is a question of requirements and not making up tales about RDBMS systems.

For example, in A transient hypergraph-based model for data access Carolyn Watters and Michael A. Shepherd write:

Two major methods of accessing data in current database systems are querying and browsing. The more traditional query method returns an answer set that may consist of data values (DBMS), items containing the answer (full text), or items referring the user to items containing the answer (bibliographic). Browsing within a database, as best exemplified by hypertext systems, consists of viewing a database item and linking to related items on the basis of some attribute or attribute value. A model of data access has been developed that supports both query and browse access methods. The model is based on hypergraph representation of data instances. The hyperedges and nodes are manipulated through a set of operators to compose new nodes and to instantiate new links dynamically, resulting in transient hypergraphs. These transient hypergraphs are virtual structures created in response to user queries, and lasting only as long as the query session. The model provides a framework for general data access that accommodates user-directed browsing and querying, as well as traditional models of information and data retrieval, such as the Boolean, vector space, and probabilistic models. Finally, the relational database model is shown to provide a reasonable platform for the implementation of this transient hypergraph-based model of data access. (Emphasis added.)

Oh, did I say that paper was written in 1990, some twenty-three years ago?

So twenty-three (23) years ago that bad old RDBMS model was capable of implementing a hypergraph.

A hypergraph that had, wait for it, true hyperedges, not the faux hyperedges claimed by some graph databases.

It’s that lack of accuracy that makes me wonder what else has been missed.
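For the skeptical, a minimal sketch of why relational tables have no trouble with true hyperedges: an incidence table lets one hyperedge connect any number of nodes. This is only an illustration using SQLite, not the transient hypergraph model from the Watters and Shepherd paper, and the table, column, and function names are my own.

```python
import sqlite3


def build_hypergraph(hyperedges):
    """Load {hyperedge_name: [node, ...]} into an in-memory incidence table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE incidence ("
        " hyperedge TEXT NOT NULL,"
        " node TEXT NOT NULL,"
        " PRIMARY KEY (hyperedge, node))"
    )
    rows = [(edge, node) for edge, nodes in hyperedges.items() for node in nodes]
    conn.executemany("INSERT INTO incidence VALUES (?, ?)", rows)
    return conn


def members(conn, hyperedge):
    """All nodes joined by one hyperedge."""
    cursor = conn.execute(
        "SELECT node FROM incidence WHERE hyperedge = ? ORDER BY node",
        (hyperedge,),
    )
    return [row[0] for row in cursor]


def neighbours(conn, node):
    """All other nodes that share at least one hyperedge with the given node."""
    cursor = conn.execute(
        "SELECT DISTINCT b.node FROM incidence a"
        " JOIN incidence b ON a.hyperedge = b.hyperedge"
        " WHERE a.node = ? AND b.node <> ? ORDER BY b.node",
        (node, node),
    )
    return [row[0] for row in cursor]


conn = build_hypergraph({
    "coauthors": ["ann", "bob", "cy"],  # one edge, three nodes
    "school": ["bob", "dee"],
})
print(members(conn, "coauthors"))
print(neighbours(conn, "bob"))
```

One self-join on the incidence table is enough to move from a node to everything it is connected to, whatever the arity of the edges involved.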

August 23, 2013

Cypher shell with logging

Filed under: Cypher,Documentation,Neo4j — Patrick Durusau @ 6:12 pm

Cypher shell with logging by Alex Frieden.

From the post:

For those who don’t know, Neo4j is a graph database built with Java. The internet is abound with examples, so I won’t bore you with any.

Our problem was a data access problem. We built a loader, loaded our data into neo4j, and then queried it. However we ran into a little problem: Neo4j at the time of release logs in the home directory (at least on linux redhat) what query was ran (its there as a hidden file). However, it doesn’t log what time it was run at. One other problem as an administrator point of view is not having a complete log of all queries and data access. So we built a cypher shell that would do the logging the way we needed to log. Future iterations of this shell will have REST cypher queries and not use the embedded mode (which is faster but requires a local connection to the data). We also wanted a way in the future to output results to a file.
(…)
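The core of the idea, recording what was run and when, can be sketched as a thin wrapper around whatever function executes a query. This is not Alex's shell: `fake_run_query` is a hypothetical stand-in for a real Cypher executor, and the log line layout is an assumption of mine.

```python
import logging
import time
from datetime import datetime, timezone


def make_logged_runner(run_query, logger):
    """Wrap run_query so every call is logged with a UTC start time and a duration."""

    def logged(query):
        started_at = datetime.now(timezone.utc).isoformat()
        start = time.perf_counter()
        try:
            result = run_query(query)
        except Exception:
            # Record the failed query, then let the caller see the original error.
            logger.error("%s FAILED %s", started_at, query)
            raise
        elapsed_ms = (time.perf_counter() - start) * 1000
        logger.info("%s OK %.1fms %s", started_at, elapsed_ms, query)
        return result

    return logged


def fake_run_query(query):
    """Hypothetical stand-in for a real Cypher executor."""
    return [{"query": query}]


logging.basicConfig(level=logging.INFO, format="%(message)s")
run = make_logged_runner(fake_run_query, logging.getLogger("cypher-shell"))
run("MATCH (n) RETURN count(n)")
```

Pointing the logger at a file handler instead of the console would give an administrator the complete, timestamped query log the post asks for.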

Excellent!

Logs are a form of documentation. You may remember that documentation was #1 in the Solr Usability contest.

Documentation is important! Don’t neglect it.

August 16, 2013

imGraph: A distributed in-memory graph database

Filed under: Graphs,Neo4j,Titan — Patrick Durusau @ 3:51 pm

imGraph: A distributed in-memory graph database by Salim Jouili.

From the post:

Eura Nova contribution

Having these challenges in mind, we introduce a new graph database system called imGraph. We have considered the random access requirement for large graphs as a key factor on deciding the type of storage. Then, we have designed a graph database where all data is stored in memory so the speed of random access is maximized. However, as large graphs can not be completely loaded in the RAM of a single machine, we designed imGraph as distributed graph database. That is, the vertices and the edges are partitioned into subsets, and each subset is located in the memory of one machine belonging to the involved machines (see the following figure). Furthermore, we implemented on imGraph a graph traversal engine that takes advantage of distributed parallel computing and fast in-memory random access to gain performance.
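The excerpt does not say how imGraph decides which machine holds which vertex, so the following is only a sketch of one common approach, hash partitioning, with each edge stored alongside its source vertex. The function names and the storage layout are assumptions of mine.

```python
import zlib


def machine_for(vertex_id, machine_count):
    """Pick a machine from a stable hash of the vertex id."""
    return zlib.crc32(str(vertex_id).encode("utf-8")) % machine_count


def partition(edges, machine_count):
    """Assign every vertex to one machine and store each edge with its source vertex."""
    machines = [{"vertices": set(), "edges": []} for _ in range(machine_count)]
    for source, target in edges:
        source_machine = machines[machine_for(source, machine_count)]
        target_machine = machines[machine_for(target, machine_count)]
        source_machine["vertices"].add(source)
        target_machine["vertices"].add(target)
        source_machine["edges"].append((source, target))
    return machines


edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")]
for index, machine in enumerate(partition(edges, 3)):
    print(index, sorted(machine["vertices"]), machine["edges"])
```

`zlib.crc32` is used instead of the built-in `hash` because Python randomizes string hashes per process, which would scatter the same vertex to different machines on each run.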

I haven’t verified the numbers but imGraph is reported to have beaten both Titan and Neo4j by x150 and x200, respectively on particular data sets.

Enough to justify reading the paper.

The test machines each had 7.5 GB of memory, which seems a little light to me.

Particularly since the IBM Power 770 server can expand to hold 4 TB of memory.

Imagine the performance on five (5) machines where each has 4 TB of memory.

True, it would be more expensive but at some point, there is only so much performance you can squeeze out of a commodity box.

BTW, the paper: imGraph: A distributed in-memory graph database.

August 11, 2013

Exploring LinkedIn in Neo4j

Filed under: Graphs,Maps,Neo4j — Patrick Durusau @ 6:33 pm

Exploring LinkedIn in Neo4j by Rik Van Bruggen.

From the post:

Ever since I have been working for Neo, we have been trying to give our audience as many powerful examples of places where graph databases could really shine. And one of the obvious places has always been: social networks. That’s why I’ve written a post about facebook, and why many other graphistas have been looking at facebook and others to explain what could be done.

But while Facebook is probably the best-known social network, the one I use professionally the most is: LinkedIn. Some call it the creepiest network, but the fact of the matter is that professional network is, and has always been, a very useful way to get and stay in contact with other people from other organisations. And guess what: they do some fantastic stuff with their own, custom-developed graphs. One of these things is InMaps – a fantastic visualisation and colour coded analysis of your professional network. That’s where this blogpost got its inspiration from.

As Rik points out, you can view InMaps but you can’t do much else.

To fix that, Rik guides you through extracting data from InMaps and loading it into Neo4j.

For extra credit, try merging your data with data on the same people from other sources.
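A minimal sketch of what that merge might look like, assuming (optimistically) that both sources carry an email address you can use as the identifier. The records and field names are made up for illustration.

```python
def normalize(email: str) -> str:
    """Treat emails as the same identifier regardless of case or padding."""
    return email.strip().lower()

def merge_people(*sources):
    """Merge person records from several sources, keyed by normalized email.

    Conflicting values are kept side by side in a set rather than
    overwritten, so no source's view of a person is lost.
    """
    merged = {}
    for source in sources:
        for record in source:
            person = merged.setdefault(normalize(record["email"]), {})
            for field, value in record.items():
                if field != "email":
                    person.setdefault(field, set()).add(value)
    return merged

# Hypothetical data: the same person as seen by two sources.
linkedin = [{"email": "Ann@Example.com", "name": "Ann Lee", "employer": "Acme"}]
other = [{"email": "ann@example.com ", "name": "A. Lee"}]
merged = merge_people(linkedin, other)
```

The easy part is when the identifiers line up. The hard part, and the interesting one, is when they don't.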

Could give you some insight into the problems faced by the NSA.

August 9, 2013

Neo4j 2.0 Milestone 4

Filed under: Graphs,Neo4j — Patrick Durusau @ 3:58 pm

Summer Release – Neo4j 2.0 Milestone 4 by Chris Leishman.

From the post:

Perfectly suited for your summer holiday exploration, we are proud to present the Neo4j 2.0 Milestone 4 release (2.0.0-M04).

In working towards a major 2.0 release with some outstanding new functionality, this milestone contains many beneficial and necessary changes, some of which will require changes in the way Neo4j is used (see: deprecations).

As Chris says, it is something to explore over the summer holidays!

Download.

BTW, update to Java 7 before you try this milestone. It’s required.

July 19, 2013

Neo4j 1.9.2 now available!

Filed under: Graphs,Neo4j — Patrick Durusau @ 3:10 pm

Neo4j 1.9.2 now available! by Jim Webber.

Jim announces that Neo4j 1.9.2 is available for download.

From the post:

Neo4j 1.9.2 is available immediately and is an easy upgrade from any other 1.9.x versions as there are no store upgrades required and so everyone on Neo4j 1.9 and Neo4j 1.9.1 is strongly encouraged to upgrade to 1.9.2.

You need to look at the release notes for 1.9.2:

The 1.9.2 release of Neo4j is a maintenance release that corrects a serious concurrency issue introduced in Neo4j 1.9.1, and resolves some other critical issues when reading from the underlying store. All Neo4j users are highly encouraged to upgrade to this version.

Any release that fixes “critical” errors is a must upgrade.

July 16, 2013

Predicting Terrorism with Graphs

Filed under: Graphs,Neo4j — Patrick Durusau @ 3:07 pm

A Little Graph Theory for the Busy Developer by Jim Webber.

From the description:

In this talk we’ll explore powerful analytic techniques for graph data. Firstly we’ll discover some of the innate properties of (social) graphs from fields like anthropology and sociology. By understanding the forces and tensions within the graph structure and applying some graph theory, we’ll be able to predict how the graph will evolve over time. To test just how powerful and accurate graph theory is, we’ll also be able to (retrospectively) predict World War 1 based on a social graph and a few simple mechanical rules.

A presentation for NoSQL Now!, August 20-22, 2013, San Jose, California.
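For a sense of what a “simple mechanical rule” on a social graph can look like, one textbook candidate is structural balance: a triangle of friendly (+1) and hostile (-1) relationships is considered stable only if the product of its three signs is positive. The description does not say which rules the talk uses, so treat the choice of rule, and the made-up graph below, as my assumptions.

```python
from itertools import combinations

def unbalanced_triangles(signs):
    """Return the triangles whose edge signs multiply to a negative number.

    `signs` maps frozenset({a, b}) to +1 (friendly) or -1 (hostile).
    Only triples where all three edges are present are considered.
    """
    nodes = sorted({n for pair in signs for n in pair})
    bad = []
    for a, b, c in combinations(nodes, 3):
        edges = [frozenset(p) for p in ((a, b), (b, c), (a, c))]
        if all(e in signs for e in edges):
            if signs[edges[0]] * signs[edges[1]] * signs[edges[2]] < 0:
                bad.append((a, b, c))
    return bad

# Hypothetical signed graph, not historical data.
signs = {
    frozenset({"A", "B"}): +1,
    frozenset({"B", "C"}): +1,
    frozenset({"A", "C"}): -1,  # A and C are hostile despite a shared friend
    frozenset({"A", "D"}): -1,
    frozenset({"B", "D"}): -1,  # A and B share an enemy: balanced
}
tense = unbalanced_triangles(signs)
```

The rule flags where pressure to change exists. It says nothing about when, or in which direction, the change will come.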

I would appreciate your asking Jim to predict the next major act of terrorism using Neo4j.

If he can predict WWI with “a few mechanical rules,” the “power and accuracy of graphs” should support prediction of terrorism.

Yes?

If you read the 9/11 Commission Report (pdf), you too can predict 9/11, in retrospect.

Without any database at all.

Don’t get me wrong, I really like graph databases. And they have a number of useful features.

Why not sell graph databases based on technical merit?

As opposed to carny sideshow claims?

July 15, 2013

Combining Neo4J and Hadoop (part II)

Filed under: Graphs,Hadoop,Neo4j — Patrick Durusau @ 12:20 pm

Combining Neo4J and Hadoop (part II) by Kris Geusebroek.

From the post:

In the previous post Combining Neo4J and Hadoop (part I) we described the way we combine Hadoop and Neo4J and how we are getting the data into Neo4J.

In this second part we will take you through the journey we took to implement a distributed way to create a Neo4J database. The idea is to use our Hadoop cluster for creating the underlying file structure of a Neo4J database.

To do this we must first understand this file-structure. Luckily Chris Gioran has done a great job describing this structure in his blog Neo4J internal file storage.

The description was done for version 1.6 but largely still matches the 1.8 file-structure.

First I’ll start with a small recap of the file-structure.

The Chris Gioran post has been updated at: Rooting out redundancy – The new Neo4j Property Store.

Internal structures influence what you can or can’t easily say. Best to know about those structures in advance.
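One structural point worth seeing in code: when a store file is built from fixed-size records, a record's position follows directly from its id, so separate workers can each produce their own slice of the file. The record layout below is invented for illustration and is not Neo4j's actual on-disk format; see the Gioran posts for the real one.

```python
import struct

# Illustrative layout only, NOT Neo4j's real format:
# 1-byte in-use flag, 4-byte first relationship id, 4-byte first property id.
RECORD = struct.Struct(">Bii")  # big-endian, no padding, 9 bytes

def write_node(store: bytearray, node_id: int, first_rel: int, first_prop: int) -> None:
    """Write a node record at the offset implied by its id."""
    offset = node_id * RECORD.size
    if len(store) < offset + RECORD.size:
        # Grow the store with zero bytes; zeroed records read as "not in use".
        store.extend(bytes(offset + RECORD.size - len(store)))
    RECORD.pack_into(store, offset, 1, first_rel, first_prop)

def read_node(store, node_id: int):
    """Read a node record back without scanning: offset = id * record size."""
    in_use, first_rel, first_prop = RECORD.unpack_from(store, node_id * RECORD.size)
    return bool(in_use), first_rel, first_prop

store = bytearray()
write_node(store, 3, 42, -1)  # -1 standing in for "no property"
```

Lookup by id is constant time, but the id is also the address, which is the kind of constraint you want to know about before you design around it.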

July 13, 2013

Working examples for the ‘Graph Databases’ book

Filed under: Graphs,Neo4j — Patrick Durusau @ 4:21 pm

Working examples for the ‘Graph Databases’ book by Joerg Baach.

From the post:

The examples in the ‘Graph Databases’ book don’t work out of the box. I’ve modified them, so that they do work (for chapter 3, that is).

In the version I have, 2013-02-25, the examples in question occur in chapter 4.

But whichever chapter, the corrections are welcome news.

BTW, there are other chapters that probably need the same treatment.

Fun with Music, Neo4j and Talend

Filed under: Graphs,Neo4j,Talend — Patrick Durusau @ 3:52 pm

Fun with Music, Neo4j and Talend by Rik Van Bruggen.

From the post:

Many of you know that I am a big fan of Belgian beers. But of course I have a number of other hobbies and passions. One of those being: Music. I have played music, created music (although that seems like a very long time ago) and still listen to new music almost every single day. So when sometime in 2006 I heard about this really cool music site called Last.fm, I was one of the early adopters to try use it. So: a good 7 years later and 50k+ scrobbles later, I have quite a bit of data about my musical habits.

On top of that, I have a couple of friends that have been using Last.fm as well. So this got me thinking. What if I was somehow able to get that last.fm data into neo4j, and start “walking the graph”? I am sure that must give me some interesting new musical insights… It almost feels like a “recommendation graph for music” … Let’s see where this brings us.

Usual graph story but made more interesting by the use of Talend ETL tools.

Good opportunity to become familiar with Talend if you don’t know the tools already.

July 9, 2013

The Last Mile

Filed under: Graphs,Neo4j — Patrick Durusau @ 6:50 pm

The Last Mile by Max De Marzi.

From the post:

The “last mile” is a term used in the telecommunications industry that refers to delivering connectivity to the customers that will actually be using the system. In the sense of Graph Databases, it refers to how well the end user can extract value and insight from the graph. We’ve already seen an example of this concept with Graph Search, allowing a user to express their requests in natural language. Today we’ll see another example. We’ll be taking advantage of the features of Neo4j 2.0 to make this work, so be sure to have read the previous post on the matter.

We’re going to be using VisualSearch.js made by Samuel Clay of NewsBlur. VisualSearch.js enhances ordinary search boxes with the ability to autocomplete faceted search queries. It is quite easy to customize and there is an annotated walkthrough of the options available. You can see what it does in the image below, or click it to try their demo.

Graphs are ok.

Storing data in graphs is better.

Useful retrieval of data from graphs is the best. 😉

Max does his usual excellent job of illustrating useful retrieval of information from a Neo4j graph.

His use of labels does remind me of a post I need to finish.
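As a rough sketch of the step between a faceted search box and the graph, the function below turns a label plus facet/value pairs into a parameterized Cypher query. This is my own illustration, not Max's code: the function name and the label whitelist are hypothetical. It uses the `{param}` placeholder style of Neo4j 2.0; later versions write parameters as `$param`.

```python
ALLOWED_LABELS = {"Person", "Place"}  # hypothetical whitelist

def facets_to_cypher(label, facets):
    """Build a Cypher query string and a parameter map from facet pairs.

    Labels and property names cannot be passed as Cypher parameters, so
    they are validated before being placed in the query text. Values
    always travel as parameters, never as part of the query string.
    """
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unknown label: {label!r}")
    clauses, params = [], {}
    for i, (prop, value) in enumerate(facets):
        if not prop.isidentifier():
            raise ValueError(f"bad property name: {prop!r}")
        key = f"p{i}"
        clauses.append(f"n.{prop} = {{{key}}}")
        params[key] = value
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    return f"MATCH (n:{label}){where} RETURN n", params

query, params = facets_to_cypher("Person", [("name", "Max"), ("city", "Chicago")])
```

Here `query` comes out as `MATCH (n:Person) WHERE n.name = {p0} AND n.city = {p1} RETURN n`, with the two values carried separately in `params`.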
