Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

July 7, 2013

BPM Engine With Neo4j

Filed under: Graphs,Neo4j — Patrick Durusau @ 2:58 pm

NoSQL experimentations with Activiti: A (very simple) Neo4J prototype

From the post:

I’ve got this itch for a long time now to try and see how easy it is to write a BPM engine on a graph database such as Neo4J. After all, the data model fits perfectly, as business processes are graphs and executions of those processes basically boil down to keeping pointers to where you are in that graph. It just feels as a very natural fit.

So I spent some time implementing a prototype, which you can find on https://github.com/jbarrez/activiti-neo4j

The prototype contains some unit tests which execute simple BPMN 2.0 processes. I tried to be as close as possible to the Activiti concepts of services, commands, executions, JavaDelegates, etc. Currently covered:

If BPMN is unfamiliar:

A standard Business Process Model and Notation (BPMN) will provide businesses with the capability of understanding their internal business procedures in a graphical notation and will give organizations the ability to communicate these procedures in a standard manner. Furthermore, the graphical notation will facilitate the understanding of the performance collaborations and business transactions between the organizations. This will ensure that businesses will understand themselves and participants in their business and will enable organizations to adjust to new internal and B2B business circumstances quickly.

BPMN homepage, includes links to version 2.0 and other materials.

Processes, business and otherwise, seem like naturals to model as graphs.
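
As a rough illustration of the idea in the quoted post (a process is a graph, and an execution is a pointer into that graph), here is a minimal Python sketch. It uses neither Activiti nor Neo4j; the node names and the `step` helper are hypothetical and only show the shape of the idea.

```python
# Hypothetical process: each node maps to the nodes its sequence flows lead to.
PROCESS = {
    "start": ["review"],
    "review": ["approve", "reject"],
    "approve": ["end"],
    "reject": ["end"],
    "end": [],
}

def step(process, current, choice=None):
    """Move the execution pointer from `current` to the next node.

    `choice` selects a branch when more than one outgoing flow exists.
    """
    outgoing = process[current]
    if not outgoing:
        return current  # already at an end node, nothing to do
    if len(outgoing) == 1:
        return outgoing[0]
    if choice not in outgoing:
        raise ValueError(f"choose one of {outgoing}")
    return choice

# The whole execution state is a single pointer into the graph.
pointer = "start"
pointer = step(PROCESS, pointer)
pointer = step(PROCESS, pointer, "approve")
pointer = step(PROCESS, pointer)
```

A real engine would also need parallel branches (several pointers per execution), which this sketch leaves out.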

Mini Search Engine…

Filed under: Graphs,Neo4j,Search Engines,Searching — Patrick Durusau @ 1:13 pm

Mini Search Engine – Just the basics, using Neo4j, Crawler4j, Graphstream and Encog by Brian Du Preez.

From the post:

Continuing to chapter 4 of Programming Collective Intelligence (PCI), which is implementing a search engine.

I may have bitten off a little more than I should have in 1 exercise. Instead of using the normal relational database construct as used in the book, I figured, I always wanted to have a look at Neo4J so now was the time. Just to say, this isn’t necessarily the ideal use case for a graph db, but how hard could it be to kill 3 birds with 1 stone.

Working through the tutorials trying to reset my SQL Server, Oracle mindset took a little longer than expected, but thankfully there are some great resources around Neo4j.

Just a couple:
neo4j – learn
Graph theory for busy developers
Graphdatabases

Since I just wanted to run this as a little exercise, I decided to go for an in-memory implementation and not run it as a service on my machine. In hindsight this was probably a mistake and the tools and web interface would have helped me visualise my data graph quicker in the beginning.

The general search space is filled by major contenders.

But that leaves open opportunities for domain specific search services.

Law and medicine have specialized search engines. What commercially viable areas are missing them?

July 5, 2013

On Importing Data into Neo4j

Filed under: Graphs,Neo4j — Patrick Durusau @ 12:26 pm

On Importing Data into Neo4j (Blog Series) by Michael Hunger.

From the post:

Being able to run interesting queries against a graph database like Neo4j requires the data to be in there in the first place. As many users have questions in this area, I thought a series on importing data into Neo4j would be helpful to get started. This series covers both importing small and moderate data volumes for examples and demonstrations, but also large scale data ingestion.

For operations where massive amounts of data flow in or out of a Neo4j database, the interaction with the available APIs should be more considerate than with your usual, ad-hoc, local graph queries.

This blog series will discuss several ways of importing data into Neo4j and the considerations you should make when choosing one or the other. There is a dedicated page about importing data on neo4j.org which will help you getting started but needs feedback to improve.

Basically Neo4j offers several APIs to import data. Preferably the Cypher query language should be used as it is easiest to use from any programming language. The Neo4j Server’s REST API is not suited for massive data import, only the batch-operation endpoint and the Cypher REST and transactional HTTP (from Neo4j 2.0) endpoints are of interest there. Neo4j’s Java Core APIs provide a way of avoiding network overhead and driving data import directly from a programmatic source of data and also allow to drop down to the lowest API levels for high speed ingestion.

Great overview of importing data into Neo4j with the promise of more posts to follow on importing data into Neo4j 2.0.

I first saw this at Alex Popescu’s On Importing Data into Neo4j.

July 2, 2013

Finding relationships in Trademark Data

Filed under: Graphs,Neo4j — Patrick Durusau @ 7:18 pm

Finding relationships in Trademark Data by Matt Overstreet.

From the post:

At the recent National Day of Civic Hacking here at OSC we dug into a few ways to find relationships between trademarks filed with the USPTO.

If you’ve ever played with the US trademark data you’ll know that it’s both plentiful and scarce. There are lots of trademark filings, each with the minimum possible data to make them uniquely identifiable.

That’s great for streamlined government and citizen anonymity, but no fun for finding the relationships between filings. We needed to suss out more information about the graph of trademarks. That’s when we Eric and Wes tripped over the translations included in many of the patent filings. We wondered if the term space for these translations might be smaller and more consistent than the space defined by the actual trademarks. Translations were less likely to play games with spelling or grammar the way one might with the actual mark.

Some Hacking with the data and Neo4j resulted in an intriguing dataset that we are still unpacking. Want to play with the data? Neo4J loaded with data is at this url: http://rosetta.bloom.sh:7474/webadmin/

Curious to know what you make of the theory:

Translations were less likely to play games with spelling or grammar the way one might with the actual mark.

I’m not sure that is a useful assumption:

Marks consisting of or including foreign words or terms from common, modern languages are translated into English to determine genericness, descriptiveness, likelihood of confusion, and other similar issues. See Palm Bay, 396 F.3d at 1377, 73 USPQ2d at 1696. With respect to likelihood of confusion, “[i]t is well established that foreign words or terms are not entitled to be registered if the English language equivalent has been previously used on or registered for products which might reasonably be assumed to come from the same source.” Mary Kay Cosmetics, Inc. v. Dorian Fragrances, Ltd., 180 USPQ 406, 407 (TTAB 1973). [Examination Guide 1-08]

Using the English translation to judge “…genericness, descriptiveness, likelihood of confusion, and other similar issues” creates an incentive for “…play[ing] games with spelling or grammar….”

If you are interested in the data set, you may find the resources at: Trademarks Home useful.

Caution: Legal terminology may not have the semantics you expect.

June 28, 2013

Neo4j 1.9.1 Released!

Filed under: Graphs,Neo4j — Patrick Durusau @ 4:08 pm

Neo4j 1.9.1 Released! by Jim Webber.

From the post:

It’s been a while since I was last let loose on the Neo4j blog, and I’ve marked my return with some good news. This week marks the release of Neo4j 1.9.1, numerically at least just a maintenance release in the 1.9 series.

However under the covers the engineering team has been working away developing more safety features for high-availability clustered deployments, squashing a couple of bugs, and improving SSL support for chained certificates and adding streaming support for paged traversals in the REST API.

If you’re a 1.9 user, you’re strongly recommended to upgrade to 1.9.1 and new users should proceed directly to 1.9.1. You won’t need any store upgrades going from 1.9 to 1.9.1 so it’s an easy upgrade.

Download at the usual place and happy graphing!

Unless you are otherwise occupied, this weekend sounds like the time to upgrade!

Are You Tracking Emails?

Filed under: Cypher,Graphs,Neo4j — Patrick Durusau @ 4:01 pm

neo4j/cypher: Aggregating relationships within a path by Mark Needham.

From the post:

I recently came across an interesting use case of paths in a graph where we wanted to calculate the frequency of communication between two people by showing how frequently each emailed the other.

The model looked like this:

[Image: email graph]

I can’t imagine why Mark would think about tracking emails between people. 😉

And as Mark says, the query he settles on isn’t guaranteed to scale.

Still, it is an interesting exercise.
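
To make the use case concrete, here is a minimal Python sketch of the counting involved: how often each of two people emailed the other. It is not Mark's Cypher query; the sample pairs and the `email_frequency` helper are hypothetical.

```python
from collections import Counter

# Hypothetical (sender, recipient) pairs, one per email.
emails = [
    ("alice", "bob"),
    ("alice", "bob"),
    ("bob", "alice"),
    ("alice", "carol"),
]

def email_frequency(pairs):
    """Count emails in each direction for every unordered pair of people."""
    directed = Counter(pairs)
    summary = {}
    for (sender, recipient), count in directed.items():
        # Group both directions under one key per pair of people.
        key = tuple(sorted((sender, recipient)))
        summary.setdefault(key, {})[f"{sender}->{recipient}"] = count
    return summary

frequency = email_frequency(emails)
```

Like the query in the post, this holds every pair in memory, so it illustrates the aggregation rather than a way to scale it.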

June 23, 2013

Fun with Facebook in Neo4j [Separation from Edward Snowden?]

Filed under: Facebook,Graphs,Neo4j — Patrick Durusau @ 1:13 pm

Fun with Facebook in Neo4j by Rik Van Bruggen.

From the post:

Ever since Facebook promoted its “graph search” methodology, lots of people in our industry have been waking up to the fact that graphs are über-cool. Thanks to the powerful query possibilities, people like Facebook, Twitter, LinkedIn, and let us not forget, Google have been providing us with some of the most amazing technologies. Specifically, the power of the “social network” is tempting many people to get their feet wet, and to start using graph technology. And they should: graphs are fantastic at storing, querying and exploiting social structures, stored in a graph database.

So how would that really work? I am a curious, “want to know” but “not very technical” kind of guy, and I decided to get my hands dirty (again), and try some of this out by storing my own little part of Facebook – in neo4j. Without programming any kind of production-ready system – because I don’t know how – but with enough real world data to make us see what it would be like.

Rik walks you through obtaining data from Facebook, munging it in a spreadsheet and loading it into Neo4j.

Can’t wait for Facebook graph to support degrees of separation from named individuals, like Edward Snowden.

Complete with the intervening people of course.

What’s privacy compared to a media-driven witch hunt for anyone “connected” to the latest “face” on the TV?

If Facebook does that for Snowden, they should do it for NSA chief, Keith Alexander as well.

June 17, 2013

I Mapreduced a Neo store [Legacy of CPU Shortages?]

Filed under: Graphs,Hadoop,MapReduce,Neo4j — Patrick Durusau @ 8:40 am

I Mapreduced a Neo store by Kris Geusebroek.

From the post:

Lately I’ve been busy talking at conferences to tell people about our way to create large Neo4j databases. Large means some tens of millions of nodes and hundreds of millions of relationships and billions of properties.

Although the technical description is already on the Xebia blog part 1 and part 2, I would like to give a more functional view on what we did and why we started doing it in the first place.

Our use case consisted of exploring our data to find interesting patterns. The data we want to explore is about financial transactions between people, so the Neo4j graph model is a good fit for us. Because we don’t know upfront what we are looking for we need to create a Neo4j database with some parts of the data and explore that. When there is nothing interesting to find we go enhance our data to contain new information and possibly new connections and create a new Neo4j database with the extra information.

This means it’s not about a one time load of the current data and keep that up to date by adding some more nodes and edges. It’s really about building a new database from the ground up everytime we think of some new way to look at the data.

Deeply interesting work, particularly for its investigation of the internal file structure of Neo4j.

Curious about the

…building a new database from the ground up everytime we think of some new way to look at the data.

To what extent are static database structures a legacy of a shortage of CPU cycles?

With limited CPU cycles, it was necessary to create a static structure, against which query languages could be developed and optimized (again because of a shortage of CPU cycles), and the persisted data structure avoided the overhead of rebuilding the data structure for each user.

It may be that cellphones and tablets need the convenience of static data structures or at least representations of static data structures.

But what of server farms populated by TBs of 3D memory?

Isn’t it time to start thinking beyond the limitations imposed by decades of CPU cycle shortages?

Semantic Diversity – Special Characters

Filed under: Lucene,Neo4j,Programming,Semantic Diversity — Patrick Durusau @ 8:16 am

neo4j/cypher/Lucene: Dealing with special characters by Mark Needham.

Mark outlines how to handle “special characters” in Lucene (indexer for Neo4j), only to find that an escape character for a Lucene query is also a special character for Cypher, which itself must be escaped.
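
The double escaping can be sketched in a few lines of Python. This is an illustration under assumptions, not Mark's code: the list of Lucene special characters is the one I believe the classic query parser uses, and both helper names are hypothetical.

```python
import re

# Characters the Lucene classic query parser treats as special (assumed list).
_LUCENE_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/]|&&|\|\|)')

def escape_lucene(term):
    """Backslash-escape Lucene query syntax inside a search term."""
    return _LUCENE_SPECIAL.sub(r"\\\1", term)

def cypher_string_literal(text):
    """Quote text as a Cypher string literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"

term = "AC/DC (live)"
lucene_term = escape_lucene(term)                        # first layer: Lucene
literal = cypher_string_literal("name:" + lucene_term)   # second layer: Cypher
```

Each backslash added for Lucene is doubled again for Cypher, which is exactly the layering the post runs into. Where the client allows it, passing the Lucene query as a Cypher parameter should avoid the second layer.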

There is a chart in Mastering Regular Expressions by Jeffrey E F Friedl of “special” characters but that doesn’t cover all the internal parsing choices made by software.

Over the last sixty plus years there has been little progress towards a common set of “special” characters in computer science.

Handling of “special” characters lies at the heart of accessing data and all programs have code to account for them.

With no common agreement on “special” characters, what reason would you offer to expect convergence elsewhere?

June 16, 2013

Storing and visualizing LinkedIn with Neo4j and sigma.js

Filed under: Graphs,Neo4j,Sigma.js,Visualization — Patrick Durusau @ 3:19 pm

Storing and visualizing LinkedIn with Neo4j and sigma.js by Bob Briody.

From the post:

In this post I am going to present a way to:

  • load a LinkedIn network via the LinkedIn developer API into neo4j using python
  • serve the network from neo4j using node.js, express.js, and cypher
  • display the network in the browser using sigma.js

Bob remarks that his method for deduping relationships would not scale to very large networks.
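
For readers wondering what deduping relationships involves, here is a minimal in-memory Python sketch. It is not Bob's method; the `dedupe_connections` helper and the sample names are hypothetical.

```python
def dedupe_connections(pairs):
    """Keep one copy of each undirected connection, ignoring direction."""
    seen = set()
    unique = []
    for a, b in pairs:
        key = frozenset((a, b))  # (a, b) and (b, a) produce the same key
        if key not in seen:
            seen.add(key)
            unique.append((a, b))
    return unique

connections = [("ann", "bob"), ("bob", "ann"), ("ann", "cat"), ("ann", "bob")]
unique = dedupe_connections(connections)
```

The `seen` set grows with the number of distinct connections, so this approach shares the scaling limit: it works only while the whole edge set fits in memory.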

Pointers to how LinkedIn deals with that problem?

I first saw this in a tweet by Peter Neubauer.

June 3, 2013

Putting All Species In A Graph Database [Thursday, June 6th, 13:00 EDT]

Filed under: Graphs,Neo4j — Patrick Durusau @ 3:26 pm

Putting All Species In A Graph Database

From the post:

Stephen Smith, an ecology and evolutionary biology professor at the University of Michigan, is going to explain how Neo4j and other digital technologies are assisting in constructing the tree of life. Starting at 10:00 PDT (19:00 CEST), he will also discuss other aspects of the interface of biology with next generation technologies.

“Our project is building the tools with which scientists in the community can continually improve the tree of life as we gather new information. Neo4j allows us to not only store trees in their native graph form, but also allows us to map trees to the same structure, the graph. So in fact, we are facilitating the construction of the graph of life,” says Smith.

Neo4j approached the Open Tree of Life team to present a webinar because it is a project that utilizes the Neo4j graph database to represent the interconnectedness of biological data. The company considers the project a great example of how a graph database can better model the natural world.

The online lecture is intended for a broad audience including beginner computer programmers, advanced hackers, data scientists, natural scientists, and anyone interested in the cross-section of science and technology, especially data modeling. Over 150 people have already registered online.

The registration form: [Registration]

This should be fun.

Get your questions ready!

My questions:

What happens when our understanding of the tree of life changes?

How do we preserve the “old” and “new” understandings of the tree of life?

Can we compare those understandings?
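
One way to picture the comparison question is to keep each understanding as its own child-to-parent mapping and diff them. The Python sketch below is only an illustration; the placements are made-up placeholders, not Open Tree of Life data, and `diff_trees` is a hypothetical helper.

```python
def diff_trees(old, new):
    """Compare two child -> parent mappings and report what changed."""
    moved = {
        taxon: (old[taxon], new[taxon])
        for taxon in old.keys() & new.keys()
        if old[taxon] != new[taxon]
    }
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "moved": moved,
    }

# Hypothetical child -> parent placements for two versions of a tree.
tree_v1 = {"B": "A", "C": "A", "D": "B"}
tree_v2 = {"B": "A", "C": "A", "D": "C", "E": "C"}

changes = diff_trees(tree_v1, tree_v2)
```

Because both versions are kept, nothing about the "old" understanding is lost when the "new" one arrives.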

May 31, 2013

New Milestone Release Neo4j 2.0.0-M03

Filed under: Graphs,Neo4j — Patrick Durusau @ 4:17 pm

New Milestone Release Neo4j 2.0.0-M03 by Michael Hunger.

From the post:

The latest M03 milestone release of Neo4j 2.0 is as you expected all about improvements to Cypher. This blog post also discusses some changes made in the last milestone (M02) which we didn’t fully cover.

MERGE

Cypher now contains a MERGE clause which is pretty big: It will be replacing CREATE UNIQUE as it also takes indexes and labels into account and can even be used for single node creation. MERGE either matches the graph and returns what is there (one or more results) or if it doesn’t find anything it creates the path given. So after the MERGE operation completes, Neo4j guarantees that the declared pattern is there.

We also added additional clauses to the MERGE statement which allow you to create or update properties as a function of whether the node was matched or created. Please note that — as patterns can contain multiple named nodes and relationships — you will have to specify the element for which you want to trigger an update operation upon creation or match.

MERGE (keanu:Person { name:'Keanu Reeves' })
ON CREATE keanu SET keanu.created = timestamp()
ON MATCH  keanu SET keanu.lastSeen = timestamp()
RETURN keanu

We put MERGE out to mainly collect feedback on the syntax and usage, there are still some caveats, like not grabbing locks for unique creation so you might end up with duplicate nodes for now. That will all be fixed by the final release.

Going along with MERGE, MATCH now also supports single node patterns, both with and without labels.

(…)

MERGE is definitely something to investigate in this milestone release.
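
The match-or-create behaviour described in the post can be mimicked in plain Python to see what the ON CREATE and ON MATCH branches do. This is an analogy only, not how Neo4j implements MERGE; the dict-based graph, the counter standing in for timestamp(), and `merge_person` are all hypothetical.

```python
import itertools

_clock = itertools.count(1)  # stand-in for Cypher's timestamp()

def merge_person(graph, name):
    """Return the node for `name`, creating it if absent (get-or-create)."""
    node = graph.get(name)
    if node is None:
        node = {"name": name, "created": next(_clock)}  # ON CREATE branch
        graph[name] = node
    else:
        node["lastSeen"] = next(_clock)                 # ON MATCH branch
    return node

graph = {}
first = merge_person(graph, "Keanu Reeves")   # creates the node
second = merge_person(graph, "Keanu Reeves")  # matches the existing node
```

Note that the lookup and the insert are separate steps with no lock between them. Two concurrent callers could both miss and both create, which is the duplicate-node caveat the post mentions for this milestone.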

You need to also take a close look at labels.

What issues, if any, do you see with the label mechanism?

I see several but will cover them early next week. Work up your list (if any) to see if we reach similar conclusions.

Visualizing the News with VivaGraphJS

Filed under: AlchemyAPI,DBpedia,Graphs,Neo4j,Visualization — Patrick Durusau @ 2:17 pm

Visualizing the News with Vivagraph.js by Max De Marzi.

From the post:

Today I want to introduce you to VivaGraphJS – a JavaScript Graph Drawing Library made by Andrei Kashcha of Yasiv. It supports rendering graphs using WebGL, SVG or CSS formats and currently supports a force directed layout. The Library provides an API which tracks graph changes and reflect changes on the rendering surface which makes it fantastic for graph exploration.

The post includes AlchemyAPI (entity extraction), DBpedia (additional information), Feedzilla (news feeds), and Neo4j (graphs).

The technology rocks but the content, well, your mileage will vary.

May 24, 2013

Graph Databases and Software Metrics & Analysis

Filed under: Graphs,Neo4j,Programming — Patrick Durusau @ 6:40 pm

Graph Databases and Software Metrics & Analysis by Michael Hunger

From the post:

This is the first in a series of blog posts that discuss the usage of a graph database like Neo4j to store, compute and visualize a variety of software metrics and other types of software analytics (method call hierarchies, transitive closure, critical path analysis, volatility & code quality). Follow up posts by different contributors will be linked from this one.

Everyone who works in software development comes across software metrics at some point.

Just because of curiosity about the quality or complexity of the code we’ve written, or a real interest to improve quality and reduce technical debt, there are many reasons.

In general there are many ways of approaching this topic, from just gathering and rendering statistics in diagrams to visualizing the structure of programs and systems.

There are a number of commercial and free tools available that compute software metrics and help expose the current trend in your project’s development.

Software metrics can cover different areas. Computing cyclomatic complexity, analysing dependencies or call traces is probably easy, using static analysis to find smaller or larger issues is more involved and detecting code smells can be an interesting challenge in AST parsing.

Interesting work on using graph databases (here Neo4j) for software analysis.

Be sure to see the resources listed at the end of the post.
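
As a small taste of why call hierarchies suit a graph, here is a Python sketch of the transitive closure of a call graph: everything a function can reach through calls. The function names are hypothetical and the sketch uses a plain dict rather than Neo4j.

```python
def transitive_calls(call_graph, start):
    """Return every function reachable from `start` through calls."""
    reachable = set()
    stack = list(call_graph.get(start, ()))
    while stack:
        callee = stack.pop()
        if callee not in reachable:
            reachable.add(callee)
            # Functions missing from the graph are treated as leaf calls.
            stack.extend(call_graph.get(callee, ()))
    return reachable

# Hypothetical caller -> callees mapping.
call_graph = {
    "main": ["parse", "run"],
    "parse": ["tokenize"],
    "run": ["parse", "report"],
    "report": [],
}
reached = transitive_calls(call_graph, "main")
```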

May 21, 2013

Neo4j 1.9 General Availability Announcement!

Filed under: Graphs,Neo4j — Patrick Durusau @ 4:38 pm

Neo4j 1.9 General Availability Announcement! by Philip Rathle.

From the post:

After over a year of R&D, five milestone releases, and two release candidates, we are happy to release Neo4j 1.9 today! It is available for download effective immediately. And the latest source code is available, as always, on Github.

The 1.9 release adds primarily three things:

  • Auto-Clustering, which makes Neo4j Enterprise clustering more robust & easier to administer, with fewer moving parts
  • Cypher language improvements make the language more functionally powerful and more performant, and
  • New welcome pages make learning easier for new users

May 20, 2013

Reloading my Beergraph – using an in-graph-alcohol-percentage-index

Filed under: Graphs,Neo4j — Patrick Durusau @ 3:27 pm

Reloading my Beergraph – using an in-graph-alcohol-percentage-index by Rik Van Bruggen.

From the post:

As you may remember, I created a little beer graph some time ago to experiment and have fun with beer, and graphs. And yes, I have been having LOTS of fun with it – using it to explain graph concepts to lots of not-so-technical folks, like myself. Many people liked it, and even more people had some questions about it – started thinking in graphs, basically. Which is way more than what I ever hoped for – so that’s great!

One of the questions that people always asked me was about the model. Why did I model things the way I did? Are there no other ways to model this domain? What would be the *best* way to model it? All of these questions have somewhat vague answers, because as a rule, there is no *one way* to model a graph. The data does not determine the model – it’s the QUERY that will drive the modelling decisions.

One of the things that spurred the discussion was – probably not coincidentally – the AlcoholPercentage. Many people were expecting that to be a *property* of the Beerbrand – but instead in my beergraph, I had “pulled it out”. The main reason at the time was more coincidence than anything else, but when you think of it – it’s actually a fantastic thing to “pull things out” and normalise the data model much further than you probably would in a relational model. By making the alcoholpercentage a node of its own, it allowed me to do more interesting queries and pathfinding operations – which led to interesting beer recommendations. Which is what this is all about, right?

(…)

When I read:

All of these questions have somewhat vague answers, because as a rule, there is no *one way* to model a graph. The data does not determine the model – it’s the QUERY that will drive the modelling decisions.

or

…but instead in my beergraph, I had “pulled it out”. The main reason at the time was more coincidence than anything else, but when you think of it – it’s actually a fantastic thing to “pull things out” and normalise the data model much further than you probably would in a relational model.

I don’t feel like I’ve been vague, ever. 😉

Here is my summary of what Rik may have meant:

  • “no *one way* to model a graph” -> graphs support multiple models of data
  • “The data does not determine the model ” -> may mean you can create any arbitrary model based on any data
  • “…the QUERY that will drive the modeling decisions.” -> in topic map terms, what gets represented by a topic (node in a graph) is what you want to talk about (query)
  • “…pulled it out…”/“…pull things out…” -> represent a subject with a node (graph) or topic (topic maps).
  • “…normalise the data model much further…” -> The distinction from database normalization isn’t clear; may just be filler.
    • Clarity in writing reduces unnecessary vagueness.
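
To see what "pulling out" a property buys, here is a minimal Python sketch. The beers and percentages are invented, and `similar_strength` is a hypothetical helper, not Rik's query.

```python
from collections import defaultdict

# Hypothetical beers and alcohol percentages, for illustration only.
beers = {"Beer A": 8.5, "Beer B": 8.5, "Beer C": 5.2}

# "Pull out" the percentage: each distinct value becomes its own node,
# linked to every beer that has it.
percentage_nodes = defaultdict(set)
for beer, abv in beers.items():
    percentage_nodes[abv].add(beer)

def similar_strength(beer):
    """Recommend other beers that share this beer's percentage node."""
    return percentage_nodes[beers[beer]] - {beer}

recommended = similar_strength("Beer A")
```

With the percentage as a node, "beers of the same strength" becomes a two-hop walk instead of a scan over every beer's properties.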

Graph Landscape Survey

Filed under: GraphBuilder,GraphLab,Graphs,Neo4j,Pregel,Spark — Patrick Durusau @ 9:41 am

Improving options for unlocking your graph data by Ben Lorica.

From the post:

The popular open source project GraphLab received a major boost early this week when a new company comprised of its founding developers, raised funding to develop analytic tools for graph data sets. GraphLab Inc. will continue to use the open source GraphLab to “push the limits of graph computation and develop new ideas”, but having a commercial company will accelerate development, and allow the hiring of resources dedicated to improving usability and documentation.

While social media placed graph data on the radar of many companies, similar data sets can be found in many domains including the life and health sciences, security, and financial services. Graph data is different enough that it necessitates special tools and techniques. Because tools were a bit too complex for casual users, in the past this meant graph data analytics was the province of specialists. Fortunately graph data is an area that has attracted many enthusiastic entrepreneurs and developers. The tools have improved and I expect things to get much easier for users in the future. A great place to learn more about tools for graph data, is at the upcoming GraphLab Workshop (on July 1st in SF).
(…)

Ben summarizes graph resources for:

  • Data wrangling: creating graphs
  • Data management and search
  • Graph-parallel frameworks
  • Machine-learning and analytics
  • Visualization

It would be hard to find a better starting place for investigating the buzz about graphs.

I first saw this in An Overview of Graph Processing Frameworks by Danny Bickson.

May 17, 2013

Knowledge Bases in Neo4j

Filed under: Graphs,Neo4j — Patrick Durusau @ 2:05 pm

Knowledge Bases in Neo4j by Max De Marzi.

From the post:

From the second we are born we are collecting a wealth of knowledge about the world. This knowledge is accumulated and interrelated inside our brains and it represents what we know. If we could export this knowledge and give it to a computer, it would look like ConceptNet. ConceptNet is a semantic network that…

…is built from nodes representing concepts, in the form of words or short phrases of natural language, and labeled relationships between them. These are the kinds of things computers need to know to search for information better, answer questions, and understand people’s goals.

I wrote a little ruby script to import ConceptNet5 into Neo4j and it gives us a nice graph (243MB) to work with. ConceptNet5 as presented in csv files is actually a hypergraph, with a reason for the concept:

Max gives a script to remove the reasons for concepts (the hypergraph part of ConceptNet5) as duplicate content.

It does make the graph smaller, but only at the expense of information loss.

Think of it as the Benghazi emails with all the duplicate prose removed along with who said it.

If that fits your requirements, ok, but I doubt it would fit in any environment that requires information auditing.
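
The trade-off can be shown with a few lines of Python. The assertions below are invented placeholders, not ConceptNet5 records, and this is not Max's import script.

```python
# Hypothetical assertions: (start, relation, end, source that asserted it).
assertions = [
    ("dog", "IsA", "animal", "source_1"),
    ("dog", "IsA", "animal", "source_2"),
    ("cat", "IsA", "animal", "source_1"),
]

# Dropping the source collapses duplicates but discards who said what.
edges_only = {(start, rel, end) for start, rel, end, _ in assertions}

# Keeping a set of sources per edge dedupes without losing provenance.
edges_with_sources = {}
for start, rel, end, source in assertions:
    edges_with_sources.setdefault((start, rel, end), set()).add(source)
```

Both structures have two edges, but only the second can answer an auditor's question about where an edge came from.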

May 14, 2013

Labels and Schema Indexes in Neo4j

Filed under: Cypher,Indexing,Neo4j — Patrick Durusau @ 9:24 am

Labels and Schema Indexes in Neo4j by Tareq Abedrabbo.

From the post:

Neo4j recently introduced the concept of labels and their sidekick, schema indexes. Labels are a way of attaching one or more simple types to nodes (and relationships), while schema indexes allow to automatically index labelled nodes by one or more of their properties. Those indexes are then implicitly used by Cypher as secondary indexes and to infer the starting point(s) of a query.

I would like to shed some light in this blog post on how these new constructs work together. Some details will be inevitably specific to the current version of Neo4j and might change in the future but I still think it’s an interesting exercise.

Before we start though I need to populate the graph with some data. I’m more into cartoon for toddlers than second-rate sci-fi and therefore Peppa Pig shall be my universe.

So let’s create some labeled graph resources.

Nice review of the impact of the new label + schema index features in Neo4j.

I am still wondering why Neo4j “simple types” cannot be added to nodes and edges without the additional machinery of labels?

Allow users to declare properties to be indexed and used by Cypher for queries.

Which creates a generalized mechanism that requires no changes to the data model.
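
A minimal Python sketch of what I have in mind: an index declared on a property alone, with no label involved. This says nothing about how Neo4j's schema indexes work internally; `PropertyIndex` and the sample nodes are hypothetical.

```python
from collections import defaultdict

class PropertyIndex:
    """Index nodes by the value of one declared property."""

    def __init__(self, key):
        self.key = key
        self._by_value = defaultdict(list)

    def add(self, node):
        # Only nodes that carry the declared property are indexed.
        if self.key in node:
            self._by_value[node[self.key]].append(node)

    def lookup(self, value):
        return list(self._by_value.get(value, []))

index = PropertyIndex("name")
for node in [{"name": "Peppa", "kind": "pig"}, {"name": "George"}, {"age": 4}]:
    index.add(node)

hits = index.lookup("Peppa")
```

The nodes themselves are unchanged by being indexed, which is the point: no change to the data model.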

I have a question pending with the Neo4j team on this issue and will report back with their response.

May 10, 2013

Inserting data into Neo4j with Neo4j-Shell and Cypher

Filed under: Graphs,Neo4j — Patrick Durusau @ 4:54 pm

Inserting data into Neo4j with Neo4j-Shell and Cypher by Alireza Rezaei Mahdiraji.

Tip on how to insert large data sets into Neo4j using Cypher and the Neo4j shell.

Alireza comments:

For a file of 64M, I commit after each 500 node/relationship commands and it works just fine. I tried it with 1000 and I got the same error as above.

Comparable commit numbers from other graph databases?
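
A sketch of the batching in Python, assuming the shell accepts `begin` and `commit` as transaction commands (check your shell version). The placeholder command strings and `batch_commands` are hypothetical; only the batch size of 500 comes from Alireza's comment.

```python
def batch_commands(commands, batch_size=500):
    """Wrap every `batch_size` commands in a begin/commit pair."""
    lines = []
    for i in range(0, len(commands), batch_size):
        lines.append("begin")
        lines.extend(commands[i:i + batch_size])
        lines.append("commit")
    return lines

# Hypothetical input: 1,200 placeholder command strings.
commands = [f"command {n}" for n in range(1200)]
script = batch_commands(commands)
```

The resulting list can be written out one command per line and fed to the shell.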

April 30, 2013

XDGBench: 3rd party benchmark results against graph databases [some graph databases]

Filed under: AllegroGraph,Benchmarks,Fuseki,Neo4j,OrientDB — Patrick Durusau @ 2:19 pm

XDGBench: 3rd party benchmark results against graph databases by Luca Garulli.

From the post:

Toyotaro Suzumura and Miyuru Dayarathna from the Department of Computer Science of the Tokyo Institute of Technology and IBM Research published an interesting research about a benchmark between Graph Databases in the Clouds called:

“XGDBench: A Benchmarking Platform for Graph Stores in Exascale Clouds”

This research conducts a performance evaluation of four famous graph data stores, AllegroGraph, Fuseki, Neo4j, and OrientDB, using XGDBench on Tsubame 2.0 HPC cloud environment. XGDBench is an extension of the famous Yahoo! Cloud Serving Benchmark (YCSB).

OrientDB is the fastest Graph Database among the 4 products tested. In particular OrientDB is about 10x faster (!) than Neo4j in all the tests.

Look at the Presentation (25 slides) and Research PDF.

Researchers are free to pick any software packages for comparison, but the selection here struck me as odd before reading a comment on the original post asking that ObjectivityDB be added to the comparison.

For that matter, where are GraphChi, Infinite Graph, Dex, Titan, FlockDB? Just to call a few of the other potential candidates out.

It will be interesting when a non-winner on such a benchmark cites it for the proposition that ease of use, reliability, and lower TCO outweigh brute speed in a benchmark test.

April 25, 2013

Gmail Email analysis with Neo4j – and spreadsheets

Filed under: Email,Neo4j — Patrick Durusau @ 10:36 am

Gmail Email analysis with Neo4j – and spreadsheets by Rik Van Bruggen.

From the post:

A bunch of different graphistas have pointed out to me in recent months that there is something funny about Graphs and email. Specifically, about graphs and email analysis. From my work in previous years at security companies, I know that Email Forensics is actually big business. Figuring out who emails whom, about what topics, with what frequency, at what times – is important. Especially when the proverbial sh*t hits the fan and fraud comes to light – like in the Enron case. How do I get insight into email traffic? How do I know what was communicated to who? And how do I get that insight, without spending a true fortune?

An important demonstration that sophisticated data analysis may originate with fairly pedestrian authoring tools.

For the Enron emails, see: Enron Email Dataset. Reported to be 0.5M messages, approximately 423 MB, tarred and gzipped.

The topic map question is what to do with separate graphs of:

  • Enron emails,
  • Enron corporate structure,
  • Social relationships between Enron employees and others,
  • Documents of other types interchanged or read inside of Enron,
  • Travel and expense records, and,
  • Phone logs inside Enron?

Graphs of any single data set can be interesting.

Merging graphs of inter-related data sets can be powerful.
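A minimal sketch of what merging could look like, assuming each graph is an edge list and that the same person carries the same identifier in every data set. That assumption is the hard part in practice. The addresses and relation names below are invented for illustration and are not taken from the Enron data.

```python
from collections import defaultdict

def merge_graphs(*graphs):
    """Merge edge lists, treating nodes with the same identifier as one node."""
    merged = defaultdict(set)
    for graph in graphs:
        for source, relation, target in graph:
            merged[source].add((relation, target))
    return merged

# Hypothetical data: three separate graphs about the same people.
email_graph = [
    ("alice@example.com", "EMAILED", "bob@example.com"),
]
org_graph = [
    ("alice@example.com", "REPORTS_TO", "carol@example.com"),
    ("bob@example.com", "REPORTS_TO", "carol@example.com"),
]
phone_graph = [
    ("alice@example.com", "CALLED", "bob@example.com"),
]

merged = merge_graphs(email_graph, org_graph, phone_graph)

# Everything known about one person, across all three sources.
for relation, target in sorted(merged["alice@example.com"]):
    print(relation, target)
```

Once the identifiers line up, one lookup answers a question that no single graph could answer on its own.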

April 20, 2013

Match Making with NEO4J

Filed under: Graphs,Neo4j — Patrick Durusau @ 10:03 am

Match Making with NEO4J by Max De Marzi.

From the post:

In the “Matches are the new Hotness” blog post, I showed how to connect a person to a job via a location and skills. We’re going to look at a variation on the theme today by matching people to other people by what they want in a potential mate. We’re gonna use Neo4j to bring the love.

There are a ton of opinions on what’s wrong with current dating sites. I don’t claim to know how to fix them, I’m just giving what may be a piece of the puzzle. We could try to match people on the things they have in common, but the saying “opposites attract” exists for a reason. We often don’t want mirrors of ourselves, but rather to supplement some perceived deficiency. However complete opposites may result in exciting relationships, but may not be long-lasting. Some kind of happy middle ground is probably best.

This should come with a warning:

Don’t try this at home!

😉

Romantic advice, even from close friends, is fraught with peril. The professionals are getting paid for the risk.

Still, graphing high school/college romantic relationships could interest young people in computing and graphs.
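For the curious, a toy sketch of matching on what each person wants versus what the other has. This is not Max's Neo4j implementation. The people, traits and scoring rule are all assumptions made up for illustration.

```python
# Hypothetical people: each has some traits and wants some traits in a partner.
people = {
    "Ann": {"has": {"funny", "tall"}, "wants": {"kind", "funny"}},
    "Ben": {"has": {"kind", "funny"}, "wants": {"tall"}},
    "Cy": {"has": {"kind"}, "wants": {"rich"}},
}

def mutual_score(a, b):
    """Count wanted traits satisfied in both directions.

    Returns zero when either side gets nothing it asked for.
    """
    a_gets = len(people[a]["wants"] & people[b]["has"])
    b_gets = len(people[b]["wants"] & people[a]["has"])
    return a_gets + b_gets if a_gets and b_gets else 0

for a, b in [("Ann", "Ben"), ("Ann", "Cy"), ("Ben", "Cy")]:
    print(a, b, mutual_score(a, b))
```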

Max has a great pic at this post. I had forgotten how beautiful she was.

April 15, 2013

Almost There: Neo4j 1.9-RC1!

Filed under: Graphs,Neo4j — Patrick Durusau @ 7:31 pm

Almost There: Neo4j 1.9-RC1! by Philip Rathle.

From the post:

Today is Leonhard Euler’s birthday, and we’re celebrating by announcing a first Release Candidate for Neo4j 1.9, now available for download! This release includes a number of incremental changes from the last Milestone (1.9-M05). This release candidate includes the last set of features we’d love our community to try out, as we prepare Neo4j 1.9 for General Availability (GA).

Philip also reports changes since the last milestone (1.9-M05).

I’m curious if the final Neo4j 1.9 release is going to be benchmarked against earlier releases?

Or released with benchmarks at all?

April 13, 2013

Cypher: It doesn’t all start with the START (in Neo4j 2.0!) [Benchmarks?]

Filed under: Cypher,Graphs,Neo4j — Patrick Durusau @ 6:35 pm

Cypher: It doesn’t all start with the START (in Neo4j 2.0!)

From the post:

So, apparently, the Neo Technology guys read one of my last blog posts titled “It all starts with the START” and wanted to make a liar out of me. Actually, I’m quite certain it had nothing at all to do with that–they are just wanting to improve Cypher to make it the best graph query language out there. But yes, the START clause is now optional. “How do I tell Neo4j where to start my traversals”, you might ask. Well, in the long run, you won’t need to anymore. Neo4j will keep index and node/rel statistics and know which index to use, and know which start points to use to make the match and where the most efficient query based on its cost optimization. It’s not quite there yet, so for a while we’ll probably want to make generous use of “index hints”, but I love the direction this is going–feels just like the good old SQL.

While you are looking at Neo4j 2.0, remember the performance benchmarks by René Pickhardt up through Neo4j 1.9:

Get the full neo4j power by using the Core Java API for traversing your Graph data base instead of Cypher Query Language

As of Neo4j 1.7, the core Java API was a full order of magnitude faster than Cypher, and by Neo4j 1.9 the difference was even greater.

Has anyone run the benchmark against Neo4j 2.0?

Mongraph

Filed under: MongoDB,Mongraph,Neo4j — Patrick Durusau @ 6:00 pm

Mongraph

From the readme:

Mongraph combines documentstorage database with graph-database relationships by creating a corresponding node for each document.

Flies in the face of the orthodoxy that every app should be the “universal” app, but still worth watching.

April 12, 2013

Null Values in Neo4j

Filed under: Graphs,Neo4j — Patrick Durusau @ 1:04 pm

While researching the new labels feature in Neo4j 2.0.0-M01, I ran across the following statement in the documentation:

Note

null is not a valid property value. Nulls can instead be modeled by the absence of a key.

(Properties 3.3)

The question is asked in the comments:

Bryan Watson • a year ago
What is implied by the absence of a key (“null”)?
(1) the key is relevant but the value is unknown? (spouse of a customer)

and answered:

Andrés Taylor, replying to Bryan Watson • a year ago
In Neo4j, the absence of a key can mean all three options. Is that problematic in a particular concrete case, or are you wondering in the general case?

and,

Andrés Taylor, replying to Bryan Watson • a year ago
You are correct. Neo4j is an unstructured database, which moves some of the responsibilities to the application. This is one of the things that the application has to take care of.

Problematic for topic maps because a role in an association may be known but the player of that role unknown.

Moreover, what happens if there are multiple “nulls?”

An application could have a “schema” for a node type that makes it possible to spot missing keys, but that seems like a long way to go for a “null.”

I don’t find “unstructured database” to be a persuasive argument for moving responsibilities to an application.

Databases, unstructured or otherwise, should be able to deal robustly with the various cases of “null.”
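To make the "schema" workaround concrete, here is a rough sketch of what the application would have to carry. The node type, key names and the `missing_keys` helper are hypothetical and are not part of any Neo4j API.

```python
# Hypothetical application-side schema: node type -> keys the application expects.
SCHEMA = {
    "customer": {"name", "spouse"},
}

def missing_keys(node_type, properties):
    """Return the expected keys that are absent from a node's property map."""
    return SCHEMA[node_type] - set(properties)

# null is not a valid property value, so an unknown spouse is simply not stored.
customer = {"name": "Pat"}

print(missing_keys("customer", customer))
```

Note what the check cannot do: it reports that "spouse" is absent, but not why it is absent. That distinction still has to live somewhere outside the database.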

April 10, 2013

Neo4j in Action – Software Metrics [Correction]

Filed under: Graphs,Neo4j,Software — Patrick Durusau @ 1:38 pm

Neo4j in Action – Software Metrics by Michael Hunger.

Michael walks through exploring a Java class as a graph.

Makes me curious about treating code as a graph in order to discover which classes call the same data?

BTW, the tweeted location: http://www.slideshare.net/mobile/jexp/class-graph-neo4j-and-software-metrics does not appear to work in a desktop browser.

I was able to locate: http://www.slideshare.net/jexp/class-graph-neo4j-and-software-metrics, which is the link I use above.
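On the question of which classes touch the same data, a small sketch of the idea, assuming the class-to-field access edges have already been extracted somehow. The class and field names are invented, and this is not taken from Michael's slides.

```python
from collections import defaultdict

# Hypothetical (class, field) access edges, e.g. from a source or bytecode scan.
accesses = [
    ("OrderService", "Order.total"),
    ("InvoiceService", "Order.total"),
    ("InvoiceService", "Invoice.dueDate"),
    ("ReportJob", "Order.total"),
]

def shared_data(edges):
    """Map each field to the classes accessing it, keeping fields used by 2+ classes."""
    by_field = defaultdict(set)
    for cls, field in edges:
        by_field[field].add(cls)
    return {field: classes for field, classes in by_field.items() if len(classes) > 1}

for field, classes in shared_data(accesses).items():
    print(field, sorted(classes))
```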

April 8, 2013

Neo4j 2.0.0-M01

Filed under: Graphs,Neo4j — Patrick Durusau @ 3:57 pm

Nodes are people, too by Philip Rathle.

From the post:

Today we are releasing Milestone Release Neo4j 2.0.0-M01 of the Neo4j 2.0 series which we expect to be generally available (GA) in the next couple months. This release is significant in that it is the first time since the inception of Neo4j thirteen years ago that we are making a change to the property graph model. Specifically, we will be adding a new construct: labels.

We’ve completed a first cut at a significant addition to the data model: one that we believe nearly every graph will benefit from. Because this is a major change, it merits feedback, and we are opening the code up now for early comment. Please therefore consider 2.0 to be an experimental release. This first milestone is intended to solicit your input. In addition to the new technology being work-in-progress, some of the new terminology is also work-in-progress. We look forward to making 2.0 a better release together, with your feedback. Please tell us how you’d like to use these changes. We can’t wait to hear what you think.

Read the post for the introduction to “labels” for nodes.

Suggest you run 1.8/1.9 alongside the experimental release 2.0.0-M01.

Something about modeling a person by adding a property to the node, rather than creating a “person” node with an edge to the original node, strikes me as odd.

Can’t put my finger on it but will be playing with it this week.
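While playing, a sketch of the two modeling choices side by side may help. Both are plain Python stand-ins, not Neo4j code, and the node ids, names and the `IS_A` relation are assumptions for illustration.

```python
# Model A (hypothetical): the type rides on the node itself, as a label.
nodes_a = {
    1: {"labels": {"Person"}, "name": "Pat"},
    2: {"labels": set(), "name": "Acme"},
}

def persons_by_label(nodes):
    """Find person nodes by checking each node's own labels."""
    return sorted(n for n, data in nodes.items() if "Person" in data["labels"])

# Model B (hypothetical): the type is a node of its own, reached by an edge.
nodes_b = {1: {"name": "Pat"}, 2: {"name": "Acme"}, 99: {"name": "person"}}
edges_b = [(1, "IS_A", 99)]

def persons_by_edge(edges, type_node):
    """Find person nodes by following IS_A edges into the type node."""
    return sorted(src for src, rel, dst in edges if rel == "IS_A" and dst == type_node)

print(persons_by_label(nodes_a))
print(persons_by_edge(edges_b, 99))
```

Both answer the same question. The difference is that in Model B the "person" node can carry properties and edges of its own, which a bare label cannot.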

What other features would you like to see?

I’m thinking scope on properties would be high on my list.

Permission Resolution With Neo4j — Part 3

Filed under: Graphs,Neo4j — Patrick Durusau @ 3:22 pm

Permission Resolution With Neo4j — Part 3 by Max De Marzi.

From the post:

Let’s add a couple of performance tests to the mix. We learned about Gatling in a previous blog post, we’re going to use it here again. The first test will randomly choose users and documents (from the graph we created in part 2) and write the results to a file, the second test will re-use the results of the first one and run consistently so we can change hardware, change Neo4j parameters, tune the JVM, etc. and see how they affect our performance.
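The record-then-replay idea in that passage can be sketched in a few lines. This is plain Python rather than Gatling, and the file name, user ids and document ids are assumptions for illustration only.

```python
import csv
import os
import random
import tempfile

def write_workload(path, users, documents, count, seed=None):
    """First test: pick random (user, document) pairs and save them to a file."""
    rng = random.Random(seed)
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for _ in range(count):
            writer.writerow([rng.choice(users), rng.choice(documents)])

def read_workload(path):
    """Second test: replay exactly the same pairs so runs stay comparable."""
    with open(path, newline="") as handle:
        return [tuple(row) for row in csv.reader(handle)]

# Hypothetical ids; a real run would draw them from the graph built in part 2.
path = os.path.join(tempfile.mkdtemp(), "workload.csv")
write_workload(path, ["user1", "user2"], ["doc1", "doc2", "doc3"], count=5)

first = read_workload(path)
second = read_workload(path)
print(first == second)
```

Because every later run reads the same file, a change in timing can be attributed to the hardware, Neo4j or JVM setting that was changed, not to a different random sample.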

Interesting post on testing a graph database to the point of:

How well (lousy) would this fare on a relational database?

A 10 Million row table joined to itself 100 times…omgwtfbbq.

Maybe we should ask Facebook?

Facebook releases Linkbench MySQL benchmark

In case you don’t know, MySQL is a relational database.

Last I heard, the Facebook graph is one (1) billion+ users.

Relational database technology is managing their graph fairly well.

What do you think?
