Archive for the ‘Cypher’ Category

Neo4j 3.3.0-alpha02 (Graphs For Schemas?)

Friday, June 30th, 2017

Neo4j 3.3.0-alpha02

A bit late (release was 06/15/2017) but give Neo4j 3.3.0-alpha02 a spin over the weekend.

From the post:

Detailed Changes and Docs

For the complete list of all changes, please see the changelog. Look for 3.3 Developer manual here, and 3.3 Operations manual here.

Neo4j is one of the graph engines a friend wants to use for analysis/modeling of the ODF 1.2 schema. The traditional indented list is only one tree visualization out of the four major ones.

(From: Trees & Graphs by Nathalie Henry Riche, Microsoft Research)

Riche’s presentation covers a number of other ways to visualize trees and if you relax the “tree” requirement for display, interesting graph visualizations that may give insight into a schema design.

The slides are part of the materials for CSE512 Data Visualization (Winter 2014), so references for visualizing trees and graphs need to be updated. Check the course resources link for more visualization resources.

Guesstimating the Future

Thursday, September 24th, 2015

I ran across some introductory slides on Neo4j with the line:

Forrester estimates that over 25% of enterprises will be using graph databases by 2017.

Well, Forrester also predicted that tablet sales would overtake laptop sales by 2015: Forrester: Tablet Sales Will Eclipse Laptop Sales by 2015.

You might want to check that prediction against: Laptop sales ‘stronger than ever’ versus tablets – PCR Retail Advisory Board.

The adage “It is difficult to make predictions, especially about the future,” remains appropriate.

Neo4j doesn’t need lemming-like behavior among consumers of technology to make a case for itself.

Compare Neo4j and its query language, Cypher, to your use cases and I think you will agree.

Spreadsheets are graphs too!

Wednesday, August 26th, 2015

Spreadsheets are graphs too! by Felienne Hermans.

Presentation with transcript.

Felienne starts with a great spreadsheet story:

When I was in grad school, I worked with an investment bank doing spreadsheet research. On my first day, I went to the head of the Excel team.

I said, ‘Hello, can I have a list of all your spreadsheets?’

There was no such thing.

‘We don’t have a list of all the spreadsheets,’ he said. ‘You could ask Frank in Accounting or maybe Harry over at Finance. He’s always talking about spreadsheets. I don’t really know, but I think we might have 10,000 spreadsheets.’

10,000 spreadsheets was a gold mine of research, so I went to the IT department and conducted my first spreadsheet scan with root access in Windows Explorer.

Within one second, it had already found 10,000 spreadsheets. Within an hour, it was still finding more, with over one million Excel files located. Eventually, we found 2.5 million spreadsheets.

In short, spreadsheets run the world.

She continues to outline spreadsheet horror stories and then demonstrates how complex relationships between cells can be captured by Neo4j.

Which are much easier to query with Cypher than SQL!

While I applaud:

I realized that spreadsheet information is actually very graphy. All the cells are connected to references to each other and they happen to be in a worksheet or on the spreadsheet, but that’s not really what matters. What matters is the connections.

I would be more concerned with the identity of the subjects between which connections have been made.

Think of it as documenting the column headers of a five-year-old spreadsheet that you are now using by rote.

Knowing the connections between cells is a big step forward. Knowing what the cells are supposed to represent is an even bigger one.
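To make the "cells are connected by references" point concrete, here is a minimal sketch in Python rather than Neo4j. The sample sheet, the `dependency_edges` helper, and the simple A1-style reference pattern are all assumptions for illustration; none of them come from Felienne's presentation, and real spreadsheet formulas (ranges, cross-sheet references) need a proper parser.

```python
import re

# Hypothetical sample sheet: cell address -> literal value or formula.
sheet = {
    "A1": "100",
    "A2": "250",
    "B1": "=A1+A2",
    "C1": "=B1*2",
}

# Simplified A1-style reference pattern (an assumption; no ranges or sheet names).
CELL_REF = re.compile(r"\b[A-Z]{1,3}[0-9]+\b")

def dependency_edges(cells):
    """Return (cell, referenced_cell) pairs for every formula cell."""
    edges = []
    for address, content in sorted(cells.items()):
        if content.startswith("="):
            for ref in CELL_REF.findall(content):
                edges.append((address, ref))
    return edges

edges = dependency_edges(sheet)
```

Each pair could become a relationship between two cell nodes in a graph. Note that the edges say nothing about what A1 or B1 mean, which is the identity problem raised above.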

Content Recommendation From Links Shared on Twitter Using Neo4j and Python

Thursday, May 28th, 2015

Content Recommendation From Links Shared on Twitter Using Neo4j and Python by William Lyon.

From the post:


I’ve spent some time thinking about generating personalized recommendations for articles since I began working on an iOS reading companion for the bookmarking service. One of the features I want to provide is a feed of recommended articles for my users based on articles they’ve saved and read. In this tutorial we will look at how to implement a similar feature: how to recommend articles for users based on articles they’ve shared on Twitter.


The main tools we will use are Python and Neo4j, a graph database. We will use Python for fetching the data from Twitter, extracting keywords from the articles shared and for inserting the data into Neo4j. To find recommendations we will use Cypher, the Neo4j query language.

Very clear and complete!


Detecting potential typos using EXPLAIN

Thursday, March 19th, 2015

Detecting potential typos using EXPLAIN by Mark Needham.

Mark illustrates use of EXPLAIN (in Neo4j 2.2.0 RC1) to detect typos (not potential, actual typos) to debug a query.

Now if I could just find a way to incorporate EXPLAIN into documentation prose.

PS: I say that in jest but using a graph model, it should be possible to create a path through documentation that highlights the context of a particular point in the documentation. Trivial example: I find “setting margin size” but don’t know how that relates to menus in an application. “Explain” in that context displays a graph with the nodes necessary to guide me to other parts of the documentation. Each of those nodes might have additional information at each of their “contexts.”

Querying Graphs with Neo4j [cheatsheet]

Saturday, November 1st, 2014

Querying Graphs with Neo4j by Michael Hunger.

Download the refcard by the usual process: log in to DZone, etc.

When you open the PDF file in a viewer, do be careful. (Page references are to the DZone cheatsheet.)

Cover: The entire cover is a download link. Touch it at all and you will be taken to a download link for Neo4j.

Page 1 covers “What is a Graph Database?” and “What is Neo4j?,” just in case you have been forced by home invaders to download a refcard for a technology you know nothing about.

Page 2 pitches the Neo4j server and then Getting Started with Neo4j, perhaps to annoy the NSA with repetitive content.

The DZone cheatsheet replicates the cheatsheet at:, with the following changes:

Page 3


Re-written. Old version:

MATCH (user)-[:FRIEND]-(friend) WHERE = {name} WITH user, count(friend) AS friends WHERE friends > 10 RETURN user

The WITH syntax is similar to RETURN. It separates query parts explicitly, allowing you to declare which identifiers to carry over to the next part.

MATCH (user)-[:FRIEND]-(friend) WITH user, count(friend) AS friends ORDER BY friends DESC SKIP 1 LIMIT 3 RETURN user

You can also use ORDER BY, SKIP, LIMIT with WITH.

New version:

MATCH (user)-[:KNOWS]-(friend) WHERE = {name} WITH user, count(*) AS friends WHERE friends > 10 RETURN user

WITH chains query parts. It allows you to specify which projection of your data is available after WITH.

You can also use ORDER BY, SKIP, LIMIT and aggregation with WITH. You might have to alias expressions to give them a name.

I leave it to your judgement which version was the clearer.
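For readers who find either wording opaque, the two WITH examples can be read as "aggregate first, then filter or page the aggregated rows." The Python sketch below only mimics that pipeline on an in-memory edge list. The data, the function names, and the alphabetical tie-break are my assumptions; Cypher itself promises no particular order among ties.

```python
from collections import Counter

# Hypothetical undirected FRIEND/KNOWS relationships.
friend_edges = [
    ("ann", "bob"), ("ann", "cat"), ("ann", "dan"),
    ("bob", "cat"), ("dan", "eve"),
]

def friend_counts(edges):
    """Rough analogue of: WITH user, count(friend) AS friends."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1  # the pattern is undirected, so both ends count
        counts[b] += 1
    return counts

def users_with_more_than(edges, minimum):
    """Rough analogue of: ... WHERE friends > minimum RETURN user."""
    return sorted(u for u, n in friend_counts(edges).items() if n > minimum)

def top_users(edges, skip, limit):
    """Rough analogue of: ... ORDER BY friends DESC SKIP skip LIMIT limit."""
    ranked = sorted(friend_counts(edges).items(),
                    key=lambda item: (-item[1], item[0]))
    return [user for user, _ in ranked[skip:skip + limit]]
```

The point of WITH is the same as the intermediate `friend_counts` result: the filter and the paging apply to the aggregated rows, not to the raw matches.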

Page 4

MERGE inserts: typo “{name: {value3}} )” on last line of final example under MERGE.

SET inserts: “SET n += {map} Add and update properties, while keeping existing ones.”

INDEX inserts: “MATCH (n:Person) WHERE IN {values} An index can be automatically used for the IN collection checks.”

Page 5


changes: “(n)-[*1..5]->(m) Variable length paths.” to “(n)-[*1..5]->(m) Variable length paths can span 1 to 5 hops.”

changes: “(n)-[*]->(m) Any depth. See the performance tips.” to “(n)-[*]->(m) Variable length path of any depth. See performance tips.”

changes: “shortestPath((n1:Person)-[*..6]-(n2:Person)) Find a single shortest path.” to “shortestPath((n1)-[*..6]-(n2))”


changes: “range({first_num},{last_num},{step}) AS coll Range creates a collection of numbers (step is optional), other functions returning collections are: labels, nodes, relationships, rels, filter, extract.” to “range({from},{to},{step}) AS coll Range creates a collection of numbers (step is optional).” [Loss of information from the earlier version.]

inserts: “UNWIND {names} AS name MATCH (n:Person {name:name}) RETURN avg(n.age) With UNWIND, you can transform any collection back into individual rows. The example matches all names from a list of names.”


inserts: “range({start},{end},{step}) AS coll Range creates a collection of numbers (step is optional).”

Page 6


changes: “NOT (n)-[:KNOWS]->(m) Exclude matches to (n)-[:KNOWS]->(m) from the result.” to “NOT (n)-[:KNOWS]->(m) Make sure the pattern has at least one match.” [Older version more precise?]

replaces: mixed case, true/TRUE with TRUE


inserts: “toInt({expr}) Converts the given input in an integer if possible; otherwise it returns NULL.”

inserts: “toFloat({expr}) Converts the given input in a floating point number if possible; otherwise it returns NULL.”


changes: “MATCH path = (begin) -[*]-> (end) FOREACH (n IN rels(path) | SET n.marked = TRUE) Execute a mutating operation for each relationship of a path.” to “MATCH path = (begin) -[*]-> (end) FOREACH (n IN rels(path) | SET n.marked = TRUE) Execute an update operation for each relationship of a path.”


changes: “FOREACH (value IN coll | CREATE (:Person {name:value})) Execute a mutating operation for each element in a collection.” to “FOREACH (value IN coll | CREATE (:Person {name:value})) Execute an update operation for each element in a collection.”


changes: “degrees({expr}), radians({expr}), pi() Converts radians into degrees, use radians for the reverse. pi for π.” to “degrees({expr}), radians({expr}), pi() to Converts radians into degrees, use radians for the reverse.” Loses “pi for π.”

changes: “log10({expr}), log({expr}), exp({expr}), e() Logarithm base 10, natural logarithm, e to the power of the parameter. Value of e.” to “log10({expr}), log({expr}), exp({expr}), e() Logarithm base 10, natural logarithm, e to the power of the parameter.” Loses “Value of e.”

Page 7


inserts: “split({string}, {delim}) Split a string into a collection of strings.”

AGGREGATION changes: “collect( Collection from the values, ignores NULL.” to “collect( Value collection, ignores NULL.”


remove: “START n=node(*) Start from all nodes.”

remove: “START n=node({ids}) Start from one or more nodes specified by id.”

remove: “START n=node({id1}), m=node({id2}) Multiple starting points.”

remove: “START n=node:nodeIndexName(key={value}) Query the index with an exact query. Use node_auto_index for the automatic index.”

inserts: “START n = node:indexName(key={value}) n=node:nodeIndexName(key={value}) n=node:nodeIndexName(key={value}) Query the index with an exact query. Use node_auto_index for the old automatic index.”

inserts: ‘START n = node:indexName({query}) Query the index by passing the query string directly, can be used with lucene or spatial syntax. E.g.: “name:Jo*” or “withinDistance:[60,15,100]”‘

I may have missed some changes because as you know, the “cheatsheets” for Cypher have no particular order for the entries. Alphabetical order suggests itself for future editions, sans the marketing materials.

Changes to a query language should appear where a user would expect to find the command in question. For example, the “CREATE a={property:’value’} has been removed” should appear where expected on the cheatsheet, noting the change. Users should not have to hunt high and low for “CREATE a={property:’value’}” on a cheatsheet.

I have passed over incorrect use of the definite article and other problems without comment.

Despite the shortcomings of the DZone refcard, I suggest that you upgrade to it.

How To Create Semantic Confusion

Thursday, July 31st, 2014

Merge: to cause (two or more things, such as two companies) to come together and become one thing : to join or unite (one thing) with another (

Do you see anything common between that definition of merge and:

  • It ensures that a pattern exists in the graph by creating it if it does not exist already
  • It will not use partially existing (unbound) patterns- it will attempt to match the entire pattern and create the entire pattern if missing
  • When unique constraints are defined, MERGE expects to find at most one node that matches the pattern
  • It also allows you to define what should happen based on whether data was created or matched

The quote is from Cypher MERGE Explained by Luanne Misquitta. Great post if you want to understand the operation of Cypher “merge,” which has nothing in common with the term “merge” in English.
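The behavior in the bullets above is closer to "get or create" than to merging. A minimal Python sketch of that reading follows, using a plain list of dictionaries as a stand-in graph. The `merge_node` function and its return value are hypothetical; this is not how Neo4j implements MERGE, and it ignores relationships and constraints.

```python
def merge_node(graph, label, properties):
    """Return (node, created): match the whole pattern or create all of it."""
    for node in graph:
        same_label = node["label"] == label
        same_props = all(node["properties"].get(key) == value
                         for key, value in properties.items())
        if same_label and same_props:
            return node, False  # comparable to the ON MATCH branch
    node = {"label": label, "properties": dict(properties)}
    graph.append(node)
    return node, True  # comparable to the ON CREATE branch

graph = []
first, first_created = merge_node(graph, "Person", {"name": "Ann"})
second, second_created = merge_node(graph, "Person", {"name": "Ann"})
```

Nothing is joined or united here: the second call simply finds what the first call created.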

Want to create semantic confusion?

Choose a well-known term and define new and unrelated semantics for it. Creates a demand for training, tutorials as well as confused users.

I first saw this in a tweet by GraphAware.

Neo4j’s Cypher vs Clojure – Group by and Sorting

Friday, July 11th, 2014

Neo4j’s Cypher vs Clojure – Group by and Sorting by Mark Needham.

From the post:

One of the points that I emphasised during my talk on building Neo4j backed applications using Clojure last week is understanding when to use Cypher to solve a problem and when to use the programming language.

A good example of this is in the meetup application I’ve been working on. I have a collection of events and want to display past events in descending order and future events in ascending order.

Mark falls back on Clojure to cure the lack of sorting within a collection in Cypher.
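The same split is easy to sketch in any general-purpose language. Here is a Python version (not Mark's Clojure code); the event data, the `split_events` name, and the choice to treat today's events as future are assumptions for illustration.

```python
from datetime import date

# Hypothetical events, as they might come back unsorted from a query.
events = [
    {"name": "Intro to Cypher", "date": date(2014, 6, 1)},
    {"name": "Graph Hack Night", "date": date(2014, 8, 15)},
    {"name": "Clojure and Neo4j", "date": date(2014, 7, 3)},
    {"name": "Modeling Workshop", "date": date(2014, 7, 30)},
]

def split_events(items, today):
    """Past events newest first, future events soonest first."""
    past = sorted((e for e in items if e["date"] < today),
                  key=lambda e: e["date"], reverse=True)
    future = sorted((e for e in items if e["date"] >= today),
                    key=lambda e: e["date"])
    return past, future

past, future = split_events(events, date(2014, 7, 11))
```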

The Zen of Cypher

Sunday, July 6th, 2014

The Zen of Cypher by Nigel Small.

The original “Zen” book, Zen and the Art of Motorcycle Maintenance: An Inquiry into Values, runs four hundred and eighteen pages.

Nigel has a useful summary for Cypher but I would estimate it runs about a page.

Not really the in-depth sort of treatment that qualifies for a “Zen” title.


Neo4j 2.0: Creating adjacency matrices

Friday, May 23rd, 2014

Neo4j 2.0: Creating adjacency matrices by Mark Needham.

From the post:

About 9 months ago I wrote a blog post showing how to export an adjacency matrix from a Neo4j 1.9 database using the cypher query language and I thought it deserves an update to use 2.0 syntax.

I’ve been spending some of my free time working on an application that runs on top of’s API and one of the queries I wanted to write was to find the common members between 2 meetup groups.

The first part of this query is a cartesian product of the groups we want to consider which will give us the combinations of pairs of groups:

I can imagine several interesting uses for the adjacency matrices that Mark describes.

One of which is common membership in groups as the post outlines.

Another would be a common property or sharing a value within a range.
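As a rough illustration of the common-membership case, the sketch below builds the matrix in plain Python from sets of members. The group names, the members, and the `adjacency_matrix` helper are invented for the example; Mark's post does this with a Cypher query over the cartesian product of groups.

```python
# Hypothetical group memberships.
memberships = {
    "Neo4j London": {"ann", "bob", "cat"},
    "Clojure Dojo": {"bob", "cat", "dan"},
    "R Users": {"ann"},
}

def adjacency_matrix(groups):
    """Count shared members for every pair of groups (cartesian product)."""
    names = sorted(groups)
    matrix = [[len(groups[a] & groups[b]) for b in names] for a in names]
    return names, matrix

names, matrix = adjacency_matrix(memberships)
```

Swapping the set intersection for another test, such as "property values fall in the same range," gives the second use mentioned above.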


Facebook Graph Search with Cypher and Neo4j

Monday, March 17th, 2014

Facebook Graph Search with Cypher and Neo4j by Max De Marzi.

A great post as always but it has just been updated:

Update: Facebook has disabled this application

Your app is replicating core Facebook functionality.

Rather ironic considering this headline:

Mark Zuckerberg called Obama about the NSA. Let’s not hang up the phone by Dan Gillmor.

It’s hard to say why Mark is so upset.

Here are some possible reasons:

  • NSA surveillance is poaching on surveillance sales by Facebook
  • NSA leaks exposed surveillance by Facebook
  • NSA leaks exposed U.S. corporations doing surveillance for the government
  • NSA surveillance will make consumers leery of Facebook surveillance
  • NSA leaks make everyone more aware of surveillance
  • NSA leaks make Mark waste time on phone with Obama acting indignant.

I am sure I have missed dozens of reasons why Mark is upset.

Care to supply the ones I missed?

Optimizing Cypher Queries in Neo4j

Wednesday, January 22nd, 2014

Optimizing Cypher Queries in Neo4j by Mark Needham and Wes Freeman.

Thursday January 23 10:00 PST / 19:00 CET


Mark and Wes will talk about Cypher optimization techniques based on real queries as well as the theoretical underlying processes. They’ll start from the basics of “what not to do”, and how to take advantage of indexes, and continue to the subtle ways of ordering MATCH/WHERE/WITH clauses for optimal performance as of the 2.0.0 release.

OK, I’m registered. But this is at 7 AM on the East Coast of the US. I will bring my own coffee but have high expectations. Just saying. 😉

Correction: East Coast today at 1:00 P.M. local. I’m not a very good clock. 😉

Generate Cypher Queries with R

Saturday, January 4th, 2014

Generate Cypher Queries with R by Nicole White.

From the post:

Lately I have been using R to generate Cypher queries and dump them line-by-line to a text file. I accomplish this through the sink(), cat(), and paste() functions. The sink function makes it so any console output is sent to the given text file; the cat function prints things to the console; and the paste function concatenates strings.

For my movie recommendations graph gist, I generated my Cypher queries by looping through the CSV file containing the movie ratings and concatenating strings as appropriate. The CSV was first loaded into a data frame called data, of which a snippet is shown below:

You do remember that the Neo4j GraphGist December Challenge ends January 31st, 2014? Yes?

Auto-generation will help avoid careless key stroke errors.

And serve to scale up from gist size to something more challenging.
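Nicole works in R; the same idea in Python might look like the sketch below. The CSV columns, the labels, the RATED relationship, and the helper names are assumptions for illustration, not taken from her gist.

```python
import csv
import io

# Hypothetical ratings data; a real script would read a file instead.
RATINGS_CSV = """user,movie,rating
Ann,The Matrix,5
Bob,Up,4
"""

def cypher_string(value):
    """Quote a Python string as a single-quoted Cypher string literal."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"

def rating_statements(csv_text):
    """Build one Cypher statement per CSV row."""
    template = ("MERGE (u:User {name: %s}) MERGE (m:Movie {title: %s}) "
                "CREATE (u)-[:RATED {stars: %d}]->(m);")
    statements = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        statements.append(template % (cypher_string(row["user"]),
                                      cypher_string(row["movie"]),
                                      int(row["rating"])))
    return statements

statements = rating_statements(RATINGS_CSV)
```

Where the driver supports them, query parameters are a safer choice than string concatenation, but generated text files are what the post describes.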

Neo4j – Labels and Regression

Monday, December 16th, 2013

Yes, I am using labels in Neo4j but only because I am the only user of this data set. If I paint myself into a semantic corner, it will be my fault and not poor design.

In any event, I ran into an odd limitation on labels that may be of general interest.

My script was dying because my label read: “expert-Validation.”

Thinking the Neo4j documentation should have the answer, I consulted:

3.4.1 Label names:

Any non-empty unicode string can be used as a label name. In Cypher, you may need to use the backtick (`) syntax to avoid clashes with Cypher identifier rules. By convention, labels are written with CamelCase notation, with the first letter in upper case. For instance, User or CarOwner.

OK, so that’s encouraging, maybe I have run afoul of mathematical syntax or something.

Welllll, not quite.

8.3 Identifiers (under Cypher):

Identifier names are case sensitive, and can contain underscores and alphanumeric characters (a-z, 0-9), but must start with a letter. If other characters are needed, you can quote the identifier using backquote (`) signs.

The same rules apply to property names.

Sherman, set the WayBack Machine for 1986, we want to watch Charles Goldfarb write the name character provisions of ISO 8879:1986:

4.173 lower-case letters: Character class composed of the 26 unaccented small letters from “a” through “z”.

4.326 upper-case letters: Character class composed of the 26 capital letters from “A” through “Z”.

4.175 lower-case name start characters: Character class consisting of each lower-case name start character assigned by the concrete reference syntax.

4.328 upper-case name start characters: Character class consisting of upper-case forms of the corresponding lower-case name start characters.

4.94 digits: Character class composed of the 10 Arabic numerals from “0” to “9”.

4.174 lower-case name characters: Character class consisting of each lower-case name character assigned by the concrete reference syntax.

4.327 upper-case name characters: Character class consisting of upper-case forms of the corresponding lower-case name characters.

I had the honor of knowing many of the contributors to the SGML standard, including its author, Charles Goldfarb.

But that was 1986. The Unicode project formally started two years later.

Over twenty-eight years after the SGML standard we have returned to name start characters and name characters (those not escaped by a “backtick”).

Is Unicode support really that uncommon in graph databases?
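A script that generates Cypher can work around the limitation by quoting any label that does not fit the identifier rule quoted above. This is a minimal Python sketch: the `quote_label` helper is hypothetical, the pattern is my reading of the rule (ASCII letters, digits, underscore, starting with a letter), and doubling an embedded backtick is an assumption about how Cypher escapes it.

```python
import re

# My reading of the quoted identifier rule; treat it as an assumption.
PLAIN_IDENTIFIER = re.compile(r"[A-Za-z][A-Za-z0-9_]*")

def quote_label(label):
    """Return the label unchanged if it is a plain identifier, else backticked."""
    if PLAIN_IDENTIFIER.fullmatch(label):
        return label
    # Assumption: an embedded backtick is escaped by doubling it.
    return "`" + label.replace("`", "``") + "`"
```

With this, "expert-Validation" comes out as a backticked label while "CarOwner" passes through untouched.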

Neo4j Cypher Refcard 2.0

Wednesday, December 11th, 2013

Neo4j Cypher Refcard 2.0

From the webpage:

Key principles and capabilities of Cypher are as follows:

  • Cypher matches patterns of nodes and relationship in the graph, to extract information or modify the data.
  • Cypher has the concept of identifiers which denote named, bound elements and parameters.
  • Cypher can create, update, and remove nodes, relationships, labels, and properties.
  • Cypher manages indexes and constraints.

You can try Cypher snippets live in the Neo4j Console at or read the full Cypher documentation at For live graph models using Cypher check out GraphGist.

If you plan on entering the Neo4j GraphGist December Challenge, you are probably going to need this Refcard.

I first saw this in a tweet by Peter Neubauer.

Geoff (update)

Thursday, December 5th, 2013


My prior post on Geoff pointed to a page about Geoff that appears to no longer exist. I have updated that page to point to the new location.

The current description reads:

Geoff is a text-based interchange format for Neo4j graph data that should be instantly readable to anyone familiar with Cypher, on which its syntax is based.

Quick Start with Neo4J…

Thursday, November 28th, 2013

Quick Start with Neo4J using YOUR Twitter Data by John Berryman.

From the post:

When learning a new technology it’s best to have a toy problem in mind so that you’re not just reimplementing another glorified “Hello World” project. Also, if you need lots of data, it’s best to pull in a fun data set that you already have some familiarity with. This allows you to lean upon already established intuition of the data set so that you can more quickly make use of the technology. (And as an aside, this is just why we so regularly use the StackExchange SciFi data set when presenting our new ideas about Solr.)

When approaching a graph database technology like Neo4J, if you’re as avid of a Twitter user as I am then POOF you already have the best possible data set for becoming familiar with the technology — your own Social network. And this blog post will help you download and setup Neo4J, set up a Twitter app (needed to access the Twitter API), pull down your social network as well as any other social network you might be interested in. At that point we’ll interrogate the network using the Neo4J and the Cypher syntax. Let’s go!

What? Not a single mention of Euler, bridges, claims about graphs rather than Atlas holding up the celestial sphere! Could this really be about Neo4j?

In a word: Yes!

In fact, it is one of the better introductions to Neo4j I have ever seen.

I like historical material but when you have seen dozens if not hundreds of presentations/slides repeating the same basic information, you start to worry about there being a Power-Point Slide Shortage. 😉

No danger of that with John’s post!

Following the instructions took a while in my case, mostly because I was called away to cook a pork loin (it’s a holiday here), plus rolls, etc., right as I got the authentication tokens. :-( Then I had issues with a prior version of Neo4j that was already running. I installed via an installer and it had written a start script in rc4.d.

The latest version conflicts with the running older version and refuses to start without any meaningful error message. But, ps -ef | grep neo4j found the problem. Renaming the script while root, etc., fixed it. Do need to delete the older version at some point.

After all that, it was a piece of cake. John’s script works as promised.

I don’t know how to break this to John but now he is following but not being followed by neo4j, peterneubauer (Neo4j hotshot), and markhneedham (Neo4j hotshot). (As of 28 Nov. 2013, your results may vary.)

On the use of labels, you may be interested in the discussion at: RFC Blueprints3 and Vertex.getLabel()

Using strings as labels leads to conflicts between labels with the same strings but different semantics.

If you are happy with a modest graph or are willing to police the use of labels it may work for you. On the other hand, it may not.

PS: I am over 11,500 nodes at this point and counting.

Pragmatic Cypher Optimization (2.0 M06)

Friday, November 8th, 2013

Pragmatic Cypher Optimization (2.0 M06)

From the post:

I’ve seen a few stack overflow and google group questions about queries that are slow, and I think there are some things that need to be said regarding Cypher optimization. These techniques are a few ways of improving your queries that aren’t necessarily intuitive. Before reading this, you should have an understanding of WITH (see my other post: The Mythical With).

First, let me throw out a nice disclaimer that these rules of thumb I’ve discovered are by no means definitively best practices, and you should measure your own results with cold and warm caches, running queries 3+ times to see realistic results with a warm cache.

Second, let me throw out another disclaimer, that Cypher is improving rapidly, and that these rules of thumb may only be valid for a few milestone releases. I’ll try to make future updates, but I’m sure there’s always danger of becoming out of date.

Ok, let’s get to it.

If you are looking for faster Cypher query results (who isn’t?), this is a good starting place for you!

Musicbrainz in Neo4j – Part 1

Thursday, November 7th, 2013

Musicbrainz in Neo4j – Part 1 by Paul Tremberth.

From the post:

What is MusicBrainz?

Quoting Wikipedia, MusicBrainz is an “open content music database [that] was founded in response to the restrictions placed on the CDDB.(…) MusicBrainz captures information about artists, their recorded works, and the relationships between them.”

Anyone can browse the database at If you create an account with them you can contribute new data or fix existing records details, track lengths, send in cover art scans of your favorite albums etc. Edits are peer reviewed, and any member can vote up or down. There are a lot of similarities with Wikipedia.

With this first post, we want to show you how to import the Musicbrainz data into Neo4j for some further analysis with Cypher in the second post. See below for what we will end up with:

MusicBrainz data

MusicBrainz currently has around 1000 active users, nearly 800,000 artists, 75,000 record labels, around 1,200,000 releases, more than 12,000,000 tracks, and short under 2,000,000 URLs for these entities (Wikipedia pages, official homepages, YouTube channels etc.) Daily fixes by the community makes their data probably the freshest and most accurate on the web.
You can check the current numbers here and here.

This rocks!

Interesting data, walk through how to load the data into Neo4j and the promise of more interesting activities to follow.

However, I urge caution on showing this to family members. 😉

You may wind up scripting daily data updates and teaching Cypher to family members and no doubt their friends.

Up to you.

I first saw this in a tweet by Peter Neubauer.

Neo4j 2.0.0-M06 – Introducing Neo4j’s Browser

Thursday, October 17th, 2013

Neo4j 2.0.0-M06 – Introducing Neo4j’s Browser by Andreas Kollegger.

From the post:

Type in a Cypher query, hit , then watch a graph visualization unfold. Want some data? Switch to the table view and download as CSV. Neo4j’s new Browser interface is a fluid developer experience, with iterative query authoring and graph visualization.

Available today in Neo4j 2.0.0 Milestone 6, download now to try out this shiny new user interface.

Like the man said: Download now! 😉

Andreas also suggests:

Ask questions on Stack Overflow.

Discuss ideas on our Google Group



Friday, October 4th, 2013


From the webpage:

Emacs major mode for editing cypher scripts (Neo4j).

First *.el upload today. Could be interesting.

XML to Cypher Converter/Geoff Converter

Monday, September 23rd, 2013

XML to Cypher Converter

From the webpage:

This service allows conversion of generic XML data into a Cypher CREATE statement, which can then be loaded into Neo4j.


XML to Geoff Converter

From the webpage:

This service allows conversion of generic XML data into a Geoff interchange file, which can then be loaded into Neo4j.

Both services can be used as a web service, in addition to supporting the pasting in of XML in a form.

You will also want to visit Nigel Small’s Github page and his

While poking around I also found:

XML to Graph Converter

XML data can easily be converted into a graph. Simply load paste the XML data into the left-hand side, convert into both Geoff and a Cypher CREATE statement, then view the results in the Neo4j console.

Definitely worth a deep look later this week with XML schemas.
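To get a feel for what such a conversion involves, here is a minimal Python sketch that turns element nesting into a single CREATE statement. It is not the output format of any of the services above: the `xml_to_cypher` function, the CHILD relationship type, and the n0, n1, ... identifiers are invented for illustration, and attributes and text content are ignored.

```python
import xml.etree.ElementTree as ET

def xml_to_cypher(xml_text):
    """Map each element to a node and each parent/child pair to a relationship."""
    root = ET.fromstring(xml_text)
    parts = []
    counter = 0

    def visit(element, parent_id):
        nonlocal counter
        node_id = "n%d" % counter
        counter += 1
        # Backticks keep tags such as "office:text" legal as labels.
        parts.append("(%s:`%s`)" % (node_id, element.tag))
        if parent_id is not None:
            parts.append("(%s)-[:CHILD]->(%s)" % (parent_id, node_id))
        for child in element:
            visit(child, node_id)

    visit(root, None)
    return "CREATE " + ", ".join(parts)

statement = xml_to_cypher("<schema><element/><attribute/></schema>")
```

Even this toy version shows where the modeling decisions hide: whether attributes become properties or nodes, and what the relationship types should be called.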

Neo4j Cypher Refcard 2.0

Thursday, August 29th, 2013

Neo4j Cypher Refcard 2.0

This looks very useful.

If nobody else does, I will cast this into a traditional refcard format.

Cypher shell with logging

Friday, August 23rd, 2013

Cypher shell with logging by Alex Frieden.

From the post:

For those who don’t know, Neo4j is a graph database built with Java. The internet is abound with examples, so I won’t bore you with any.

Our problem was a data access problem. We built a loader, loaded our data into neo4j, and then queried it. However we ran into a little problem: Neo4j at the time of release logs in the home directory (at least on linux redhat) what query was ran (its there as a hidden file). However, it doesn’t log what time it was run at. One other problem as an administrator point of view is not having a complete log of all queries and data access. So we built a cypher shell that would do the logging the way we needed to log. Future iterations of this shell will have REST cypher queries and not use the embedded mode (which is faster but requires a local connection to the data). We also wanted a way in the future to output results to a file.


Logs are a form of documentation. You may remember that documentation was #1 in the Solr Usability contest.

Documentation is important! Don’t neglect it.
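The core of the logging idea, a record of what was run and when, fits in a few lines. The Python sketch below is not Alex's shell: `make_logged_runner` and the stand-in `fake_run_query` are hypothetical, and a real version would call Neo4j and write to a file rather than an in-memory buffer.

```python
import io
import logging

def make_logged_runner(run_query, logger):
    """Wrap a query function so every query is logged before it runs."""
    def runner(query):
        logger.info("query: %s", query)
        return run_query(query)
    return runner

# Log to an in-memory buffer here; a FileHandler would give a persistent log.
log_buffer = io.StringIO()
handler = logging.StreamHandler(log_buffer)
handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
query_logger = logging.getLogger("cypher_shell_sketch")
query_logger.setLevel(logging.INFO)
query_logger.addHandler(handler)
query_logger.propagate = False

def fake_run_query(query):
    """Stand-in for a real call to the database (an assumption)."""
    return []

run = make_logged_runner(fake_run_query, query_logger)
result = run("MATCH (n) RETURN count(n)")
```

The `%(asctime)s` field supplies the timestamp that the post says the built-in history lacked.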

Are You Tracking Emails?

Friday, June 28th, 2013

neo4j/cypher: Aggregating relationships within a path by Mark Needham.

From the post:

I recently came across an interesting use case of paths in a graph where we wanted to calculate the frequency of communication between two people by showing how frequently each emailed the other.

The model looked like this:

email graph

I can’t imagine why Mark would think about tracking emails between people. 😉

And as Mark says, the query he settles on isn’t guaranteed to scale.

Still, it is an interesting exercise.
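For a sense of what "how frequently each emailed the other" means as a computation, here is a small Python sketch over a list of (sender, recipient) pairs. The data and the `communication_frequency` helper are invented; Mark's post does the aggregation in Cypher over paths.

```python
from collections import Counter

# Hypothetical (sender, recipient) pairs, one per email.
emails = [
    ("alice", "bob"), ("alice", "bob"),
    ("bob", "alice"), ("carol", "alice"),
]

def communication_frequency(messages):
    """Group directed email counts under each unordered pair of people."""
    directed = Counter(messages)
    pairs = {}
    for (sender, recipient), count in directed.items():
        key = tuple(sorted((sender, recipient)))
        pairs.setdefault(key, {})[sender] = count
    return pairs

frequency = communication_frequency(emails)
```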

Who’s Afraid of the NSA?

Thursday, June 6th, 2013

To Catch a Cyber-Thief

From the post:

When local police came calling with child porn allegations last January, former Saint John city councillor Donnie Snook fled his house clutching a laptop. It was clear that the computer contained damning data. Six months later, police have finally gathered enough evidence to land him in jail for a long time to come.

With a case seemingly so cut and dry, why the lag time? Couldn’t the police do a simple search for the incriminating info and level charges ASAP? Easier said than done. With computing devices storing terabytes of personal data, it can take months before enough evidence can be cobbled together from reams of documents, emails, chat logs and text messages.

That’s all about to change thanks to a new technique developed by researchers at Concordia University, who have slashed the data-crunching time. What once took months now takes minutes.

Gaby Dagher and Benjamin Fung, researchers with the Concordia Institute for Information Systems Engineering, will soon publish their findings in Data & Knowledge Engineering. Law enforcement officers are already putting this research to work through Concordia’s partnership with Canada’s National Cyber-Forensics and Training Alliance, in which law enforcement organizations, private companies, and academic institutions work together to share information to stop emerging cyber threats and mitigate existing ones.

Thanks to Dagher and Fung, crime investigators can now extract hidden knowledge from a large volume of text. The researchers’ new methods automatically identify the criminal topics discussed in the textual conversation, show which participants are most active with respect to the identified criminal topics, and then provide a visualization of the social networks among the participants.

Dagher, who is a PhD candidate supervised by Fung, explains “the huge increase in cybercrimes over the past decade boosted demand for special forensic tools that let investigators look for evidence on a suspect’s computer by analyzing stored text. Our new technique allows an investigator to cluster documents by producing overlapping groups, each corresponding to a specific subject defined by the investigator.”

Have you heard about clustering documents? Searching large volumes of text? Producing visualizations of social networks?

The threat of government snooping on its citizens should be evaluated on its demonstrated competence.

The FBI wants special backdoors (like it has for telecommunications) just to monitor IP traffic. (Going Bright… [Hack Shopping Mall?])

It would help the FBI if they had our secret PGP keys.

There’s a thought: maybe we should all generate new PGP keys and send the secret keys to the FBI.

They may not ever intercept any traffic encrypted with those keys but they can get funding from Congress to maintain an archive of them and to run them against all IP traffic.

The NSA probably has better chops when it comes to data collection but identity mining?

Identity mining is something completely different.

(See: The NSA Verizon Collection Coming on DVD)

AnormCypher 0.4.1 released!

Sunday, May 19th, 2013

AnormCypher 0.4.1 released! by Wes Freeman.

From the post:

Thanks to Pieter, AnormCypher 0.4.1 supports versions earlier than Neo4j 1.9 (I didn’t realize this was an issue).

AnormCypher is a Cypher-oriented Scala library for Neo4j Server (REST). The goal is to provide a great API for calling arbitrary Cypher and parsing out results, with an API inspired by Anorm from the Play! Framework.

If you are working with a Neo4j Server this may be of interest.

Labels and Schema Indexes in Neo4j

Tuesday, May 14th, 2013

Labels and Schema Indexes in Neo4j by Tareq Abedrabbo.

From the post:

Neo4j recently introduced the concept of labels and their sidekick, schema indexes. Labels are a way of attaching one or more simple types to nodes (and relationships), while schema indexes allow to automatically index labelled nodes by one or more of their properties. Those indexes are then implicitly used by Cypher as secondary indexes and to infer the starting point(s) of a query.

I would like to shed some light in this blog post on how these new constructs work together. Some details will be inevitably specific to the current version of Neo4j and might change in the future but I still think it’s an interesting exercise.

Before we start though I need to populate the graph with some data. I’m more into cartoon for toddlers than second-rate sci-fi and therefore Peppa Pig shall be my universe.

So let’s create some labeled graph resources.

Nice review of the impact of the new label + schema index features in Neo4j.
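
As a rough illustration of the idea in the quoted passage (an index declared on a label plus a property, then used to find the starting nodes of a query), here is a toy Python model. Everything in it is an assumption made for illustration: `TinyGraph`, its method names, and the `Character`/`Pig` labels are hypothetical and are not Neo4j's API or Tareq's data.

```python
from collections import defaultdict


class TinyGraph:
    """Toy in-memory graph for illustration only; not Neo4j's API."""

    def __init__(self):
        self.nodes = {}                 # node id -> (labels, properties)
        self.indexed = set()            # declared (label, property) pairs
        self.index = defaultdict(set)   # (label, property, value) -> node ids

    def create_index(self, label, prop):
        """Declare an index and back-fill it from nodes that already exist."""
        self.indexed.add((label, prop))
        for node_id, (labels, props) in self.nodes.items():
            if label in labels and prop in props:
                self.index[(label, prop, props[prop])].add(node_id)

    def create_node(self, labels, **props):
        """Add a node; any declared index that applies is updated automatically."""
        node_id = len(self.nodes)
        self.nodes[node_id] = (set(labels), props)
        for label in labels:
            for prop, value in props.items():
                if (label, prop) in self.indexed:
                    self.index[(label, prop, value)].add(node_id)
        return node_id

    def find(self, label, prop, value):
        """Use the index when one is declared, otherwise scan every node."""
        if (label, prop) in self.indexed:
            return set(self.index.get((label, prop, value), set()))
        return {
            node_id
            for node_id, (labels, props) in self.nodes.items()
            if label in labels and props.get(prop) == value
        }


graph = TinyGraph()
graph.create_index("Character", "name")
peppa = graph.create_node(["Character", "Pig"], name="Peppa")
george = graph.create_node(["Character", "Pig"], name="George")

print(graph.find("Character", "name", "Peppa"))  # answered from the index
print(graph.find("Pig", "name", "George"))       # no index declared: full scan
```

The caller never names an index in `find`. The lookup decides for itself whether one applies, which is the sense in which the quoted post says the indexes are used implicitly.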

I am still wondering why Neo4j “simple types” cannot be added to nodes and edges without the additional machinery of labels?

Why not allow users to declare properties to be indexed and used by Cypher for queries?

That would create a generalized mechanism that requires no changes to the data model.

I have a question pending with the Neo4j team on this issue and will report back with their response.

Cypher: It doesn’t all start with the START (in Neo4j 2.0!) [Benchmarks?]

Saturday, April 13th, 2013

Cypher: It doesn’t all start with the START (in Neo4j 2.0!)

From the post:

So, apparently, the Neo Technology guys read one of my last blog posts titled “It all starts with the START” and wanted to make a liar out of me. Actually, I’m quite certain it had nothing at all to do with that–they are just wanting to improve Cypher to make it the best graph query language out there. But yes, the START clause is now optional. “How do I tell Neo4j where to start my traversals”, you might ask. Well, in the long run, you won’t need to anymore. Neo4j will keep index and node/rel statistics and know which index to use, and know which start points to use to make the match and where the most efficient query based on its cost optimization. It’s not quite there yet, so for a while we’ll probably want to make generous use of “index hints”, but I love the direction this is going–feels just like the good old SQL.

While you are looking at Neo4j 2.0, remember the performance benchmarks by René Pickhardt up through Neo4j 1.9:

Get the full neo4j power by using the Core Java API for traversing your Graph data base instead of Cypher Query Language

As of Neo4j 1.7, the core Java API was a full order of magnitude faster than Cypher, and through Neo4j 1.9 the difference was even greater.

Has anyone run the benchmark against Neo4j 2.0?

Cypher in Neo4j 2.0

Sunday, March 24th, 2013

Cypher in Neo4j 2.0

Previews new features in Neo4j.

Labels & Indexing

Labels group nodes into sets. Nodes can have multiple labels.

Can use labels to create indexes on subsets of nodes.

Labels will support schema constraints (future feature).
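
A small Python sketch of the first two points, assuming nothing about how Neo4j stores labels internally: each label is treated as a set of node ids, and a node may belong to several of those sets. The label names and node ids are hypothetical.

```python
from collections import defaultdict

# label -> set of node ids carrying that label
nodes_by_label = defaultdict(set)


def add_node(node_id, *labels):
    """Register a node under every label it carries."""
    for label in labels:
        nodes_by_label[label].add(node_id)


add_node(1, "Person")
add_node(2, "Person", "Developer")  # one node, two labels
add_node(3, "Developer")

# Nodes carrying both labels are the intersection of the two sets.
both = nodes_by_label["Person"] & nodes_by_label["Developer"]
print(sorted(both))
```

An index scoped to a label then only has to cover the nodes in that label's set rather than the whole graph.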

I first saw this in a tweet by Michael Lappe.