Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

February 27, 2015

Making Master Data Management Fun with Neo4j – Part 1, 2

Filed under: Graphs,Neo4j — Patrick Durusau @ 3:29 pm

Making Master Data Management Fun with Neo4j – Part 1 by Brian Underwood.

From Part 1:

Joining multiple disparate data-sources, commonly dubbed Master Data Management (MDM), is usually not a fun exercise. I would like to show you how to use a graph database (Neo4j) and an interesting dataset (developer-oriented collaboration sites) to make MDM an enjoyable experience. This approach will allow you to quickly and sensibly merge data from different sources into a consistent picture and query across the data efficiently to answer your most pressing questions.

To start I’ll just be importing one data source: StackOverflow questions tagged with neo4j and their answers. In future blog posts I will discuss how to integrate other data sources into a single graph database to provide a richer view of the world of Neo4j developers’ online social interactions.

I’ve created a GraphGist to explore questions about the imported data, but in this post I’d like to briefly discuss the process of getting data from StackOverflow into Neo4j.

Part 1 imports data from StackOverflow into Neo4j.
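For a rough sense of the shape Part 1 ends up with, here is a minimal Cypher sketch of a question/answer/tag model. The labels, relationship types and property names are my guesses for illustration only; Brian’s post has the actual import code.

// Illustrative only: one question, its answer and a tag, MERGEd so the
// statement can be re-run without creating duplicates.
MERGE (q:Question {id: 1234})
  SET q.title = "How do I model this in Neo4j?"
MERGE (u:User {id: 42})
  SET u.name = "some_user"
MERGE (a:Answer {id: 5678})
MERGE (t:Tag {name: "neo4j"})
MERGE (u)-[:POSTED]->(a)
MERGE (a)-[:ANSWERS]->(q)
MERGE (q)-[:TAGGED]->(t)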

Making Master Data Management Fun with Neo4j – Part 2 imports Github data:

All together I was able to import:

  • 6,337 repositories
  • 6,232 users
  • 11,011 issues
  • 474 commits
  • 22,676 comments

In my next post I’ll show the process of how I linked the original StackOverflow data with the new GitHub data. Stay tuned for that, but in the meantime I’d also like to share the more technical details of what I did for those who are interested.

Definitely looking forward to seeing the reconciliation of data between StackOverflow and GitHub.
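Purely as a guess at what that reconciliation could look like in Cypher — assuming, for illustration only, that accounts from the two sources can be matched on a display name:

// Hypothetical: link StackOverflow and GitHub users that appear to be the
// same person, keeping both original nodes intact.
MATCH (so:StackOverflowUser), (gh:GithubUser)
WHERE so.display_name = gh.name
MERGE (so)-[:SAME_PERSON_AS]->(gh)

Exact name matching is the naive version, of course; how the cases that don’t line up get handled is the part worth watching for.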

February 22, 2015

Neo4j: Building a topic graph with Prismatic Interest Graph API

Filed under: Graphs,Natural Language Processing,Neo4j,Topic Models (LDA) — Patrick Durusau @ 4:44 pm

Neo4j: Building a topic graph with Prismatic Interest Graph API by Mark Needham.

From the post:

Over the last few weeks I’ve been using various NLP libraries to derive topics for my corpus of How I met your mother episodes without success and was therefore enthused to see the release of Prismatic’s Interest Graph API.

The Interest Graph API exposes a web service to which you feed a block of text and get back a set of topics and associated score.

It has been trained over the last few years with millions of articles that people share on their social media accounts and in my experience using Prismatic the topics have been very useful for finding new material to read.

A great walk through from accessing the Interest Graph API to loading the data into Neo4j and querying it with Cypher.

I can’t profess a lot of interest in How I Met Your Mother episodes but the techniques can be applied to other content. 😉
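For a flavor of the load-and-query steps in the walk through, here is a minimal sketch with hypothetical labels, properties and parameters (Mark’s post has his actual model):

// Attach the topics returned by the API to a document, keeping the score
// on the relationship. {docId} and {topics} are query parameters.
MERGE (d:Document {id: {docId}})
WITH d
UNWIND {topics} AS t
MERGE (topic:Topic {name: t.name})
MERGE (d)-[r:HAS_TOPIC]->(topic)
SET r.score = t.score;

// Which topics co-occur most often across documents?
MATCH (t1:Topic)<-[:HAS_TOPIC]-(d:Document)-[:HAS_TOPIC]->(t2:Topic)
WHERE id(t1) < id(t2)
RETURN t1.name, t2.name, count(d) AS docs
ORDER BY docs DESC
LIMIT 10;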

February 2, 2015

Neo4j 2.2 Milestone 3 Release

Filed under: Graphs,Neo4j — Patrick Durusau @ 7:47 pm

Highlights of the Neo4j 2.2 Milestone 3 release:

From the post:

Three of the key areas being tackled in this release are:

      1. Highly Concurrent Performance

      With Neo4j 2.2, we introduce a brand new page cache designed to deliver extreme performance and scalability under highly concurrent workloads. This new page cache helps overcome the limitations imposed by the current IO systems to support larger applications with hundreds of read and/or write IO requirements. The new cache is auto-configured and matches the available memory without the need to tune memory mapped IO settings anymore.

      2. Transactional & Batch Write Performance

      We have made several enhancements in Neo4j 2.2 to improve both transactional and batch write performance by orders of magnitude under highly concurrent load. Several things are changing to make this happen.

      • First, the 2.2 release improves coordination of commits between Lucene, the graph, and the transaction log, resulting in a much more efficient write channel.
      • Next, the database kernel is enhanced to optimize the flushing of transactions to disk for high number of concurrent write threads. This allows throughput to improve significantly with more write threads since IO costs are spread across transactions. Applications with many small transactions being piped through large numbers (10-100+) of concurrent write threads will experience the greatest improvement.
      • Finally, we have improved and fully integrated the “Superfast Batch Loader”. Introduced in Neo4j 2.1, this utility now supports large scale non-transactional initial loads (of 10M to 10B+ elements) with sustained throughputs around 1M records (node or relationship or property) per second. This seriously fast utility is (unsurprisingly) called neo4j-import, and is accessible from the command line.

      3. Cypher Performance

      We’re very excited to be releasing the first version of a new Cost-Based Optimizer for Cypher, under development for nearly a year. While Cypher is hands-down the most convenient way to formulate queries, it hasn’t always been as fast as we’d like. Starting with Neo4j 2.2, Cypher will determine the optimal query plan by using statistics about your particular data set. Both the cost-based query planner, and the ability of the database to gather statistics, are new, and we’re very interested in your feedback. Sample queries & data sets are welcome!

The most recent milestone is here.

Now is the time to take Neo4j 2.2 for a spin!
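If you do, the quickest way to see the new cost-based planner at work is to prefix a query with EXPLAIN (plan only) or PROFILE (plan plus run-time counts). The User/FRIEND data below is just a placeholder:

// Show the plan chosen for a simple lookup-and-expand query.
PROFILE
MATCH (user:User {name: "Ada"})-[:FRIEND]-(friend)
RETURN friend.name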

January 20, 2015

Flask and Neo4j

Filed under: Blogs,Graphs,Neo4j,Python — Patrick Durusau @ 5:03 pm

Flask and Neo4j – An example blogging application powered by Flask and Neo4j. by Nicole White.

From the post:

I recommend that you read through Flask’s quickstart guide before reading this tutorial. The following is drawn from Flask’s tutorial on building a microblog application. This tutorial expands the microblog example to include social features, such as tagging posts and recommending similar users, by using Neo4j instead of SQLite as the backend database.
(14 parts follow here)

The fourteen parts take you all the way through deployment on Heroku.

I don’t think you will abandon your current blogging platform but you will gain insight into Neo4j and Flask. A non-trivial outcome.
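To give a taste of the “recommending similar users” feature, here is a sketch of the kind of Cypher query it boils down to. The labels, relationship types and property names are my own placeholders, not necessarily Nicole’s:

// Users who tag their posts the way I do, ranked by overlap.
MATCH (me:User {username: "nicole"})-[:PUBLISHED]->(:Post)-[:TAGGED]->(tag:Tag),
      (other:User)-[:PUBLISHED]->(:Post)-[:TAGGED]->(tag)
WHERE other <> me
RETURN other.username, count(DISTINCT tag) AS shared_tags
ORDER BY shared_tags DESC
LIMIT 5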

Modelling Data in Neo4j: Labels vs. Indexed Properties

Filed under: Graphs,Neo4j — Patrick Durusau @ 2:15 pm

Modelling Data in Neo4j: Labels vs. Indexed Properties by Christophe Willemsen.

From the post:

A common question when planning and designing your Neo4j Graph Database is how to handle "flagged" entities. This could include users that are active, blog posts that are published, news articles that have been read, etc.

Introduction

In the SQL world, you would typically create a boolean|tinyint column; in Neo4j, the same can be achieved in the following two ways:

  • A flagged indexed property
  • A dedicated label

Having faced this design dilemma a number of times, we would like to share our experience with the two presented possibilities and some Cypher query optimizations that will help you take full advantage of the graph database.

Throughout the blog post, we'll use the following example scenario:

  • We have User nodes
  • User FOLLOWS other users
  • Each user writes multiple blog posts stored as BlogPost nodes
  • Some of the blog posts are drafted, others are published (active)

This post will help you make the best use of labels in Neo4j.
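For concreteness, the two options look like this in Cypher, using the BlogPost scenario above (a minimal sketch, not Christophe’s benchmark queries):

// Option 1: a dedicated label carries the flag.
MATCH (p:BlogPost:Active)
RETURN count(p);

// Option 2: an indexed boolean property carries the flag.
MATCH (p:BlogPost)
WHERE p.active = true
RETURN count(p);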

Labels are semantically opaque, so if your Neo4j database uses “German” to label books written in German, you are SOL if you need “German” for nationality.

That is a weakness of semantically opaque tokens. Having type properties on labels would push the semantic opaqueness to the next level.

January 10, 2015

The Hobbit Graph, or To Nodes and Back Again

Filed under: Graphs,Neo4j — Patrick Durusau @ 4:43 pm

The Hobbit Graph, or To Nodes and Back Again by Kevin Van Gundy.

From the webpage:

With the final installment of Peter Jackson’s Hobbit Trilogy only a few months away, I decided it would be fun to graph out Tolkien’s novel in Neo4j and try a few different queries to show how a graph database can tell your data’s story.

This is quite clever and would sustain the interest of anyone old enough to appreciate the Hobbit.

Perhaps motivation to read a favorite novel slowly?

Enjoy!

I first saw this in a tweet by Nikolay Stoitsev.

Python NLTK/Neo4j: Analysing the transcripts of How I Met Your Mother

Filed under: Neo4j,NLTK,Text Analytics — Patrick Durusau @ 2:46 pm

Python NLTK/Neo4j: Analysing the transcripts of How I Met Your Mother by Mark Needham.

From the post:

After reading Emil’s blog post about dark data a few weeks ago I became intrigued about trying to find some structure in free text data and I thought How I met your mother’s transcripts would be a good place to start.

I found a website which has the transcripts for all the episodes and then having manually downloaded the two pages which listed all the episodes, wrote a script to grab each of the transcripts so I could use them on my machine.

Interesting intermarriage between NLTK and Neo4j. Perhaps even more so if NLTK were used to extract information from dialogue outside of fictional worlds and Neo4j was used to model dialogue roles, etc., as well as relationships and events outside of the dialogue.

Congressional hearings (in the U.S., same type of proceedings outside the U.S.) would make an interesting target for analysis using NLTK and Neo4j.
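A sketch of what that modeling could look like, with hearings standing in for episodes; every label and relationship type here is invented for illustration:

// Speakers, the hearings they appear in, and individual utterances kept
// in sequence so the flow of dialogue can be traversed later.
MERGE (s:Speaker {name: "Witness A"})
MERGE (h:Hearing {title: "Oversight Hearing, January 2015"})
MERGE (s)-[:TESTIFIED_AT]->(h)
CREATE (u:Utterance {seq: 1, text: "Thank you, Mr. Chairman."})
CREATE (s)-[:SPOKE]->(u)
CREATE (u)-[:IN]->(h)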

January 9, 2015

Using graph databases to perform pathing analysis… [In XML too?]

Filed under: Graphs,Neo4j,Path Enumeration — Patrick Durusau @ 5:51 pm

Using graph databases to perform pathing analysis – initial experiments with Neo4J by Nick Dingwall.

From the post:

In the first post in this series, we raised the possibility that graph databases might allow us to analyze event data in new ways, especially where we were interested in understanding the sequences that events occurred in. In the second post, we walked through loading Snowplow page view event data into Neo4J in a graph designed to enable pathing analytics. In this post, we’re going to see whether the hypothesis we raised in the first post is right: can we perform the type of pathing analysis on Snowplow data that is so difficult and expensive when it’s in a SQL database, once it’s loaded in a graph?

In this blog post, we’re going to answer a set of questions related to the journeys that users have taken through our own (this) website. We’ll start by answering some easy questions to get used to working with Cypher. Note that some of these simpler queries could be easily written in SQL; we’re just interested in checking out how Cypher works at this stage. Later on, we’ll move on to answering questions that are not feasible using SQL.

If you dream in markup, ;-), you are probably thinking what I’m thinking. Yes, what about modeling paths in markup documents? What is more, visualizing those paths. Would certainly beat the hell out of some of the examples you find in the XML specifications.

Not to mention that they would be paths in your own documents.

Question: I am assuming you would not collapse all the <p> nodes, yes? That is, for some purposes we display the tree as though every node is unique, identified by its location in the markup tree. For other purposes it might be useful to visualize some paths as a collapsed node whose size or color indicates the number of nodes collapsed into that path.
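As a rough sketch of the idea (invented element paths, nothing from any particular document): load each element as a node with CHILD relationships, then “collapse” root-to-leaf paths by grouping them on element names.

// Load a tiny markup tree.
MERGE (a:Element {name: "article", path: "/article[1]"})
MERGE (s:Element {name: "section", path: "/article[1]/section[1]"})
MERGE (p:Element {name: "p", path: "/article[1]/section[1]/p[1]"})
MERGE (a)-[:CHILD {pos: 1}]->(s)
MERGE (s)-[:CHILD {pos: 1}]->(p);

// Collapse: group every root-to-leaf path by its sequence of element names
// and count how many concrete paths fall into each collapsed path.
MATCH path = (root:Element {name: "article"})-[:CHILD*]->(leaf:Element)
WHERE NOT (leaf)-[:CHILD]->()
RETURN [n IN nodes(path) | n.name] AS collapsedPath, count(*) AS occurrences
ORDER BY occurrences DESC;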

That sounds like a Balisage presentation for 2015.

Natural Language Analytics made simple and visual with Neo4j

Filed under: Graphs,Natural Language Processing,Neo4j — Patrick Durusau @ 5:10 pm

Natural Language Analytics made simple and visual with Neo4j by Michael Hunger.

From the post:

I was really impressed by this blog post on Summarizing Opinions with a Graph from Max and always waited for Part 2 to show up 🙂

The blog post explains a really interesting approach by Kavita Ganesan which uses a graph representation of sentences of review content to extract the most significant statements about a product.

From later in the post:

The essence of creating the graph can be formulated as: “Each word of the sentence is represented by a shared node in the graph with order of words being reflected by relationships pointing to the next word”.

Michael goes on to create features with Cypher and admits near the end that “LOAD CSV” doesn’t really care if you have CSV files or not. You can split on a space and load text such as the “Lord of the Rings poem of the One Ring” into Neo4j.
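A stripped-down version of that idea — my own sketch with an assumed file name, not Michael’s exact query:

// Treat each line of a plain-text file as a single "CSV" field (lines with
// commas would need a custom field terminator), split it on spaces and
// MERGE one shared node per word, linking consecutive words.
LOAD CSV FROM "file:///one_ring.txt" AS line
WITH split(line[0], " ") AS words
WHERE length(words) > 1
UNWIND range(0, length(words) - 2) AS i
MERGE (w1:Word {name: words[i]})
MERGE (w2:Word {name: words[i + 1]})
MERGE (w1)-[r:NEXT]->(w2)
  ON CREATE SET r.count = 1
  ON MATCH SET r.count = r.count + 1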

Interesting work and a good way to play with text and Neo4j.

The single node per unique word presented here will be problematic if you need to capture the changing roles of words in a sentence.

January 8, 2015

Wikipedia in Python, Gephi, and Neo4j

Filed under: Gephi,Giraph,Neo4j,NetworkX,Python,Wikipedia — Patrick Durusau @ 3:22 pm

Wikipedia in Python, Gephi, and Neo4j: Vizualizing relationships in Wikipedia by Matt Krzus.

From the introduction:


We have had a bit of a stretch here where we used Wikipedia for a good number of things. From Doc2Vec to experimenting with word2vec layers in deep RNNs, here are a few of those cool visualization tools we’ve used along the way.

Cool things you will find in this post:

  • Building relationship links between Categories and Subcategories
  • Visualization with Networkx (think Betweenness Centrality and PageRank)
  • Neo4j and Cypher (the author thinks avoiding the Giraph learning curve is a plus, I leave that for you to decide)
  • Visualization with Gephi

Enjoy!

December 6, 2014

Better, Faster, and More Scalable Neo4j than ever before

Filed under: Graphs,Neo4j — Patrick Durusau @ 11:18 am

Better, Faster, and More Scalable Neo4j than ever before by Philip Rathle.

From the post:

Neo4j 2.2 aims to be our fastest and most scalable release ever. With Neo4j 2.2 our engineering team introduces massive enhancements to the internal architecture resulting in higher performance and scalability.

This first milestone (or beta release) pulls all of these new elements together, so that you can “dial it up to 11” with your applications. You can download it here for your testing.

Philip highlights:

  1. Highly Concurrent Performance
  2. Transactional & Batch Write Performance
  3. Cypher Performance (includes a Cost-Based Optimizer)

BTW, there is news of a new and improved batch loader: neo4j-import.

I included the direct link because the search interface for the milestone release acts oddly.

If you enter (with quotes) “neo4j-import” (not an unreasonable query), results are returned for: import, neo4j. I haven’t tried other queries that include a hyphen. You?

November 27, 2014

A Docker Image for Graph Analytics on Neo4j with Apache Spark GraphX

Filed under: Graphs,GraphX,Neo4j,Spark — Patrick Durusau @ 8:20 pm

A Docker Image for Graph Analytics on Neo4j with Apache Spark GraphX by Kenny Bastani.

From the post:

I’ve just released a useful new Docker image for graph analytics on a Neo4j graph database with Apache Spark GraphX. This image deploys a container with Apache Spark and uses GraphX to perform ETL graph analysis on subgraphs exported from Neo4j. This docker image is a great addition to Neo4j if you’re looking to do easy PageRank or community detection on your graph data. Additionally, the results of the graph analysis are applied back to Neo4j.

This gives you the ability to optimize your recommendation-based Cypher queries by filtering and sorting on the results of the analysis.
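To make that last point concrete — assuming the analysis writes something like a pagerank property back onto each node — the re-ranking is then ordinary Cypher (hypothetical labels and relationships):

// Friend-of-friend suggestions, re-ranked by the PageRank score the
// GraphX job wrote back into Neo4j.
MATCH (me:User {id: 42})-[:FRIEND]->()-[:FRIEND]->(suggestion:User)
WHERE suggestion <> me AND NOT (me)-[:FRIEND]->(suggestion)
RETURN suggestion.name, suggestion.pagerank
ORDER BY suggestion.pagerank DESC
LIMIT 10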

This rocks!

If you were looking for an excuse to investigate Docker or Spark or GraphX or Neo4j, it has arrived!

Enjoy!

November 26, 2014

Neo4j 2.1.6 (release)

Filed under: Graphs,Neo4j — Patrick Durusau @ 8:15 pm

Neo4j 2.1.6 (release)

From the post:

Neo4j 2.1.6 is a maintenance release, with critical improvements.

Notably, this release:

  • Resolves a critical shutdown issue, whereby IO errors were not always handled correctly and could result in inconsistencies in the database due to failure to flush outstanding changes.
  • Significantly reduces the file handle requirements for the Lucene-based indexes.
  • Resolves an issue in consistency checking, which could falsely report store inconsistencies.
  • Extends the Java API to allow the degree of a node to be easily obtained (the count of relationships, by type and direction).
  • Resolves a significant performance degradation that affected the loading of relationships for a node during traversals.
  • Resolves a backup issue, which could result in a backup store that would not load correctly into a clustered environment (Neo4j Enterprise).
  • Corrects a clustering issue that could result in the master failing to resume its role after an outage of a majority of slaves (Neo4j Enterprise).

All Neo4j 2.x users are recommended to upgrade to this release. Upgrading to Neo4j 2.1, from Neo4j 1.9.x or Neo4j 2.0.x, requires a migration to the on-disk store and can not be reversed. Please ensure you have a valid backup before proceeding, then use on a test or staging server to understand any changed behaviors before going into production.

Neo4j 1.9 users may upgrade directly to this release, and are recommended to do so carefully. We strongly encourage verifying the syntax and validating all responses from your Cypher scripts, REST calls, and Java code before upgrading any production system. For information about upgrading from Neo4j 1.9, please see our Upgrading to Neo4j 2 FAQ.

For a full summary of changes in this release, please review the CHANGES.TXT file contained within the distribution.

Downloads

As with all software upgrades, do not delay until the day before you are leaving on holiday!

November 22, 2014

Using Load CSV in the Real World

Filed under: CSV,Graphs,Neo4j — Patrick Durusau @ 11:23 am

Using Load CSV in the Real World by Nicole White.

From the description:

In this live-coding session, Nicole will demonstrate the process of downloading a raw .csv file from the Internet and importing it into Neo4j. This will include cleaning the .csv file, visualizing a data model, and writing the Cypher query that will import the data. This presentation is meant to make Neo4j users aware of common obstacles when dealing with real-world data in .csv format, along with best practices when using LOAD CSV.

A webinar with substantive content and not marketing pitches! Unusual but it does happen.

A very good walk through importing a CSV file into Neo4j, with some modeling comments along the way and hints of best practices.
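The overall pattern looks something like the sketch below — hypothetical file and column names, but the trim/toInt/MERGE combination is the kind of cleaning the session covers:

// Import people and the cities they live in from a cleaned-up CSV file.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:///people.csv" AS row
WITH row
WHERE row.name IS NOT NULL
MERGE (p:Person {name: trim(row.name)})
  SET p.age = toInt(row.age)
MERGE (c:City {name: trim(row.city)})
MERGE (p)-[:LIVES_IN]->(c)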

The “next” thing for users after a brief introduction to graphs and Neo4j.

The experience will build their confidence and they will learn from experience what works best for modeling their data sets.

November 15, 2014

Py2neo 2.0

Filed under: Graphs,Neo4j,py2neo,Python — Patrick Durusau @ 7:30 pm

Py2neo 2.0 by Nigel Small.

From the webpage:

Py2neo is a client library and comprehensive toolkit for working with Neo4j from within Python applications and from the command line. The core library has no external dependencies and has been carefully designed to be easy and intuitive to use.

If you are using Neo4j or Python or both, you need to be aware of Py2Neo 2.0.

Impressive documentation!

I haven’t gone through all of it but contributed examples would be helpful.

For example:

API: Cypher

exception py2neo.cypher.ClientError(message, **kwargs)

The Client sent a bad request – changing the request might yield a successful outcome.

exception py2neo.cypher.error.request.Invalid(message, **kwargs)[source]

The client provided an invalid request.

Without an example the difference between a “bad” versus an “invalid” request isn’t clear.

Writing examples would not be a bad way to work through the Py2neo 2.0 documentation.

Enjoy!

I first saw this in a tweet by Nigel Small.

November 8, 2014

Mazerunner – Update – Neo4J – GraphX

Filed under: Graphs,GraphX,Neo4j — Patrick Durusau @ 7:36 pm

Three new algorithms have been added to Mazerunner:

  • Triangle Count
  • Connected Components
  • Strongly Connected Components

From: Using Apache Spark and Neo4j for Big Data Graph Analytics

Mazerunner uses a message broker to distribute graph processing jobs to Apache Spark’s GraphX module. When an agent job is dispatched, a subgraph is exported from Neo4j and written to Apache Hadoop HDFS.

That’s good news!

I first saw this in a tweet by Kenny Bastani

November 3, 2014

neo4apis

Filed under: Graphs,Neo4j,Tweets — Patrick Durusau @ 9:09 pm

neo4apis by Brian Underwood.

From the post:

I’ve been reading a few interesting analyses of Twitter data recently such as this #gamergate analysis by Andy Baio. I thought it would be nice to have a mechanism for people to quickly and easily import data from Twitter to Neo4j for research purposes. Like a good programmer I had to go up at least one level of abstraction. Thus was born the ruby gems neo4apis and neo4apis-twitter (and, incidentally, neo4apis-github just to prove it was repeatable).

Using the neo4apis-twitter gem is easy and can be used either in your ruby code or from the command line. neo4apis takes care of loading your data efficiently as well as creating database indexes so that you can query it effectively.

In case you haven’t heard, the number of active Twitter users is estimated at 228 million. That is a lot of users but as I write this post, the world’s population passed 7,271,955,000.

Just doing rough numbers, 7,271,955,000 / 228,000,000 ≈ 32.

So if you captured a tweet from every active Twitter user, that would be roughly 1/32 of the world’s population.

Not saying you shouldn’t capture tweets or analyze them in Neo4j. I am saying that you should be mindful of the lack of representation in such tweets.

November 1, 2014

Querying Graphs with Neo4j [cheatsheet]

Filed under: Cypher,Neo4j — Patrick Durusau @ 6:13 pm

Querying Graphs with Neo4j by Michael Hunger.

Download the refcard by the usual process: logging into DZone, etc.

When you open the PDF file in a viewer, do be careful. (Page references are to the DZone cheatsheet.)

Cover: The entire cover is a download link. Touch it at all and you will be taken to a download link for Neo4j.

Page 1 covers “What is a Graph Database?” and “What is Neo4j?,” just in case you have been forced by home invaders to download a refcard for a technology you know nothing about.

Page 2 pitches the Neo4j server and then Getting Started with Neo4j, perhaps to annoy the NSA with repetitive content.

The DZone cheatsheet replicates the cheatsheet at: http://neo4j.com/docs/2.0/cypher-refcard/, with the following changes:

Page 3

WITH

Re-written. Old version:

MATCH (user)-[:FRIEND]-(friend) WHERE user.name = {name} WITH user, count(friend) AS friends WHERE friends > 10 RETURN user

The WITH syntax is similar to RETURN. It separates query parts explicitly, allowing you to declare which identifiers to carry over to the next part.

MATCH (user)-[:FRIEND]-(friend) WITH user, count(friend) AS friends ORDER BY friends DESC SKIP 1 LIMIT 3 RETURN user

You can also use ORDER BY, SKIP, LIMIT with WITH.

New version:

MATCH (user)-[:KNOWS]-(friend) WHERE user.name = {name} WITH user, count(*) AS friends WHERE friends > 10 RETURN user

WITH chains query parts. It allows you to specify which projection of your data is available after WITH.

You can also use ORDER BY, SKIP, LIMIT and aggregation with WITH. You might have to alias expressions to give them a name.

I leave it to your judgement which version was the clearer.

Page 4

MERGE

inserts: typo “{name: {value3}} )” on last line of final example under MERGE.

SET

inserts: “SET n += {map} Add and update properties, while keeping existing ones.”

INDEX

inserts: “MATCH (n:Person) WHERE n.name IN {values} An index can be automatically used for the IN collection checks.”

Page 5

PATTERNS

changes: “(n)-[*1..5]->(m) Variable length paths.” to “(n)-[*1..5]->(m) Variable length paths can span 1 to 5 hops.”

changes: “(n)-[*]->(m) Any depth. See the performance tips.” to “(n)-[*]->(m) Variable length path of any depth. See performance tips.”

changes: “shortestPath((n1:Person)-[*..6]-(n2:Person)) Find a single shortest path.” to “shortestPath((n1)-[*..6]-(n2))”

COLLECTIONS

changes: “range({first_num},{last_num},{step}) AS coll Range creates a collection of numbers (step is optional), other functions returning collections are: labels, nodes, relationships, rels, filter, extract.” to “range({from},{to},{step}) AS coll Range creates a collection of numbers (step is optional).” [Loss of information from the earlier version.]

inserts: “UNWIND {names} AS name MATCH (n:Person {name:name}) RETURN avg(n.age) With UNWIND, you can transform any collection back into individual rows. The example matches all names from a list of names.”

MAPS

inserts: “range({start},{end},{step}) AS coll Range creates a collection of numbers (step is optional).”

Page 6

PREDICATES

changes: “NOT (n)-[:KNOWS]->(m) Exclude matches to (n)-[:KNOWS]->(m) from the result.” to “NOT (n)-[:KNOWS]->(m) Make sure the pattern has at least one match.” [Older version more precise?]

replaces: mixed case, true/TRUE with TRUE

FUNCTIONS

inserts: “toInt({expr}) Converts the given input in an integer if possible; otherwise it returns NULL.”

inserts: “toFloat({expr}) Converts the given input in a floating point number if possible; otherwise it returns NULL.”

PATH FUNCTIONS

changes: “MATCH path = (begin) -[*]-> (end) FOREACH (n IN rels(path) | SET n.marked = TRUE) Execute a mutating operation for each relationship of a path.” to “MATCH path = (begin) -[*]-> (end) FOREACH (n IN rels(path) | SET n.marked = TRUE) Execute an update operation for each relationship of a path.”

COLLECTION FUNCTIONS

changes: “FOREACH (value IN coll | CREATE (:Person {name:value})) Execute a mutating operation for each element in a collection.” to “FOREACH (value IN coll | CREATE (:Person {name:value})) Execute an update operation for each element in a collection.”

MATHEMATICAL FUNCTIONS

changes: “degrees({expr}), radians({expr}), pi() Converts radians into degrees, use radians for the reverse. pi for π.” to “degrees({expr}), radians({expr}), pi() Converts radians into degrees, use radians for the reverse.” Loses “pi for π.”

changes: “log10({expr}), log({expr}), exp({expr}), e() Logarithm base 10, natural logarithm, e to the power of the parameter. Value of e.” to “log10({expr}), log({expr}), exp({expr}), e() Logarithm base 10, natural logarithm, e to the power of the parameter.” Loses “Value of e.”

Page 7

STRING FUNCTIONS

inserts: “split({string}, {delim}) Split a string into a collection of strings.”

AGGREGATION

changes: “collect(n.property) Collection from the values, ignores NULL.” to “collect(n.property) Value collection, ignores NULL.”

START

remove: “START n=node(*) Start from all nodes.”

remove: “START n=node({ids}) Start from one or more nodes specified by id.”

remove: “START n=node({id1}), m=node({id2}) Multiple starting points.”

remove: “START n=node:nodeIndexName(key={value}) Query the index with an exact query. Use node_auto_index for the automatic index.”

inserts: “START n = node:indexName(key={value}) n=node:nodeIndexName(key={value}) Query the index with an exact query. Use node_auto_index for the old automatic index.”

inserts: ‘START n = node:indexName({query}) Query the index by passing the query string directly, can be used with lucene or spatial syntax. E.g.: “name:Jo*” or “withinDistance:[60,15,100]”‘


I may have missed some changes because, as you know, the “cheatsheets” for Cypher have no particular order for their entries. Alphabetical order suggests itself for future editions, sans the marketing materials.

Changes to a query language should appear where a user would expect to find the command in question. For example, the note that “CREATE a={property:’value’}” has been removed should appear where that command would be expected on the cheatsheet. Users should not have to hunt high and low for “CREATE a={property:’value’}” on a cheatsheet.

I have passed over incorrect use of the definite article and other problems without comment.

Despite the shortcomings of the DZone refcard, I suggest that you upgrade to it.

October 22, 2014

Loading CSV files into Neo4j

Filed under: CSV,Neo4j — Patrick Durusau @ 8:22 pm

Loading CSV files into Neo4j is so easy that it has taken only three (3) posts, so far, to explain the process. This post is a collection of loading CSV into Neo4j references. If you have others, feel free to contribute them and I will add them to this post.

LOAD CSV into Neo4j quickly and successfully by Michael Hunger on Jun 25, 2014.

Note: You can also read an interactive and live version of this blog post as a Neo4j GraphGist.

Since version 2.1 Neo4j provides out-of-the box support for CSV ingestion. The LOAD CSV command that was added to the Cypher Query language is a versatile and powerful ETL tool.

It allows you to ingest CSV data from any URL into a friendly parameter stream for your simple or complex graph update operation, that … conversion.

The June 25, 2014 post has content that is not repeated in the Oct. 18, 2014 post on loading CSV so you will need both posts, or a very fine memory.

Flexible Neo4j Batch Import with Groovy by Michael Hunger on Oct 9, 2014.

You might have data as CSV files to create nodes and relationships from in your Neo4j Graph Database.

It might be a lot of data, like many tens of million lines.

Too much for LOAD CSV to handle transactionally.

Usually you can just fire up my batch-importer and prepare node and relationship files that adhere to its input format requirements.

What follows is advice on when you may want to deviate from the batch-importer defaults and how to do so.

LOAD CVS with SUCCESS by Michael Hunger on Oct 18, 2014.

I have to admit that using our LOAD CSV facility is trickier than you and I would expect.

Several people ran into issues that they could not solve on their own.

My first blog post on LOAD CSV is still valid in its own right, and contains important aspects that I won’t repeat here.

Incomplete so reference LOAD CSV into Neo4j quickly and successfully while reading this post.

Others?

October 14, 2014

RNeo4j: Neo4j graph database combined with R statistical programming language

Filed under: Graphs,Neo4j,R — Patrick Durusau @ 2:46 pm

From the description:

RNeo4j combines the power of a Neo4j graph database with the R statistical programming language to easily build predictive models based on connected data. From calculating the probability of friends of friends connections to plotting an adjacency heat map based on graph analytics, the RNeo4j package allows for easy interaction with a Neo4j graph database.

Nicole is the author of the RNeo4j R package. Don’t be dismayed by the “What is a Graph” and “What is R” in the presentation outline. Mercifully only three minutes followed by a rocking live coding demonstration of the package!

Beyond Neo4j and R, use this webinar as a standard for the useful content that should appear in a webinar!

RNeo4j at Github.

October 10, 2014

Flexible Neo4j Batch Import with Groovy

Filed under: Graphs,Neo4j — Patrick Durusau @ 4:03 pm

Flexible Neo4j Batch Import with Groovy by Michael Hunger.

From the post:

You might have data as CSV files to create nodes and relationships from in your Neo4j Graph Database.
It might be a lot of data, like many tens of million lines.
Too much for LOAD CSV to handle transactionally.

Usually you can just fire up my batch-importer and prepare node and relationship files that adhere to its input format requirements.

Your Requirements

There are some things you probably want to do differently than the batch-importer does by default:

  • not create legacy indexes
  • not index properties at all that you just need for connecting data
  • create schema indexes
  • skip certain columns
  • rename properties from the column names
  • create your own labels based on the data in the row
  • convert column values into Neo4j types (e.g. split strings or parse JSON)

Michael helps you avoid the defaults of batch importing into Neo4j.

October 8, 2014

A look at Cayley

Filed under: Cayley,Graphs,Neo4j — Patrick Durusau @ 4:15 pm

A look at Cayley by Tony.

From the post:

Recently I took the time to check out Cayley, a graph database written in Go that’s been getting some good attention.


https://github.com/google/cayley

A great introduction to Cayley. Tony has some comparisons to Neo4j, but for beginners with graph databases, those comparisons may not be very useful. Come back for those comparisons once you have moved beyond example graphs.

September 30, 2014

Neo4j: Generic/Vague relationship names

Filed under: Graphs,Neo4j — Patrick Durusau @ 7:21 pm

Neo4j: Generic/Vague relationship names by Mark Needham.

From the post:

An approach to modelling that I often see while working with Neo4j users is creating very generic relationships (e.g. HAS, CONTAINS, IS) and filtering on a relationship property or on a property/label at the end node.

Intuitively this doesn’t seem to make best use of the graph model as it means that you have to evaluate many relationships and nodes that you’re not interested in whereas if you use a more specific relationship type that isn’t the case.

However, I’ve never actually tested the performance differences between the approaches so I thought I’d try it out.

I created 4 different databases which had one node with 60,000 outgoing relationships – 10,000 of which we wanted to retrieve and 50,000 that were irrelevant.

I modelled the ‘relationship’ in 4 different ways…

  • Filter by relationship type
    (node)-[:HAS_ADDRESS]->(address)
  • Filter by end node label
    (node)-[:HAS]->(address:Address)
  • Filter by relationship property
    (node)-[:HAS {type: “address”}]->(address)
  • Filter by end node
    (node)-[:HAS]->(address {type: “address”})

…and then measured how long it took to retrieve the ‘has address’ relationships.

See Mark’s post for the test results but the punch line is the less filtering required, the faster the result.
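In query form the difference is easy to see (hypothetical start-node lookup; Mark’s post has the measured timings):

// Specific relationship type: only the 10,000 relationships of interest
// are ever expanded.
MATCH (n:Node {id: {id}})-[:HAS_ADDRESS]->(address)
RETURN address;

// Generic type plus a property filter: all 60,000 relationships get
// expanded and then filtered down.
MATCH (n:Node {id: {id}})-[r:HAS]->(address)
WHERE r.type = "address"
RETURN address;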

Designing data structures for eventual queries seems sub-optimal to me.

You?

Neo4j 2.1.5

Filed under: Graphs,Neo4j — Patrick Durusau @ 4:11 pm

Neo4j 2.1.5

From the post:

Neo4j 2.1.5 is a maintenance release, with critical improvements.

Notably, this release addresses the following:

  • Corrects a Cypher compiler error introduced only in Neo4j 2.1.4, which caused Cypher queries containing nested maps to fail type checking.
  • Resolves a critical error, where discrete remove+add operations on properties could result in a new property being added, without the old property being correctly removed.
  • Corrects an issue causing significantly degraded write performance in larger transactions.
  • Improves memory use in Cypher queries containing OPTIONAL MATCH.
  • Resolves an issue causing failed index lookups for some newly created integer properties.
  • Fixes an issue which could cause excessive store growth in some clustered environments (Neo4j Enterprise).
  • Adds additional metadata (label and ID) to node and relationship representations in JSON responses from the REST API.
  • Resolves an issue with extraneous remove commands being added to the legacy auto-index transaction log.
  • Resolves an issue preventing the lowest ID cluster member from successfully leaving and rejoining the cluster, in cases where it was not the master (Neo4j Enterprise).

All Neo4j 2.x users are recommended to upgrade to this release. Upgrading to Neo4j 2.1 requires a migration to the on-disk store and can not be reversed. Please ensure you have a valid backup before proceeding, then use on a test or staging server to understand any changed behaviors before going into production.

Neo4j 1.9 users may upgrade directly to this release, and are recommended to do so carefully. We strongly encourage verifying the syntax and validating all responses from your Cypher scripts, REST calls, and Java code before upgrading any production system. For information about upgrading from Neo4j 1.9, please see our Upgrading to Neo4j 2 FAQ.

Do you remember which software company had the “We are holding the gun but you decide whether to pull the trigger” type upgrade warning? There are so many legendary upgrade stories that it is hard to remember them all. Is there a collection of upgrade warnings and/or stories on the Net? Thanks!

BTW, if you are running Neo4j 2.x upgrade. No comment on Neo4j 1.9.

August 28, 2014

…Deep Learning Text Classification

Filed under: Deep Learning,Graphs,Neo4j — Patrick Durusau @ 4:20 pm

Using a Graph Database for Deep Learning Text Classification by Kenny Bastani.

From the post:

Graphify is a Neo4j unmanaged extension that provides plug and play natural language text classification.

Graphify gives you a mechanism to train natural language parsing models that extract features of a text using deep learning. When training a model to recognize the meaning of a text, you can send an article of text with a provided set of labels that describe the nature of the text. Over time the natural language parsing model in Neo4j will grow to identify those features that optimally disambiguate a text to a set of classes.

Similarity and graphs. What’s there to not like?

July 31, 2014

How To Create Semantic Confusion

Filed under: Cypher,Neo4j,Semantics — Patrick Durusau @ 10:38 am

Merge: to cause (two or more things, such as two companies) to come together and become one thing : to join or unite (one thing) with another (http://www.merriam-webster.com/dictionary/merge)

Do you see anything common between that definition of merge and:

  • It ensures that a pattern exists in the graph by creating it if it does not exist already
  • It will not use partially existing (unbound) patterns – it will attempt to match the entire pattern and create the entire pattern if missing
  • When unique constraints are defined, MERGE expects to find at most one node that matches the pattern
  • It also allows you to define what should happen based on whether data was created or matched

The quote is from Cypher MERGE Explained by Luanne Misquitta. Great post if you want to understand the operation of Cypher “merge,” which has nothing in common with the term “merge” in English.
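A short illustration of those semantics (my own example, not Luanne’s):

// First run: the node does not exist, so MERGE creates it and ON CREATE fires.
// Second run: the pattern already exists, so nothing new is created and
// ON MATCH fires instead.
MERGE (p:Person {name: "Emil"})
  ON CREATE SET p.created = timestamp()
  ON MATCH SET p.lastSeen = timestamp()
RETURN p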

Want to create semantic confusion?

Choose a well-known term and define new and unrelated semantics for it. That creates a demand for training and tutorials, as well as confused users.

I first saw this in a tweet by GraphAware.

July 28, 2014

A Survey of Graph Theory and Applications in Neo4J

Filed under: Graphs,Neo4j — Patrick Durusau @ 7:35 pm

A Survey of Graph Theory and Applications in Neo4J by Geoff Moes.

A great summary of resources on graph theory along with a two part presentation on the same.

Geoff mentions: Graph Theory, 1736-1936 by Norman L. Biggs, E. Keith Lloyd, and Robin J. Wilson, putting to rest any notion that graphs are a recent invention.

Enjoy!

July 25, 2014

Neo4j Index Confusion

Filed under: Graphs,Indexing,Neo4j — Patrick Durusau @ 1:34 pm

Neo4j Index Confusion by Nigel Small.

From the post:

Since the release of Neo4j 2.0 and the introduction of schema indexes, I have had to answer an increasing number of questions arising from confusion between the two types of index now available: schema indexes and legacy indexes. For clarification, these are two completely different concepts and are not interchangable or compatible in any way. It is important, therefore, to make sure you know which you are using.
….

Nigel forgets to mention that legacy indexes were based on Lucene, schema indexes, not.
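A quick sketch to keep the two apart (standard Neo4j 2.x syntax, hypothetical data):

// Schema index: declared once per label/property and used automatically
// by ordinary MATCH ... WHERE queries.
CREATE INDEX ON :Person(name);
MATCH (p:Person) WHERE p.name = "Nigel" RETURN p;

// Legacy (auto) index: Lucene-backed and queried explicitly via START.
START p = node:node_auto_index(name = "Nigel") RETURN p;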

If you are interested in the technical details of the schema indexes, start with On Creating a MapDB Schema Index Provider for Neo4j 2.0 by Michael Hunger.

Michael says in his tests that the new indexing solution is faster than Lucene. Or more accurately, faster than Lucene as used in prior Neo4j versions.

How simple are your indexing needs?

July 21, 2014

Graffeine

Filed under: D3,Graphs,Neo4j,Visualization — Patrick Durusau @ 4:37 pm

Graffeine by Julian Browne

From the webpage:

Caffeinated Graph Exploration for Neo4J

Graffeine is both a useful interactive demonstrator of graph capability and a simple visual administration interface for small graph databases.

Here it is with the now-canonical Dr Who graph loaded up:

[Screenshot: the Dr Who graph loaded in Graffeine]

From the description:

Graffeine plugs into Neo4J and renders nodes and relationships as an interactive D3 SVG graph so you can add, edit, delete and connect nodes. It’s not quite as easy as a whiteboard and a pen, but it’s close, and all interactions are persisted in Neo4J.

You can either make a graph from scratch or browse an existing one using search and paging. You can even “fold” your graph to bring different aspects of it together on the same screen.

Nodes can be added, updated, and removed. New relationships can be made using drag and drop and existing relationships broken.

It’s by no means phpmyadmin for Neo4J, but one day it could be (maybe).

A great example of D3 making visual editing possible.

July 11, 2014

Neo4j’s Cypher vs Clojure – Group by and Sorting

Filed under: Clojure,Cypher,Neo4j — Patrick Durusau @ 6:46 pm

Neo4j’s Cypher vs Clojure – Group by and Sorting by Mark Needham.

From the post:

One of the points that I emphasised during my talk on building Neo4j backed applications using Clojure last week is understanding when to use Cypher to solve a problem and when to use the programming language.

A good example of this is in the meetup application I’ve been working on. I have a collection of events and want to display past events in descending order and future events in ascending order.

Mark falls back on Clojure to cure the lack of sorting within a collection in Cypher.
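For what it’s worth, Cypher can order rows before they are collected — what it lacks (as of 2.x) is a way to re-sort a collection after the fact, which is the gap Mark fills with Clojure. A sketch with hypothetical Event nodes:

// Past events, newest first, gathered into a single collection.
MATCH (e:Event)
WHERE e.time < timestamp()
WITH e
ORDER BY e.time DESC
RETURN collect(e.name) AS pastEvents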

