Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

July 6, 2014

The Zen of Cypher

Filed under: Cypher,Graphs,Neo4j — Patrick Durusau @ 2:53 pm

The Zen of Cypher by Nigel Small.

The original “Zen” book, Zen and the Art of Motorcycle Maintenance: An Inquiry into Values, runs four hundred and eighteen pages.

Nigel has a useful summary for Cypher but I would estimate it runs about a page.

Not really the in depth sort of treatment that qualifies for a “Zen” title.

Yes?

June 24, 2014

UbiGraph WARNING: Out Dated Software

Filed under: Graphs,Neo4j — Patrick Durusau @ 3:45 pm

Rendering a Neo4j Database in UbiGraph by Michael Hunger.

Michael covers loading and visualizing data with UbiGraph.

The UbiGraph pages document UbiGraph alpha-0.2.4 and it dates from June 2008. That build is targeted at Ubuntu 8.04 x86_64.

Without the source code, I’m not sure you need to spend a lot of effort on UbiGraph.

The year 2008 was what, twenty web-years ago?

June 21, 2014

Storing and visualizing LinkedIn…

Filed under: Intelligence,Neo4j,Social Networks,Visualization — Patrick Durusau @ 4:42 pm

Storing and visualizing LinkedIn with Neo4j and sigma.js by Bob Briody.

From the post:

In this post I am going to present a way to:

  • load a linkedin network via the linkedIn developer API into neo4j using python
  • serve the network from neo4j using node.js, express.js, and cypher
  • display the network in the browser using sigma.js

Great post but it means one (1) down and two hundred and five (205) more to go, if you are a member of the social networks listed on List of social networking websites at Wikipedia, and that excludes dating sites and includes only “notable, well-known sites.”

I would be willing to bet that your social network of friends, members of your religious organization, people where you work, etc. would start to swell the number of other social networks that number you as a member.

Hmmm, so one-off social network visualizations are just that, one-off social network visualizations. You can be seen as part of one group but not, say, two or three intersecting groups.

Moreover, an update to one visualized network isn’t going to percolate into another visualized network.

There is the “normalize your graph” solution to integrate such resources but what if you aren’t the one to realize the need for “normalization?”

You have two separate actors in your graph visualization after doing the best you can. Another person encounters information indicating these “two” people are in fact one person. They update their data. But that updated knowledge has no impact on your visualization, unless you simply happen across it.

Seems like a poor way to run intelligence gathering, doesn’t it?

June 18, 2014

Time-Based Versioned Graphs

Filed under: Graphs,Neo4j,Versioning — Patrick Durusau @ 5:02 pm

Time-Based Versioned Graphs

From the post:

Many graph database applications need to version a graph so as to see what it looked like at a particular point in time. Neo4j doesn’t provide intrinsic support either at the level of its labelled property graph or in the Cypher query language for versioning. Therefore, to version a graph we need to make our application graph data model and queries version aware.

Separate Structure From State

The key to versioning a graph is separating structure from state. This allows us to version the graph’s structure independently of its state.

To help describe how to design a version-aware graph model, I’m going to introduce some new terminology: identity nodes, state nodes, structural relationships and state relationships.

Identity Nodes

Identity nodes are used to represent the positions of entities in a domain-meaningful graph structure. Each identity node contains one or more immutable properties, which together constitute an entity’s identity. In a version-free graph (the kind of graph we build normally) nodes tend to represent both an entity’s position and its state. Identity nodes in a version-aware graph, in contrast, serve only to identify and locate an entity in a network structure.

Structural Relationships

Identity nodes are connected to one another using timestamped structural relationships. These structural relationships are similar to the domain relationships we’d include in a version-free graph, except they have two additional properties, from and to, both of which are timestamps.

State Nodes and Relationships

Connected to each identity node are one or more state nodes. Each state node represents a snapshot of an entity’s state. State nodes are connected to identity nodes using timestamped state relationships.
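A minimal Python sketch of the model described above may help. It assumes integer timestamps, a half-open validity interval, and a sentinel for "still current"; the names `Rel`, `END_OF_TIME`, `state_at`, and `neighbours_at` are illustrative only and come from neither the original post nor any Neo4j API.

```python
from dataclasses import dataclass

END_OF_TIME = 2**63 - 1  # assumed sentinel meaning "still current"


@dataclass(frozen=True)
class Rel:
    start: str  # identity node id
    end: str    # identity node id (structural) or state node id (state)
    kind: str
    frm: int    # "from" timestamp, inclusive
    to: int     # "to" timestamp, exclusive


def valid_at(rel, ts):
    """A relationship is visible at ts if ts falls inside [frm, to)."""
    return rel.frm <= ts < rel.to


# State nodes hold snapshots; identity nodes are just stable ids here.
states = {
    "s1": {"name": "Widget", "price": 10},
    "s2": {"name": "Widget", "price": 12},
}
state_rels = [
    Rel("product:1", "s1", "STATE", 100, 200),
    Rel("product:1", "s2", "STATE", 200, END_OF_TIME),
]
structural_rels = [
    Rel("shop:1", "product:1", "SELLS", 100, END_OF_TIME),
]


def state_at(identity, ts):
    """Return the snapshot of an entity's state at time ts, if any."""
    for rel in state_rels:
        if rel.start == identity and valid_at(rel, ts):
            return states[rel.end]
    return None


def neighbours_at(identity, ts):
    """Return the identity nodes reachable over structure valid at ts."""
    return [r.end for r in structural_rels
            if r.start == identity and valid_at(r, ts)]
```

Every query has to carry the timestamp, which is the "version aware" burden the post describes.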

Great modeling example but you have to wonder about a graph implementation that doesn’t support versioning out of the box.

It can be convenient to treat data as though it were stable, but we all know that isn’t true.

Don’t we?

June 12, 2014

Importing CSV data into Neo4j…

Filed under: CSV,Graphs,Neo4j — Patrick Durusau @ 7:44 pm

Importing CSV data into Neo4j to make a graph by Samantha Zeitlin.

From the post:

Thanks to a friend who wants to help more women get into tech careers, last year I attended Developer Week, where I was impressed by a talk about Neo4j.

Graph databases excited me right away, since this is a concept I’ve used for brainstorming since 3rd grade, when my teachers Mrs. Nysmith and Weaver taught us to draw webbings as a way to take notes and work through logic puzzles.

Samantha succeeds at importing CSV data into Neo4j, but only after encountering an outdated blog post and a Stack Overflow example, and then learning that a new version of the importer is available.

True, many of us learned *nix from the man pages but while effective, I can’t really say it was an efficient way to learn *nix.

Most people have a task for your software. They are not seeking to mind meld with it or to take it up as a new religion.

Emphasize the ease of practical use of your software and you will gain devotees despite it being easy to use.

June 6, 2014

Offshore Leaks:…Azerbaijan

Filed under: Graphs,Neo4j — Patrick Durusau @ 2:26 pm

How to use Neo4j to analyse the Offshore Leaks : the case of Azerbaijan by Jean Villedieu.

From the post:

Introduction to Problem

The Offshore Leaks released in 2013 by the ICIJ is a rarity. It is a big dataset of real information about some of the most secret places on earth : the offshore financial centers. The investigation of the ICIJ brought to the surface many interesting stories including the suspicious activities of the President of Azerbaijan. We are going to see how graph technologies can help us make sense of the complex data in the Offshore Leaks.

Our data model for the Offshore Leaks

We want to know how the President of Azerbaijan is connected to offshore accounts. This means that we will need to focus on the network he uses to control his assets stored in offshore entities. These networks include family members and a complex set of intermediaries or partners. We want to see how things are connected so we are going to have to represent each of these entities as distinct nodes in a graph.

A good tutorial on Neo4j, Cypher (query language) and modeling data.

Notice I didn’t say “modeling data with graphs.” That is the result in this case but modeling data should inform your choice of storage or analytical solutions. Saying that graphs can model any data is a truism that doesn’t lead to informed IT choices.

In this particular case I would suggest using graphs, in part because the relationships between actors and their types aren’t known in advance. Some aspects of stock trading systems would not present the same issues.

Graphs don’t have this as an inherent limitation, but if several groups were gathering information about President Ilham Aliyev and, quite easily, using different names/identifiers, how would you merge those graphs together? Would you have to re-create the relationships between actors if new nodes had to replace old ones?
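To make the merge question concrete, here is a small Python sketch. It assumes each graph is just a set of (source, label, target) triples and that someone has already decided two identifiers name the same actor. All identifiers, labels, and the `merge_nodes` helper are hypothetical placeholders, not data from the Offshore Leaks.

```python
def merge_nodes(edges, old, new):
    """Return a new edge set where every reference to `old` points to `new`."""
    def swap(node):
        return new if node == old else node
    return {(swap(s), label, swap(t)) for (s, label, t) in edges}


# Two groups describe the same actor under different identifiers (placeholders).
group_a = {("actor-17", "CONTROLS", "entity-1")}
group_b = {("actor-42", "RELATED_TO", "entity-2")}

# Once "actor-42" is known to be "actor-17", every edge has to be rewritten.
merged = merge_nodes(group_a | group_b, old="actor-42", new="actor-17")
```

In this sketch the relationships are indeed re-created, one edge at a time, which is the cost the question above is pointing at.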

Graphs are very good for some data. Distributed and collaborative graphs are even better.

Further information on Offshore Leaks.

I first saw this in a tweet by GrapheneDB.

May 31, 2014

Rneo4j

Filed under: Graphs,Neo4j,R — Patrick Durusau @ 9:37 am

Nicole White has authored an R driver for Neo4j known as Rneo4j.

To tempt one or more people into trying Rneo4j, two posts have appeared:

Demo of Rneo4j Part 1: Building a Database

Covers installation of the necessary R packages and the creation of a Twitter database for tweets containing “neo4j.”

Demo of Rneo4j Part 2: Plotting and Analysis

Uses Cypher results as an R data frame, which opens the data up to the full range of R analysis and display capabilities.

R users will find this a useful introduction to Neo4j and Neo4j users will be introduced to a new level of post-graph creation possibilities.

May 29, 2014

Neo4j 2.1 – Graph ETL for Everyone

Filed under: Graphs,Neo4j — Patrick Durusau @ 6:42 pm

Neo4j 2.1 – Graph ETL for Everyone

From the post:

It’s an exciting time for Neo4j users and, of course, the Neo4j team as we’re releasing the 2.1 version of Neo4j! You’ve probably already seen the amazing strides we’ve taken when releasing our 2.0 version at the start of the year, and Neo4j 2.1 continues to improve the user experience while delivering some impressive under-the-hood improvements, and some interesting work on boosting Cypher too.

Easy import with ETL features directly in Cypher

Graphs are everywhere, but sometimes they’re buried in other systems and legacy databases. You need to extract the data then bring it into Neo4j to experience its true graph form. To help you do this, we’ve brought bulk load functionality directly into Cypher. The new LOAD CSV clause makes that a pleasant and simple task, optimized for graphs around millions scale – the kind of size that folks typically encounter when getting started with Neo4j.

Err, but the line:

You need to extract the data then bring it into Neo4j to experience its true graph form.

isn’t really true is it?

In other words, to process a graph with Neo4j, you have to extract, transform and load the data into Neo4j. Yes?

That is if I could address the data in situ (in its original place) and add the properties I need to process it as a graph, no extraction, transformation and loading are necessary.

Yes?

Not to downplay the usefulness of better importing, if your software requires it, but we do need to be precise about what is being described.

There are other new features and improvements so download a copy of Neo4j 2.1 today!

May 23, 2014

Neo4j 2.0: Creating adjacency matrices

Filed under: Cypher,Graphs,Neo4j — Patrick Durusau @ 7:11 pm

Neo4j 2.0: Creating adjacency matrices by Mark Needham.

From the post:

About 9 months ago I wrote a blog post showing how to export an adjacency matrix from a Neo4j 1.9 database using the cypher query language and I thought it deserves an update to use 2.0 syntax.

I’ve been spending some of my free time working on an application that runs on top of meetup.com’s API and one of the queries I wanted to write was to find the common members between 2 meetup groups.

The first part of this query is a cartesian product of the groups we want to consider which will give us the combinations of pairs of groups:

I can imagine several interesting uses for the adjacency matrices that Mark describes.

One of which is common membership in groups as the post outlines.

Another would be a common property or sharing a value within a range.

Yes?
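Both uses can be sketched in a few lines of Python. This is not Mark's Cypher query; it only mirrors the same idea (a cartesian product of items, then one value per pair). The group names, member names, ages, and the `adjacency_matrix` helper are made up for illustration.

```python
from itertools import product


def adjacency_matrix(items, related):
    """Build a matrix by applying `related` to every pair from the cartesian product."""
    keys = sorted(items)
    cells = {(a, b): related(items[a], items[b])
             for a, b in product(keys, repeat=2)}
    return keys, [[cells[(a, b)] for b in keys] for a in keys]


# Use 1: common membership between groups (count of shared members).
members = {
    "group_a": {"ann", "bob", "cat"},
    "group_b": {"bob", "cat", "dan"},
    "group_c": {"eve"},
}
groups, common = adjacency_matrix(members, lambda x, y: len(x & y))

# Use 2: sharing a value within a range (here: ages within 5 years).
ages = {"ann": 31, "bob": 34, "cat": 52}
people, close_in_age = adjacency_matrix(ages, lambda x, y: int(abs(x - y) <= 5))
```

Only the `related` function changes between the two uses; the matrix construction stays the same.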

March 27, 2014

Modeling and Discovering Vulnerabilities…

Filed under: Cybersecurity,Graphs,Neo4j,Security — Patrick Durusau @ 9:45 am

Modeling and Discovering Vulnerabilities with Code Property Graphs by Fabian Yamaguchi, Nico Golde, Daniel Arp, and Konrad Rieck.

Abstract:

The vast majority of security breaches encountered today are a direct result of insecure code. Consequently, the protection of computer systems critically depends on the rigorous identification of vulnerabilities in software, a tedious and error-prone process requiring significant expertise. Unfortunately, a single flaw suffices to undermine the security of a system and thus the sheer amount of code to audit plays into the attacker’s cards. In this paper, we present a method to effectively mine large amounts of source code for vulnerabilities. To this end, we introduce a novel representation of source code called a code property graph that merges concepts of classic program analysis, namely abstract syntax trees, control flow graphs and program dependence graphs, into a joint data structure. This comprehensive representation enables us to elegantly model templates for common vulnerabilities with graph traversals that, for instance, can identify buffer overflows, integer overflows, format string vulnerabilities, or memory disclosures. We implement our approach using a popular graph database and demonstrate its efficacy by identifying 18 previously unknown vulnerabilities in the source code of the Linux kernel.

I was running down references in the documentation for joern when I discovered this paper.

The recent SSH bug in the Apple iOS is used to demonstrate a code property graph that combines the perspectives of Abstract Syntax Trees, Control Flow Graphs, and Program Dependence Graphs.

In topic map lingo we would call those “universes of discourse,” but the essential fact to remember is that combining different perspectives (are you listening NSA?) is where a code property graph derives its power.

Note that I said “combining” (different perspectives are preserved) not “sanitizing” (different perspectives are lost).

Using Neo4j, the authors created a code property graph of the Linux kernel, 52 million nodes and 87 million edges. As a result of their analysis, they discovered 18 previously undiscovered bugs.

Important: Patterns discovered in a code property graph can be used to identify vulnerabilities in other source code. Searching for bugs in source code can become cumulative and less episodic.

Comparison of source and bug histories of the Linux kernel, Apache http server, Sendmail, etc. will provide some of the common graph patterns signaling vulnerabilities in source code.

Will the white or black hat community be the first to build a public repository for graph patterns showing source code vulnerabilities?

Hiding security information hasn’t worked so far but I think you know the most likely result.

March 26, 2014

joern

Filed under: Cybersecurity,Graphs,Neo4j,Programming,Security — Patrick Durusau @ 7:25 pm

joern

From the webpage:

Source code analysis is full of graphs: abstract syntax trees, control flow graphs, call graphs, program dependency graphs and directory structures, to name a few. Joern analyzes a code base using a robust parser for C/C++ and represents the entire code base by one large property graph stored in a Neo4J graph database. This allows code to be mined using complex queries formulated in the graph traversal languages Gremlin and Cypher.

The documentation can be found here

This looks quite useful.

Makes me curious about mapping together the graphs of different codebases that share libraries.

I found this following a tweet by Nicolas Karassas which pointed to: Hunting Vulnerabilities with Graph Databases by Fabian Yamaguchi.

March 23, 2014

How to Quickly Add Nodes and Edges…

Filed under: Authoring Topic Maps,Graphs,Interface Research/Design,Neo4j — Patrick Durusau @ 7:39 pm

How to Quickly Add Nodes and Edges to Graphs

From the webpage:

The existing interfaces for graph manipulation all suffer from the same problem: it’s very difficult to quickly enter the nodes and edges. One has to create a node, then another node, then make an edge between them. This takes a long time and is cumbersome. Besides, such approach is not really as fast as our thinking is.

We, at Nodus Labs, decided to tackle this problem using what we already do well: #hashtagging the @mentions. The basic idea is that you create the nodes and edges in something that we call a “statement”. Within this #statement you can mark the #concepts with #hashtags, which will become nodes and then mark the @contexts or @lists where you want them to appear with @mentions. This way you can create huge graphs in a matter of seconds and if you do not believe us, watch this screencast of our application below.

You can also try it online on www.infranodus.com or even install it on your local machine using our free open-source repository on http://github.com/noduslabs/infranodus.
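The "statement" idea can be sketched roughly in Python as follows. This is not the InfraNodus implementation; it assumes, for illustration only, that every pair of hashtags in one statement becomes an edge, and `parse_statement` is a hypothetical helper.

```python
import re
from itertools import combinations


def parse_statement(statement):
    """Turn one statement into nodes (#hashtags), edges, and contexts (@mentions)."""
    text = statement.lower()
    concepts = sorted(set(re.findall(r"#(\w+)", text)))
    contexts = sorted(set(re.findall(r"@(\w+)", text)))
    # Assumption: concepts that co-occur in a statement are all linked pairwise.
    edges = list(combinations(concepts, 2))
    return concepts, edges, contexts


nodes, edges, contexts = parse_statement(
    "#Graphs make #modeling of #networks easier @notes"
)
```

One typed sentence yields three nodes and three edges, which is the speed-up the webpage is claiming over creating nodes and edges one at a time.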

+1! for using “…what we already do well….” for an authoring interface.

Getting any ideas for a topic map authoring interface?

Quickly create a 100k Neo4j graph data model…

Filed under: Data,Graphs,Neo4j — Patrick Durusau @ 2:54 pm

Quickly create a 100k Neo4j graph data model with Cypher only by Michael Hunger.

From the post:

We want to run some test queries on an existing graph model but have no sample data at hand and also no input files (CSV,GraphML) that would provide it.

Why not quickly create it on our own just using cypher? First I thought about using Cypher to generate CSV files and loading them back, but it is much easier.

The domain is simple (:User)-[:OWN]→(:Product) but good enough for collaborative filtering or demographic analysis.
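For readers who want comparable sample data outside Neo4j, here is a minimal Python sketch of the same (:User)-[:OWN]→(:Product) shape. It is not Michael's Cypher; the sizes, the naming scheme, and the `make_sample` helper are assumptions chosen for illustration.

```python
import random


def make_sample(n_users, n_products, owns_per_user, seed=42):
    """Generate users, products, and OWN pairs with a fixed seed for repeatability."""
    rng = random.Random(seed)
    users = [f"user{i}" for i in range(n_users)]
    products = [f"product{i}" for i in range(n_products)]
    # Each user owns `owns_per_user` distinct products, chosen at random.
    owns = [(u, p) for u in users for p in rng.sample(products, owns_per_user)]
    return users, products, owns


users, products, owns = make_sample(n_users=1000, n_products=100, owns_per_user=5)
```

Uniform random ownership is the simplest possible choice, and arguably part of why such data ranks as "simple": it has no skew, clusters, or outliers for a query to trip over.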

Admittedly a “simple” domain but I’m curious how you would rank sample data?

We can all probably recognize “simple” domains but what criteria should we use to rank more complex sample data?

Suggestions?

March 20, 2014

PLUS

Filed under: Data,Neo4j,Provenance — Patrick Durusau @ 7:36 pm

PLUS

From the webpage:

PLUS is a system for capturing and managing provenance information, originally created at the MITRE Corporation.

Data provenance is “information that helps determine the derivation history of a data product…[It includes] the ancestral data product(s) from which this data product evolved, and the process of transformation of these ancestral data product(s).”

Uses Neo4j for storage.

Includes an academic bibliography of related papers.

Provenance answers the questions: Where has your data been, what has happened to it, and with whom?
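A derivation history is naturally a graph walk. The sketch below is not PLUS and does not use its API; it assumes a plain dictionary that maps each data product to its (ancestor, process) pairs, with made-up file and process names.

```python
# Hypothetical lineage: product -> list of (ancestor product, transforming process).
derived_from = {
    "report.pdf": [("summary.csv", "render")],
    "summary.csv": [("raw_a.csv", "aggregate"), ("raw_b.csv", "aggregate")],
}


def history(product):
    """Walk back through all ancestors, returning (ancestor, process, result) steps."""
    steps = []
    seen = {product}
    stack = [product]
    while stack:
        current = stack.pop()
        for ancestor, process in derived_from.get(current, []):
            steps.append((ancestor, process, current))
            if ancestor not in seen:  # visit each product once, even in a cycle
                seen.add(ancestor)
                stack.append(ancestor)
    return steps


lineage = history("report.pdf")
```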

March 19, 2014

Full-Text-Indexing (FTS) in Neo4j 2.0

Filed under: Indexing,Neo4j,Texts — Patrick Durusau @ 2:58 pm

Full-Text-Indexing (FTS) in Neo4j 2.0 by Michael Hunger.

From the post:

With Neo4j 2.0 we got automatic schema indexes based on labels and properties for exact lookups of nodes on property values.

Fulltext and other indexes (spatial, range) are on the roadmap but not addressed yet.

For fulltext indexes you still have to use legacy indexes.

As you probably don’t want to add nodes to an index manually, the existing “auto-index” mechanism should be a good fit.

To use that automatic index you have to configure the auto-index upfront to be a fulltext index and then secondly enable it in your settings.

Great coverage of full-text indexing in Neo4j 2.0.

Looking forward to spatial indexing. In the most common use case, think of it as locating assets on the ground relative to other actors. In real time.

March 17, 2014

Facebook Graph Search with Cypher and Neo4j

Filed under: Cybersecurity,Cypher,Neo4j,NSA,Security — Patrick Durusau @ 8:14 pm

Facebook Graph Search with Cypher and Neo4j by Max De Marzi.

A great post as always but it has just been updated:

Update: Facebook has disabled this application

Your app is replicating core Facebook functionality.

Rather ironic considering this headline:

Mark Zuckerberg called Obama about the NSA. Let’s not hang up the phone by Dan Gillmor.

It’s hard to say why Mark is so upset.

Here are some possible reasons:

  • NSA surveillance is poaching on surveillance sales by Facebook
  • NSA leaks exposed surveillance by Facebook
  • NSA leaks exposed U.S. corporations doing surveillance for the government
  • NSA surveillance will make consumers leery of Facebook surveillance
  • NSA leaks make everyone more aware of surveillance
  • NSA leaks make Mark waste time on phone with Obama acting indignant.

I am sure I have missed dozens of reasons why Mark is upset.

Care to supply the ones I missed?

March 12, 2014

Building a tweet ranking web app using Neo4j

Filed under: Graphs,MongoDB,Neo4j,node-js,Python,Tweets — Patrick Durusau @ 7:28 pm

Building a tweet ranking web app using Neo4j by William Lyon.

From the post:

twizzard

I spent this past weekend hunkered down in the basement of the local Elk’s club, working on a project for a hackathon. The project was a tweet ranking web application. The idea was to build a web app that would allow users to login with their Twitter account and view a modified version of their Twitter timeline that shows them tweets ranked by importance. Spending hours every day scrolling through your timeline to keep up with what’s happening in your Twitter network? No more, with Twizzard!

The project uses the following components:

  • Node.js web application (using Express framework)
  • MongoDB database for storing basic user data
  • Integration with Twitter API, allowing for Twitter authentication
  • Python script for fetching Twitter data from Twitter API
  • Neo4j graph database for storing Twitter network data
  • Neo4j unmanaged server extension, providing additional REST endpoint for querying / retrieving ranked timelines per user

Looks like a great project and good practice as well!

Curious what you think of the ranking of tweets:

How can we score Tweets to show users their most important Tweets? Users are more likely to be interested in tweets from users they are more similar to and from users they interact with the most. We can calculate metrics to represent these relationships between users, adding an inverse time decay function to ensure that the content at the top of their timeline stays fresh.
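One way such a score could be put together is sketched below in Python. The formula is an assumption made for illustration (similarity plus interaction count, divided by one plus the tweet's age in hours); the post does not give its exact function, and the sample tweets are invented.

```python
def score(similarity, interactions, age_hours):
    """Higher for similar, frequently contacted authors; lower as the tweet ages."""
    return (similarity + interactions) / (1.0 + age_hours)


# (label, similarity to author, interactions with author, age of tweet in hours)
tweets = [
    ("old but close", 0.9, 5, 48.0),
    ("fresh and close", 0.9, 5, 1.0),
    ("fresh stranger", 0.1, 0, 1.0),
]

ranked = [label for label, sim, inter, age in
          sorted(tweets, key=lambda t: score(t[1], t[2], t[3]), reverse=True)]
```

With this particular decay, a two-day-old tweet from a close contact still outranks a fresh tweet from a stranger; a steeper decay would reverse that.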

That’s one measure of “importance.” Being able to assign a rank would be useful as well, say for the British Library.

Do take notice of the Jaccard similarity index.

Would you say that possessing at least one identical string (id, subject identifier, subject indicator) is a form of similarity measure?

What other types of similarity measures do you think would be useful for topic maps?
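As a point of comparison, here is a short Python sketch of the Jaccard index next to the "at least one identical string" test. The function names and the example identifier sets are illustrative only.

```python
def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def shares_identifier(a, b):
    """True if the two sets of identifiers have at least one string in common."""
    return not set(a).isdisjoint(b)


topic_1 = {"id:42", "http://example.org/subject/neo4j"}
topic_2 = {"http://example.org/subject/neo4j", "id:99", "id:100"}

similarity = jaccard(topic_1, topic_2)
same_subject = shares_identifier(topic_1, topic_2)
```

Seen this way, sharing one identifier is the same as a Jaccard index greater than zero: a similarity measure with the threshold set as low as it can go.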

I first saw this in a tweet by GrapheneDB.

March 4, 2014

Gephi Upgrade – Neo4j 2.0.1 Support

Filed under: Gephi,Graphs,Neo4j — Patrick Durusau @ 1:01 pm

Gephi Upgrade

From the webpage:

This plugin adds support for Neo4j graph database. You can open Neo4j 2.0.1 database directory and manipulate with graph as any other Gephi graph. You can also export any graph into Neo4j database, you can filter import or export and you can use debugging as well as lazy loading support.

That’s welcome news!

March 3, 2014

Cleaning UMLS data and Loading into Graph

Filed under: Graphs,Neo4j,UMLS — Patrick Durusau @ 9:09 pm

Cleaning UMLS data and Loading into Graph by Sujit Pal.

From the post:

The little UMLS ontology I am building needs to support two basic features in its user interface – findability and navigability. I now have a reasonable solution for the findability part, and I am planning to use Neo4j (a graph database) for the navigability part.

Just to get you interested in the post, here is the outcome:

The server exposes a Web Admin client (similar to the Solr Admin client) at port 7474 (http://localhost:7474/webadmin/). The dashboard shows 2,880,385 nodes, 3,375,083 properties, 58,021,093 relationships and 653 relationship types, which matches with what we put in. (emphasis added)

Enjoy!

February 26, 2014

Graph Gist Winter Challenge Winners

Filed under: Graphs,Neo4j — Patrick Durusau @ 8:26 pm

Graph Gist Winter Challenge Winners by Michael Hunger.

From the post:

We received 65 submissions in the 10+ categories. Well done! (emphasis in original)

There were thirty-three (33) winners across these categories:

  • Education
  • Finance
  • Life Science
  • Manufacturing
  • Resources
  • Retail
  • Telecommunication
  • Transport
  • Advanced Graph Gists
  • Other

Great place to start collecting graph patterns for solving particular data modeling issues!

February 11, 2014

Neo4j Spatial Part 2

Filed under: Geographic Data,Georeferencing,Graphs,Neo4j — Patrick Durusau @ 2:27 pm

Neo4j Spatial Part 2 by Max De Marzi.

Max finishes up part 1 with sample spatial data for restaurants and deploying his proof of concept using GrapheneDB on Heroku.

Restaurants are typical cellphone app fare but if I were in Kiev, I’d want an app with geo-locations of ingredients for a proper Molotov cocktail.

A jar filled with gasoline and a burning rag is nearly as dangerous to the thrower as the target.

Of course, substitutions for ingredients, in what quantities, in different languages, could be added features of such an app.

Data management is a weapon within the reach of all sides.

February 5, 2014

Neo4j 2.0.1 Maintenance Release

Filed under: Graphs,Neo4j — Patrick Durusau @ 1:56 pm

Neo4j 2.0.1 Maintenance Release by Mark Needham.

From the post:

Today we’re releasing the latest version of the 2.0 series of Neo4j, version 2.0.1. For more details on Neo4j 2.0.0 see the December release blog post.

This is a maintenance release and has no new features although it contains significant stability and performance improvements.

Download. Release notes.

Take the time to review the GraphGist Challenge entries for the December contest.

While you do that, look for graph community members to follow on Twitter.

February 1, 2014

Neo4j Spatial Part 1

Filed under: Graphs,Neo4j — Patrick Durusau @ 8:58 pm

Neo4j Spatial Part 1 by Max De Marzi.

spatial

One of my new year resolutions is to do a project with Neo4j Spatial, so we’ll kick off my first blog post of the year with a gentle introduction to this awesome plugin. I advise you to watch this very short 15 minute video by Neo4j Spatial creator Craig Taverner. The man is a genius level developer, you’ll gain IQ points just listening, I swear.

Max’s layers image is hauntingly familiar to old time topic map hands.

This is the first of what promises to be an excellent series of posts on Neo4j Spatial.

In case you are not familiar with Neo4j Spatial:

Neo4j Spatial is a library of utilities for Neo4j that facilitates the enabling of spatial operations on data. In particular you can add spatial indexes to already located data, and perform spatial operations on the data like searching for data within specified regions or within a specified distance of a point of interest.

While you are reading the examples, recall that “spatial” in the sense of Google Maps or Open Street Map is only one sense of “spatial.”
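For the geographic sense of "spatial," the "within a specified distance of a point of interest" operation can be sketched in plain Python. This is a linear scan using the haversine formula, not Neo4j Spatial and not a spatial index; the Earth radius constant and the sample points are assumptions for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    h = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def within(points, lat, lon, radius_km):
    """Names of all points no farther than radius_km from (lat, lon)."""
    return [name for name, (plat, plon) in points.items()
            if haversine_km(lat, lon, plat, plon) <= radius_km]


points = {"near": (0.0, 0.5), "far": (0.0, 2.0)}
nearby = within(points, 0.0, 0.0, radius_km=100)
```

A spatial index exists precisely to avoid checking every point this way.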

January 26, 2014

Storing and querying RDF in Neo4j

Filed under: Graphs,Neo4j,RDF,SPARQL — Patrick Durusau @ 8:07 pm

Storing and querying RDF in Neo4j by Bob DuCharme.

From the post:

In the typical classification of NoSQL databases, the “graph” category is one that was not covered in the “NoSQL Databases for RDF: An Empirical Evaluation” paper that I described in my last blog entry. (Several were “column-oriented” databases, which I always thought sounded like triple stores—the “table” part of the way people describe these always sounded to me like a stretched metaphor designed to appeal to relational database developers.) A triplestore is a graph database, and Brazilian software developer Paulo Roberto Costa Leite has developed a SPARQL plugin for Neo4j, the most popular of the NoSQL graph databases. This gave me enough incentive to install Neo4j and play with it and the SPARQL plugin.

As Bob points out, the plugin isn’t ready for prime time but I mention it in case you are interested in yet another storage solution for RDF.

January 25, 2014

Exporting GraphML from Neo4j

Filed under: GraphML,Graphs,Neo4j — Patrick Durusau @ 3:09 pm

I created a graph database in Neo4j 2.0 directly from a Twitter stream. To get better display capabilities, I wanted to export the database for loading into Gephi using neo4j-shell-tools.

Well, the export did create an XML file. Unfortunately, not a “well-formed” one. 🙁

The first error was that the “&” character was not written with an entity. The “&” characters were in the Twitter text stream but should have been replaced upon export as XML. Michael Hunger responded quite quickly with a revision to neo4j-shell-tools to get me past that issue. (The new version also replaces < and > in the text flow. Be careful if you have markup inside processing instructions stored in a Neo4j database. Admittedly an edge case.)
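For anyone writing their own export, the escaping step is small. The Python sketch below uses the standard library's `xml.sax.saxutils.escape`, which replaces `&`, `<`, and `>`; it is not the code in neo4j-shell-tools, and the sample tweet text is invented.

```python
from xml.sax.saxutils import escape

# Invented tweet text containing all three characters that need escaping.
tweet = "R&D <3 graphs > tables"

# escape() replaces & first, then < and >, so existing text is not double-mangled.
escaped = escape(tweet)
```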

A problem that remains unresolved is that the GraphML export file has a UTF-8 declaration but in fact contains high ASCII characters.

Here are four examples that are part of what I posted to the Neo4j mailing list. Each example is preceded by an XML comment about the improper character at that node.

<code><!-- Node n16, see "non SGML character number 128_" immediately following "BBSeedfund" -->
<node id="n16" labels="User" > @SBSSeedfund • Looking into …</data></node>
<!-- Node n26 - "ÜT" non SGML character number 156 - special ASCII character -->
<node id="n26" labels="User" ><data key="labels">…<data key="location">ÜT: 51.450038,6.802151</data>…</node>
<!-- Node n35 - ≠ non SGML character number 137 -->
<node id="n35" labels="User" >… RT ≠ endorsement</data>…</node>
<!-- Node n58 - ™ non SGML character number 132 -->
<node id="n58" labels="User" >CONFERENCE™ is the …</data></node>
</code>

One solution is to open the file in an XML editor and use search/replace to eliminate the offending characters.

A better solution is to grab a copy of HTML Tidy for HTML5 (experimental) and use it to eliminate the high ASCII characters.

HTML Tidy converts high ASCII into entities so you will have some odd looking display text.

I used a config.txt file with the following settings:

input-encoding: ascii
output-xml: yes
input-xml: yes
show-warnings: yes
numeric-entities: yes

I set input-encoding: ascii because the UTF-8 encoding declaration from Neo4j isn’t correct. And with that setting, HTML Tidy automatically replaces high ASCII with entities.
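A similar effect can be had without HTML Tidy. The Python sketch below replaces every non-ASCII character with a numeric character reference using the standard `xmlcharrefreplace` error handler. It assumes the text has already been decoded correctly into a Python string, which was exactly what was in doubt with the export above, and `to_ascii_xml` is a hypothetical helper name.

```python
def to_ascii_xml(text):
    """Replace each non-ASCII character with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


# Sample strings taken from the node examples above.
location = to_ascii_xml("ÜT: 51.450038,6.802151")
bio = to_ascii_xml("RT ≠ endorsement")
```

As with Tidy's numeric-entities setting, the result is pure ASCII, so the encoding declaration no longer matters to the parser.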

Made the file acceptable to Gephi.

While I understand Neo4j being liberal in terms of what it accepts for input, it needs to work on exporting well-formed XML.

Using Neo4J for Website Analytics

Filed under: Graphs,Modeling,Neo4j,Web Analytics — Patrick Durusau @ 2:23 pm

Using Neo4J for Website Analytics by Francesco Gallarotti.

From the post:

Working at the office customizing and installing different content management systems (CMS) for some of our clients, I have seen different ways of tracking users and then using the collected data to:

  1. generate analytics reports
  2. personalize content

I am not talking about simple Google Analytics data. I am referring to ways to map users into predefined personas and then modify the content of the site based on what that persona is interested in.

Interesting discussion of tracking users for web analytics with a graph database.

Not NSA grade tracking because users are collapsed into predefined personas. Personas limit the granularity of your tracking.

On the other hand, if that is all the granularity that is required, personas allow you to avoid a lot of “merge” statements that test for the prior existence of a user in the graph.

Depending on the circumstances, I would create new nodes for each visit by a user, reasoning it is quicker to stream the data and later combine for specific users, if desired. Defining “personas” on the fly from the pages visited and ignoring the individual users.

Thinking I can always ignore granularity I don’t need but once lost, granularity is forever lost.
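The "stream now, combine later" approach can be sketched in Python. This is a toy in-memory model, not Neo4j code; the record layout, the function names, and the persona rule are all assumptions made for illustration.

```python
from collections import defaultdict

visits = []  # one record ("node") per visit, appended without any lookup


def record_visit(user, page):
    """Append a visit; no check for whether the user has been seen before."""
    visits.append({"user": user, "page": page})


def pages_by_user():
    """Combine visits per user later, only when that granularity is wanted."""
    grouped = defaultdict(list)
    for visit in visits:
        grouped[visit["user"]].append(visit["page"])
    return dict(grouped)


def persona(pages):
    """Hypothetical on-the-fly persona rule based only on pages visited."""
    return "buyer" if any(p.startswith("/pricing") for p in pages) else "reader"


record_visit("u1", "/blog/a")
record_visit("u2", "/pricing")
record_visit("u1", "/pricing/team")
```

The per-user view and the persona view are both derived from the same raw visits, so neither choice has to be made at write time.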

January 22, 2014

Optimizing Cypher Queries in Neo4j

Filed under: Cypher,Graphs,Neo4j — Patrick Durusau @ 5:33 pm

Optimizing Cypher Queries in Neo4j by Mark Needham and Wes Freeman.

Thursday January 23 10:00 PST / 19:00 CET

Description:

Mark and Wes will talk about Cypher optimization techniques based on real queries as well as the theoretical underlying processes. They’ll start from the basics of “what not to do”, and how to take advantage of indexes, and continue to the subtle ways of ordering MATCH/WHERE/WITH clauses for optimal performance as of the 2.0.0 release.

OK, I’m registered. But this is at 7 AM on the East Coast of the US. I will bring my own coffee but have high expectations. Just saying. 😉

Correction: East Coast today at 1:00 P.M. local. I’m not a very good clock. 😉
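
For the record, the arithmetic behind the correction, as a small Python sketch. The fixed UTC offsets are my assumption (standard winter offsets, no daylight saving in January), and 2014 is taken from the date of this post:

```python
from datetime import datetime, timedelta, timezone

# Fixed winter offsets (no daylight saving in January); assumed for this sketch.
PST = timezone(timedelta(hours=-8), "PST")
EST = timezone(timedelta(hours=-5), "EST")
CET = timezone(timedelta(hours=1), "CET")

# Thursday January 23 at 10:00 PST, as announced.
webinar = datetime(2014, 1, 23, 10, 0, tzinfo=PST)

east_coast = webinar.astimezone(EST)
central_europe = webinar.astimezone(CET)
```

Here `east_coast` comes out at 13:00, which is 1:00 P.M., and `central_europe` at 19:00, which matches the announced CET time.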

January 19, 2014

Importing data to Neo4j…

Filed under: Graphs,Neo4j — Patrick Durusau @ 4:48 pm

Importing data to Neo4j the spreadsheet way in Neo4j 2.0! by Pernilla.

From the post:

And happy new year! I hope you had an excellent start, let’s keep this year rocking with a spirit of graph-love! Our Rik Van Bruggen did a lovely blog post on how to import data into Neo4j using spreadsheets in March last year. Simple and easy to understand but only for Neo4j version 1.9.3. Now it’s a new year and in December we launched a shiny new version of Neo4j, the 2.0.0 release! Baadadadaam! So, I thought better provide an update to his blogpost, with the spirit of his work. (Thank you Rik!)

If you don’t think spreadsheets are all that weird in data processing, ;-), you should feel right at home.

January 14, 2014

Online Training: Getting Started with Neo4j

Filed under: Graphs,Neo4j — Patrick Durusau @ 5:59 pm

Online Training: Getting Started with Neo4j

From the webpage:

Course Description: Getting Started with Neo4j

You’re beginning with Neo4j? Invest 4 hours of interactive, engaging learning to get familiar with Neo4j. With this online course you can control your progress at your own leisure and pause and resume at any time.

Audience

  • Developers, System Administrators, DevOps engineers, DBAs, Business Analysts, CTOs, CIOs, and students.
  • Also, we invite anyone who is interested in getting an overview of graph databases and Neo4j.

Skills taught

  • An understanding of graph databases
  • How to use graph databases
  • Introduction to data modeling with Graph databases
  • How to get started working with Neo4j

Free online training on Neo4j. Well, free except that you have to become a marketing lead.

I never have understood the marketing lead approach to “free” training, white papers, etc.

Reminds me of a local church that sponsored a “safe” Halloween with games and candy for children. Until they realized the resulting number of children enrolling at their church wasn’t high enough. So they stopped giving out candy at Halloween.

Quality products attract customers. Promise.

January 8, 2014

BIIIG:…

Filed under: BI,Graphs,Neo4j,Networks — Patrick Durusau @ 8:03 pm

BIIIG : Enabling Business Intelligence with Integrated Instance Graphs by André Petermann, Martin Junghanns, Robert Müller, Erhard Rahm.

Abstract:

We propose a new graph-based framework for business intelligence called BIIIG supporting the flexible evaluation of relationships between data instances. It builds on the broad availability of interconnected objects in existing business information systems. Our approach extracts such interconnected data from multiple sources and integrates them into an integrated instance graph. To support specific analytic goals, we extract subgraphs from this integrated instance graph representing executed business activities with all their data traces and involved master data. We provide an overview of the BIIIG approach and describe its main steps. We also present initial results from an evaluation with real ERP data.

A very interesting paper because, on the one hand, it talks about merging data from heterogeneous data sets and, on the other, claims to be using Neo4j.

In case you didn’t know, Neo4j enforces normalization and doesn’t have a concept of merging nodes. (True, Cypher has a “merge” operator but it doesn’t “merge” nodes in any meaningful sense of the word. Either a node is matched or a new node is created. Not how I interpret “merge.”)
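
To make the distinction concrete, here is a toy in-memory model of the match-or-create behavior described above. It is my own illustration in Python, not Neo4j’s code and not Cypher, and the names and sample values are hypothetical:

```python
def match_or_create(nodes, key, value, properties):
    """Return the first node whose `key` equals `value`, else create one.

    An existing node is returned untouched; `properties` are only used
    when a new node has to be created.
    """
    for node in nodes:
        if node.get(key) == value:
            return node
    node = {key: value, **properties}
    nodes.append(node)
    return node


nodes = []
first = match_or_create(nodes, "name", "Alice", {"phone": "555-0100"})
second = match_or_create(nodes, "name", "Alice", {"degree": "PhD"})
```

The second call matches the existing node and hands it back unchanged, so the `degree` from the second source never lands on it. Nothing from the two descriptions of “Alice” has been combined, which is why I don’t call it a merge.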

It took more than one read but in puzzling over:

For integrated objects we can merge the properties from the sources. For the example in Fig. 2, we can combine employees objects with CIT.employees.erp_empl_number = ERP.EmplyeeTable.number and merge their properties from both sources (name, degree, dob, address, phone).

I realized the authors were producing a series of graphs where only the final version of the graph has the “merged” nodes. If you notice, the nodes are created first and then populated with associations, which resolves the question of using different pointers from the original sources.
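
A rough sketch of the property merging the quoted passage describes, again in plain Python. The field names (`erp_empl_number`, `number`, `name`, `degree`, `dob`, `phone`) come from the quote. The function, the sample records, and the rule that the first source wins on conflicts are my assumptions, not the authors’ implementation:

```python
def merge_sources(cit_employees, erp_employees):
    """Join the two sources on employee number and combine their properties."""
    erp_by_number = {row["number"]: row for row in erp_employees}
    merged = []
    for cit in cit_employees:
        erp = erp_by_number.get(cit["erp_empl_number"])
        if erp is None:
            continue  # no counterpart in the ERP source; skipped in this sketch
        node = {"sources": ["CIT", "ERP"]}
        # Later sources only fill in properties that are still missing.
        for row in (cit, erp):
            for key, value in row.items():
                if key not in ("erp_empl_number", "number"):
                    node.setdefault(key, value)
        node["number"] = erp["number"]
        merged.append(node)
    return merged


# Made-up sample records for illustration only.
cit = [{"erp_empl_number": 7, "name": "Alice", "degree": "PhD"}]
erp = [{"number": 7, "name": "Alice", "dob": "1980-01-01", "phone": "555-0100"}]
merged = merge_sources(cit, erp)
```

The result is one new record carrying the properties of both sources, built alongside the originals rather than by altering either of them, which is how I read the authors’ series of graphs.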

The authors also point out that Neo4j cannot manage sets of graphs. I had overlooked that point. That is a fairly severe limitation.

Do spend some time at the Database Group Leipzig. There are several other recent papers that look very interesting.
