Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

January 25, 2014

Exporting GraphML from Neo4j

Filed under: GraphML,Graphs,Neo4j — Patrick Durusau @ 3:09 pm

I created a graph database in Neo4j 2.0 directly from a Twitter stream. To get better display capabilities, I wanted to export the database for loading into Gephi using neo4j-shell-tools.

Well, the export did create an XML file. Unfortunately, not a “well-formed” one. 🙁

The first error was that the “&” character was not written with an entity. The “&” characters were in the Twitter text stream but should have been replaced upon export as XML. Michael Hunger responded quite quickly with a revision to neo4j-shell-tools to get me past that issue. (The new version also replaces < and > in the text flow. Be careful if you have markup inside processing instructions stored in a Neo4j database. Admittedly an edge case.)

A problem that remains unresolved is that the Graphml export file has a UTF-8 declaration but in fact contains high ASCII characters.

Here are four examples that are part of what I posted to the Neo4j mailing list. Each example is preceded by an XML comment about the improper character at that node.

<code><!– Node n16, see “non SGML character number 128_” immediately following “BBSeedfund”
<node id=”n16″ labels=”User” > @SBSSeedfund • Looking into …</data></node>
<!– Node n26 – “ÜT” non SGML character number 156 – special ASCII character –>
<node id=”n26″ labels=”User” ><data key=”labels”>…<data key=”location”>ÜT: 51.450038,6.802151</data>…</node>
<!– Node n35 – ≠ non SGML character number 137 –>
<node id=”n35″ labels=”User” >… RT ≠ endorsement</data>…</node>
<!– Node n58 – ™ non SGML character number 132 –>
<node id=”n58″ labels=”User” >CONFERENCE™ is the …</data></node>
</code>

One solution is to parse the file in an XML editor and with save/replace to eliminate the offending characters.

A better solution is to grab a copy of HTML Tidy for HTML5 (experimental) and use it to eliminate the high ASCII characters.

HTML Tidy converts high ASCII into entities so you will have some odd looking display text.

I used a config.txt file with the following settings:

input-encoding: ascii
output-xml: yes
input-xml: yes
show-warnings: yes
numeric-entities: yes

I set input-encoding: ascii because the UTF-8 encoding declaration from Neo4j isn’t correct. And with that setting, HTML Tidy automatically replaces high ASCII with entities.

Made the file acceptable to Gephi.

While I understand Neo4j being liberal in terms of what it accepts for input, it needs to work on exporting well-formed XML.

May 14, 2012

Mining GitHub – Followers in Tinkerpop

Filed under: Github,GraphML,Neo4j,R,TinkerPop — Patrick Durusau @ 6:13 pm

Mining GitHub – Followers in Tinkerpop

Patrick Wagstrom writes:

Development of any moderately complex software package is a social process. Even if a project is developed entirely by a single person, there is still a social component that consists of all of the people who use the software, file bugs, and provide recommendations for enhancements. This social aspect is one of the driving forces behind the proliferation of social software development sites such as GitHub, SourceForge, Google Code, and BitBucket.

These sites combine together a variety of tools that are common for software development such as version control, bug trackers, mailing lists, release management, project planning, and wikis. In addition, some of these have more social aspects that allow you find and follow individual developers or watch particular projects. In this post I’m going to show you how we can use some this information to gain insight into a software development community, specifically the community around the Tinkerpop stack of tools for graph databases.

GitHub as a social community. Who knew? 😉

Very instructive walk through Gremlin, GraphML, and R with a prepared data set. It doesn’t get much better than this!

March 30, 2012

GraphMLViewer

Filed under: GraphML,Graphs — Patrick Durusau @ 4:35 pm

GraphMLViewer

From the webpage:

GraphMLViewer is a freely available Flash®-based viewer which can display diagrams, networks, and other graph-like structures in HTML web pages. It is optimized for diagrams which were created with the freely available yEd graph editor.

GraphML Specification

Filed under: GraphML,Graphs — Patrick Durusau @ 4:35 pm

GraphML Specification

GraphML is a comprehensive and easy-to-use file format for graphs. It consists of a language core to describe the structural properties of a graph and a flexible extension mechanism to add application-specific data. Its main features include support of

  • directed, undirected, and mixed graphs,
  • hypergraphs,
  • hierarchical graphs,
  • graphical representations,
  • references to external data,
  • application-specific attribute data, and
  • light-weight parsers.

Unlike many other file formats for graphs, GraphML does not use a custom syntax. Instead, it is based on XML and hence ideally suited as a common denominator for all kinds of services generating, archiving, or processing graphs.

Interchange syntax for graphs in XML.

GraphML Primer

Filed under: GraphML,Graphs — Patrick Durusau @ 4:35 pm

GraphML Primer

Abstract:

GraphML Primer is a non-normative document intended to provide an easily readable description of the GraphML facilities, and is oriented towards quickly understanding how to create GraphML documents. This primer describes the language features through examples which are complemented by references to normative texts.

Nice introduction to GraphML.

Powered by WordPress