Archive for the ‘TinkerPop’ Category

Amazon Neptune (graph database, preview)

Wednesday, November 29th, 2017

Amazon Neptune

From the webpage:

Amazon Neptune is a fast, reliable, fully-managed graph database service that makes it easy to build and run applications that work with highly connected datasets. The core of Amazon Neptune is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with milliseconds latency. Amazon Neptune supports popular graph models Property Graph and W3C’s RDF, and their respective query languages Apache TinkerPop Gremlin and SPARQL, allowing you to easily build queries that efficiently navigate highly connected datasets. Neptune powers graph use cases such as recommendation engines, fraud detection, knowledge graphs, drug discovery, and network security.

Amazon Neptune is highly available, with read replicas, point-in-time recovery, continuous backup to Amazon S3, and replication across Availability Zones. Neptune is secure, with support for encryption at rest and in transit. Neptune is fully-managed, so you no longer need to worry about database management tasks such as hardware provisioning, software patching, setup, configuration, or backups.

Sign up for the Amazon Neptune preview here.

I’m skipping the rest of the graph/Amazon promotional material because if you are interested, you know enough about graphs to be bored by the repetition.

Interested in know your comments on:

Amazon Neptune provides multiple levels of security for your database, including network isolation using Amazon VPC, encryption at rest using keys you create and control through AWS Key Management Service (KMS), and encryption of data in transit using TLS. On an encrypted Neptune instance, data in the underlying storage is encrypted, as are the automated backups, snapshots, and replicas in the same cluster.


You are placing a great deal of trust in Amazon. Yes?

GraphSON and TinkerPop systems

Tuesday, August 8th, 2017

Tips for working with GraphSON and TinkerPop systems by Noah Burrell.

From the post:

If you are working with the Apache TinkerPop™ framework for graph computing, you might want to produce, edit, and save graphs, or parts of graphs, outside the graph database. To accomplish this, you might want a standardized format for a graph representation that is both machine- and human-readable. You might want features for easily moving between that format and the graph database itself. You might want to consider using GraphSON.

GraphSON is a JSON-based representation for graphs. It is especially useful to store graphs that are going to be used with TinkerPop™ systems, because Gremlin (the query language for TinkerPopTM graphs) has a GraphSON Reader/Writer that can be used for bulk upload and download in the Gremlin console. Gremlin also has a Reader/Writer for GraphML (XML-based) and Gryo (Kryo-based).

Unfortunately, I could not find any sort of standardized documentation for GraphSON, so I decided to compile a summary of my research into a single document that would help answer all the questions I had when I started working with it.

Bookmark or better yet, copy-n-paste “Vertex Rules and Conventions” to print on one page and then print “Edge Rules and Conventions” on the other.

Could possibly get both on one page but I like larger font sizes. 😉

Type in the “Example GraphSON Structure” to develop finger knowledge of the format.

Watch for future posts from Noah Burrell. This is useful.

JanusGraph (Linux Foundation Graph Player Rides Into Town)

Wednesday, February 22nd, 2017


From the homepage:

JanusGraph is a scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster.
JanusGraph is a transactional database that can support thousands of concurrent users executing complex graph traversals in real time.

In addition, JanusGraph provides the following features:

You can clone JanusGraph from GitHub.
Read the JanusGraph documentation and join the users or developers mailing lists.

Follow the Getting Started with JanusGraph guide for a step-by-step introduction.

Supported by Google, IBM and Hortonworks, among others.

Three good reasons to pay attention to JanusGraph early and often.


Visualizing your Titan graph database:…

Friday, June 17th, 2016

Visualizing your Titan graph database: An update by Marco Liberati.

From the post:

Last summer, we wrote a blog with our five simple steps to visualizing your Titan graph database with KeyLines. Since then TinkerPop has emerged from the Apache Incubator program with TinkerPop3, and the Titan team have released v1.0 of their graph database:

  • TinkerPop3 is the latest major reincarnation of the graph proje­­­ct, pulling together the multiple ventures into a single united ecosystem.
  • Titan 1.0 is the first stable release of the Titan graph database, based on the TinkerPop3 stack.

We thought it was about time we updated our five-step process, so here’s:

Not exactly five (5) steps because you have to acquire a KeyLines trial key, etc.

A great endorsement of much improved installation process for TinkerPop3 and Titan 1.0.


Incubate No Longer! Tinkerpop™!

Monday, May 23rd, 2016

The Apache Software Foundation Announces Apache® TinkerPop™ as a Top-Level Project

From the post:

The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, announced today that Apache® TinkerPop™ has graduated from the Apache Incubator to become a Top-Level Project (TLP), signifying that the project’s community and products have been well-governed under the ASF’s meritocratic process and principles.

Apache TinkerPop is a graph computing framework that provides developers the tools required to build modern graph applications in any application domain and at any scale.

“Graph databases and mainstream interest in graph applications have seen tremendous growth in recent years,” said Stephen Mallette, Vice President of Apache TinkerPop. “Since its inception in 2009, TinkerPop has been helping to promote that growth with its Open Source graph technology stack. We are excited to now do this same work as a top-level project within the Apache Software Foundation.”

As a graph computing framework for both real-time, transactional graph databases (OLTP) and and batch analytic graph processors (OLAP), TinkerPop is useful for working with small graphs that fit within the confines of a single machine, as well as massive graphs that can only exist partitioned and distributed across a multi-machine compute cluster.

TinkerPop unifies these highly varied graph system models, giving developers less to learn, faster time to development, and less risk associated with both scaling their system and avoiding vendor lock-in.

In addition to that good news, the announcement also answers the inevitable question about scaling:

Apache TinkerPop is in use at organizations such as DataStax and IBM, among many others. is currently using TinkerPop and Gremlin to process its order fullfillment graph which contains approximately one trillion edges. (emphasis added)

A trillion edges, unless you are a stealth Amazon, Tinkerpop™ will scale for you.

Congratulations to the Tinkerpop™ community!

MOOGI – The Film Discovery Engine

Wednesday, May 11th, 2016

MOOGI – The Film Discovery Engine

Not the most recent movie I have seen but under genre I entered:

movies about B.C.

Thinking that it would return (rather quickly):

One Million Years B.C. (1966)

Possibly just load on this alpha site but after a couple of minutes, I just reloaded the homepage.

Using “keyword,” just typing “B.C.” brought up a pick list where One Million Years B.C. (1966) was eight in the list. Without any visible delay.

The keyword categories are interesting and many.

Learned a new word, canuxploitation! There is an entire site devoted to Canadian B-movies, i.e., Canuxploitation! – Your Complete Guide to Canadian B-Film.

You will recognize most of the other keywords.

If not, check the New York Times or the Washington Post and include the term plus “member of congress.” You will get several stories that will flesh out the meaning of “erotic,” “female nudity,” “drugs,” “prostitution,” “monster,” “hotel,” “adultery” and the like.

If search isn’t your strong point, try the “explore” option. You can search for movies “similar to” some named movie.

Just for grins, I typed in:

The Dirty Dozen. When I saw it during its first release, it had been given a “condemned” rating by Catholic movie rating service. Had no redeeming qualities at all. No one should see it.

I miss those lists because they were great guides to what movies to go see! 😉

One of five (5) results was The Dirty Dozen: The Deadly Mission (1987).

When I chose that movie, the system failed so I closed out the window and tried again. Previous quick response is taking a good bit of time, suspect load/alpha quality. (I will revisit fairly soon and update this report.)

In terms of aesthetics, they really should lose the hand in the background moving around with a remote control. Adds nothing to the experience other than annoyance.

The site is powered by Mindmaps. Which means you are going to find Apache Tinkerpop under the hood.


Nine Inch Gremlins

Saturday, April 23rd, 2016

Nine Inch Gremlins


Stephen Mallette writes:

On the back of TinkerPop 3.1.2-incubating comes TinkerPop 3.2.0-incubating. Yes – a dual release – an unprecedented and daring move you’ve come to expect and not expect from the TinkerPop clan! Be sure to review the upgrade documentation in full as you may find some changes that introduce some incompatibilities.

The release artifacts can be found at this location:

The online docs can be found here: (user docs) (upgrade docs) (core javadoc) (full javadoc)

The release notes are available here:

The Central Maven repo has sync’d as well:

Another impressive release!

In reading the documentation I discovered that Ketrina Yim is responsible for drawing Gremlin and his TinkerPop friends.

I was relieved to find that Marko was only responsible for the Gremlin/TinkerPop code/prose and not the graphics as well. That would be too much talent for any one person! 😉


Planet TinkerPop [+ 2 New Graph Journals]

Tuesday, April 12th, 2016

Planet TinkerPop

From the webpage:

Planet TinkerPop is a vendor-agnostic, community-driven site aimed at advancing graph technology in general and Apache TinkerPop™ in particular. Graph technology is used to manage, query, and analyze complex information topologies composed of numerous heterogenous relationships and is currently benefiting companies such as Amazon, Google, and Facebook. For all companies to ultimately adopt graph technology, vendor-agnostic graph standards and graph knowledge must be promulgated. For the former, TinkerPop serves as an Apache Software Foundation governed community that develops a standard graph data model (the property graph) and query language (Gremlin). Apache TinkerPop is a widely supported graph computing framework that has been adopted by leading graph system vendors and interfaced with by numerous graph-based applications across various industries. For educating the public on graphs, Planet TinkerPop’s Technology journal publishes articles about TinkerPop-related graph research and development. The Use Cases journal promotes articles on the industrial use of graphs and TinkerPop. The articles are contributed by members of the Apache TinkerPop community and additional contributions are welcomed and strongly encouraged. We hope you enjoy your time learning about graphs here at Planet TinkerPop.

If you are reading about Planet TinkerPop I can skip the usual “graphs are…” introductory comments. 😉

Planet TinkerPop is a welcome addition to the online resources on graphs in general and TinkerPop in particular.

So they aren’t buried in the prose, let me highlight two new journals at Planet TinkerPop:

TinkerPop Technology journal  publishes articles about TinkerPop-related graph research and development.

TinkerPop Use Cases journal  promotes articles on the industrial use of graphs and TinkerPop.

Both are awaiting your contributions!


PS: I prepended “TinkerPop” to the journal names and suggest an ISSN ( would be appropriate for both journals.

A Practical Guide to Graph Databases

Wednesday, January 20th, 2016

A Practical Guide to Graph Databases by Matthias Broecheler.

Slides from Graph Day 2016 @ Austin.

If you notice any of the “trash talking” on social media about graphs and graph databases, you will find slide 15 quite amusing.

Not everyone agrees on the relative position of graph products. 😉

I haven’t seen a video of Matthias’ presentation. If you happen across one, give me a ping. Thanks!

You Up To Improving Traversals/DSLs/OLAP in TinkerPop 3.2.0?

Wednesday, January 13th, 2016

Big ideas for Traversals/DSLs/OLAP in TinkerPop 3.2.0 by Marko A. Rodriguez.

Marko posted a not earlier today that reads in part:

There is currently no active development on TinkerPop 3.2.0, however, in my spare time I’ve been developing (on paper) some new ideas that should make traversals, DSLs, and OLAP even better.

Problem #1: The Builder pattern for TraversalSources is lame. []

Problem #2: It is not natural going from OLTP to OLAP to OLTP to OLAP. []

I mention this because it has been almost seven (7) hours since Marko posted this note and its not like he is covered up with responses!

Myself included but I’m not qualified to comment on his new ideas. One or more of you are. Take up the challenge!

TinkerPop, the community and you will be better for it.


Titan Graph DB Performance Tips

Saturday, October 10th, 2015

Titan Graph DB Performance Tips

From the post:

In Hawkular Inventory, we use the Tinkerpop API (version 2 for the time being) to store our inventory model in a graph database. We chose Titan as the storage engine configured to store the data in the Cassandra cluster that is also backing Hawkular Metrics and Alerts. This blog post will guide you through some performance-related lessons with Titan that we learned so far.

Inventory is under heavy development with a lot of redesign and refactoring going on between releases so we took quite a naive approach to storing and querying data from the graph database. That is, we store entities from our model as vertices in the graph and the relationships between the entities as edges in the graph. Quite simple and a school book example of how it should look like.

We did declare a couple of indices in the database on the read-only aspects of the vertices (i.e. a “type” of the entity the vertex corresponds to) but we actually didn’t pay too much attention to the performance. We wanted to have the model right first.

Fast forward a couple of months and of course, the performance started to be a real problem. The Hawkular agent for Wildfly is inserting a non-trivial amount of entities and not only inserting them but also querying them has seen a huge performance degradation compared to the simple examples we were unit testing with (due to number of vertices and edges stored).

The time has come to think about how to squeeze some performance out of Titan as well as how to store the data and query it more intelligently.

Several performance tips but the one that caught my eye and resulted in an order of magnitude performance gain:

3. Mirror Properties on The Edges

This is the single most important optimization we’ve done so far. The rationale is this. To jump from a vertex to another vertex over an edge is a fairly expensive operation. Titan uses the adjacency lists to store the vertices and their edges in wide rows in Cassandra. It uses another adjacency list for edges and their target vertices.

So to go from vertex to vertex, Titan actually has to do 2 queries. It would be much easier if we could avoid that at least in some cases.

The solution here is to copy the values of some (frequently used and, in our case, immutable) properties from the “source” and “target” vertices directly to the edges. This helps especially in the cases where you do some kind of filtering on the target vertices that you instead can do directly on the edges. If there is a high number of edges to go through, this helps tremendously because you greatly reduce the number of times you have to do the “second hop” to the target vertex.

I am curious what is being stored on the vertex that requires a second search to jump to the target vertex?

That is if you have moved “popular” vertex properties to the edge, why not move other properties of the node there?


TinkerPop3 Promo in One Paragraph

Saturday, August 22nd, 2015

Marko A. Rodriguez tweeted as a “single paragraph” explanation of why you should prefer TinkerPop3 over TinkerPop2.

Of course, I didn’t believe the advantages could be contained in a single paragraph but you be the judge:

Check out Apache TinkerPop 3.0.0 was released in June 2015 and it is a quantum leap forward. Not only is it now apart of the Apache Software Foundation, but the Gremlin3 query language has advanced significantly since Gremlin2. The language is much cleaner, provides declarative graph pattern matching constructs, and it supports both OLTP graph databases (e.g. Titan, Neo4j, OrientDB) and OLAP graph processors (e.g. Spark, Giraph). With most every graph vendor providing TinkerPop-connectivity, this should make it easier for developers as they don’t have to learn a new query language for each graph system and developers are less prone to experience vendor lock-in as their code (like JDBC/SQL) can just move to another underlying graph system.

Are my choices developer lock-in versus vendor lock-in? That’s a tough call. 😉

Do check out TinkerPop3!


Wednesday, July 8th, 2015

TinkerPop3: Taking graph databases and graph analytics to the next level by Matthias Broecheler.


Apache TinkerPop is an open source graph computing framework which includes the graph traversal language Gremlin and a number of graph utilities that speed up the development of graph based applications. Apache TinkerPop provides an abstraction layer on top of popular graph databases like Titan, OrientDB, and Neo4j as well as scalable computation frameworks like Hadoop and Spark allowing developers to build graph applications that run on multiple platforms avoiding vendor lock-in.

This talk gives an overview of the new features introduced in TinkerPop3 with a deep-dive into query language design, query optimization, and the convergence of OLTP and OLAP in graph processing. A demonstration of TinkerPop3 with the scalable Titan graph database illustrates how these concepts work in practice.

It’s not all the information you will need about TinkerPop3 but should be enough to get you interested in learning more, a lot more.

I had a conversation recently on how to process topic maps with graphs, at least if you were willing to abandon the side-effects detailed in the Topic Maps Data Model (TMDM). More on that to follow.

Titan 0.9.0-M2 Release

Tuesday, June 9th, 2015

Titan 0.9.0-M2 Release.

From Dan LaRocque:

Aurelius is pleased to release Titan 0.9.0-M2. 0.9.0-M2 is an experimental release intended for development use.

This release uses TinkerPop 3.0.0.M9-incubating, compared with 3.0.0.M6 in Titan 0.9.0-M1. Source written against Titan 0.5.x and earlier will generally require modification to compile against Titan 0.9.0-M2. As TinkerPop 3 requires a Java 8 runtime, so too does Titan 0.9.0-M2.

While 0.9.0-M1 came out with a separate console and server zip archive, 0.9.0-M2 is a single zipfile with both components. The zipfile is still only packaged with Hadoop 1 to match TP3’s Hadoop support.



The upgrade instructions and changelog for 0.9.0-M2 are in the usual places.

I have to limit my reading of people who pretend that C/C+ level hacks (OPM) are “…the work of most sophisticated state-sponsored cyber intrusion entities.”


Marko Reassures the TinkerPop Community

Tuesday, February 3rd, 2015

How the DataStax Acquisition of Aurelius Will Effect TinkerPop by Marko A. Rodriguez.

From the post:

As you may already know, Aurelius has been acquired by DataStax — Aurelius is the graph computing company behind Titan which also provides a good number of contributors to TinkerPop. DataStax is the distributed database company behind Cassandra. Matthias and I are very excited about this acquisition. With DataStax’s resources, the graph community is going to see a powerful, rock-solid, Titan-inspired distributed graph database in the near future — enterprise support, focused large-team development, and 1000+ node cluster testing for each release. A great thing for the graph community, indeed. Also, a great thing for TinkerPop —

Looking forward to seeing how a pairing of DataStax resources and Marko’s vision for graph computing expresses itself. This could be a lot of fun!

TinkerPop is moving to Apache (Incubator)

Sunday, January 18th, 2015

TinkerPop is moving to Apache (Incubator) by Marko A. Rodriguez.

From the post:

Over the last (almost) year, we have been working to get TinkerPop into a recognized software foundation — with our eyes primarily on The Apache Software Foundation. This morning, the voting was complete and TinkerPop will become an Apache Incubator project on Tuesday January 16th.

The primary intention of this move to Apache was to:

  1. Further guarantee vendor neutrality and vendor uptake.
  2. Better secure our developers and users legally.
  3. Grow our developer and user base.

I hope people see this as a positive and will bear with us as we go through the process of migrating our infrastructure over the month of February. Note that we will be doing our 3.0.0.M7 release on Monday (Jan 15th) with it being the last TinkerPop release. The next one (M8 or GA) will be an Apache release. Finally, note that we will be keeping this mailing list with a mirror being on Apache’s servers (that was a hard won battle :).

Take care and thank you for using of our software, The TinkerPop.


So long as Marko keeps doing cool graphics, it’s fine by me. 😉

More seriously increasing visibility can’t help but drive TinkerPop to new heights. Or for graph software, would that be to new connections?

The Path Forward (Titan 1.0 and TinkerPop 3.0)

Tuesday, December 9th, 2014

The Path Forward by Marko Rodriguez.

A good overview of Titan 1.0 and TinkerPop 3.0. Marko always makes great slides.

I appreciate mythology as an example but it would be nice to see an example of Titan/TinkerPop used in anger.

With the limitation that the data be legally accessible (sorry) what would you suggest as a great example of using Titan/TinkerPop?

Since everyone likes mobile phone apps, I would suggest one that displays a street map and as you pass street addresses, it lights up the address as blue or red depending on their political contributions. Brighter colors for larger donations.

I think that would prove to be very popular.

Would that be a good example for Titan/TinkerPop?

What’s yours?

TinkerPop 3.0.0.M6 Released — A Gremlin Rāga in 7/16 Time

Tuesday, December 2nd, 2014

TinkerPop 3.0.0.M6 Released — A Gremlin Rāga in 7/16 Time by Marko A. Rodriguez.

From post:

Dear ladies and gentlemen of the TinkerPop,

TinkerPop productions, in association with Gremlin Studios, presents a Gremlin-Users codebase, featuring TinkerPop-Contributors…TinkerPop 3.0.0.M6. Staring, Gremlin as himself.




Gremlin Console:
Gremlin Server:

If you want a better sense of graphs than “Everything is a graph!” type promotionals, see: How Whitepages turned the phone book into a graph using Titan and Cassandra. BTW, the Whitepages offer an API for email verification.

Don’t be the last one to submit a bug for this milestone release!

At the same time, checkout the Whitepages API.

TinkerPop 3.0.0.M4 Released (A Gremlin Rāga in 7/16 Time)

Tuesday, October 21st, 2014

TinkerPop 3.0.0.M4 Released (A Gremlin Rāga in 7/16 Time) by Marko Rodriguez.

From the post:

TinkerPop ( is happy to announce the release of TinkerPop 3.0.0.M4.



User Documentation:
Core JavaDoc: [user javadocs]
Full JavaDoc : [vendor javadocs]


Gremlin Console:
Gremlin Server:

There were lots of updates in this release — with a lot of valuable feedback provided by Titan (Matthias), Titan-Hadoop (Dan), FoundationDB (Mike), PostgreSQL-Gremlin (Pieter), and Gremlin-Scala (Mike).

We are very close to a GA. We think that either there will be a “minor M5” or the next release will be GA. Why the delay? We are currently working closely with the Titan team to see if there are any problems in our interfaces/test-suites/etc. The benefit of working with the Titan team is that they are doing both OLTP and OLAP so are covering the full gamut of the TinkerPop3 API. Of course, we have had lots of experience with these APIs for both Neo4j (OTLP) and Giraph (OLAP), but to see it standup to yet another vendor’s requirements will be a confidence boost for GA. If you are vendor, please feel free to join the conversation as your input is crucial to making sure GA meets everyone’s needs.

A few important notes for users:
1. The GremlinKryo serialization format is not guaranteed to be stable from MX to MY. By GA it will be locked.
2. Neo4j-Gremlin’s disk representation is not guaranteed to be stable from MX to MY. By GA it will be locked.
3. Giraph-Gremlin’s Hadoop Writable specification is not guaranteed to be stable from MX to MY. By GA it will be locked.
4. VertexProgram, Memory, Step, SideEffects, etc. hidden and system labels may change between MX and MY. By GA they will be locked.
5. Package and class names might change from MX to MY. By GA they will be locked.

Thank you everyone. Please play and provide feedback. This is the time to get your ideas into TinkerPop3 as once it goes GA, sweeping changes are going to be more difficult.

Vertex Meta- and Multi-Properties in TinkerPop3

Monday, October 6th, 2014

Marko Rodriguez tweeted this link which takes you to a diagram and REPL work that demonstrates vertex meta- and multiproperties in TinkerPop3.

If you haven’t looked at the TinkerPop documentation in a while, make the time to do so.

TinkerPop 3.0.0.M3 Released (A Gremlin Rāga in 7/16 Time)

Monday, October 6th, 2014

TinkerPop 3.0.0.M3 Released (A Gremlin Rāga in 7/16 Time) by Marko Rodriguez.

From the post:

TinkerPop 3.0.0.M3 has been released. This release has numerous core bug-fixes/optimizations/features. We were anxious to release M3 due to some changes in the Process API. These changes should not effect the user, only vendors providing a Gremlin language variant (e.g. Gremlin-Scala, Gremlin-JavaScript, etc.). From what I hear, it “just worked” for Gremlin-Scala so that is good. Here are links to the release:

– Gremlin-Console:
– Gremlin-Server:

Are you going to accept Marko’s anecdotal assurances, it “just worked” for Gremlin-Scala or will you put this release to the test? 😉

I am sure Marko and others would like to know!

TinkerPop3 M2 Delay for MetaProperties

Wednesday, September 10th, 2014

TinkerPop3 M2 Delay for MetaProperties by Marko A. Rodreiguez.

From the post:

TinkerPop3 3.0.0.M2 was suppose to be released 1.5 weeks ago. We have delayed the release because we have now introduced MetaProperties into TinkerPop3. Matthias Bröcheler of Titan-fame has been pushing TinkerPop to provide this feature for over a year now. We had numerous discussions about it over the past year, and at one point, rejected the feature request. However, recently, a solid design proposal was presented by Matthias and Stephen and I went about implementing it over the last 1.5 weeks. With that said, TinkerPop3 now has MetaProperties.

What are meta-properties?

  1. Edges have Properties
  2. Vertices have MetaProperties
  3. MetaProperties have Properties

What are the consequences of meta-properties?

  1. A vertex can have multiple “name” properties (for example).
  2. A vertex’s properties (i.e. meta-properties) can have normal key/value properties (e.g. a “name” property can have an “acl:public” property).

What are the use cases?

  1. Provenance: different users have different declarations for Marko’s name: “marko”, “marko rodriguez,” “marko a. rodriguez.”
  2. Security: you can now do property-level security. Marko’s “age” has an acl:private property and his “name”(s) have acl:public properties.
  3. History: who mutated what and when did they do it? each vertex property can have a “creator:stephen” and a “createdAt:2014” property.

If you have ever had to build a graph application that required provenance, security, history, and the like, you realized how difficult it is with the current key/value property graph model. You end up, in essence, creating vertices for properties so you can express such higher order semantics. However, maintaing that becomes a nightmare as tools like Gremlin and GraphWrappers don’t know the semantics and you basically are left to create your own GremlinDSL-extensions and tools to process such a custom representation. Well now, you get it for free and TinkerPop will be able to provide (in the future) wrappers (called strategies in TP3) for provenance, security, history, etc.

I don’t grok the reason for a distinction between properties of vertices and properties of edges so I have posted a note asking about it.

Take the quoted portion as a sample of the quality of work being done on TinkerPop3.

TinkerPop3 3.0.0.M1

Tuesday, August 12th, 2014

TinkerPop3 3.0.0.M1 Released — A Gremlin Raga in 7/16 Time by Marko A. Rodriguez.

From the post:

TinkerPop3 3.0.0.M1 “A Gremlin Rāga in 7/16 Time” is now released and ready for use. (downloads and docs) (changelog)

IMPORTANT: TinkerPop3 requires Java8.

We would like both developers and vendors to play with this release and provide feedback as we move forward towards M2, …, then GA.

  1. Is the API how you like it?
  2. Is it easy to implement the interfaces for your graph engine?
  3. Is the documentation clear?
  4. Are there VertexProgram algorithms that you would like to have?
  5. Are there Gremlin steps that you would like to have?
  6. etc…

For the above, as well as for bugs, the issue tracker is open and ready for submissions:

TinkerPop3 is the culmination of a huge effort from numerous individuals. You can see the developers and vendors that have provided their support through the years.
(the documentation may take time to load due to all the graphics in the single HTML)

If you haven’t looked at the TinkerPop3 docs in a while, take a quick look. Tweets on several sections have recently pointed out very nice documentation.

Multiobjective Search

Monday, August 11th, 2014

Multiobjective Search with Hipster and TinkerPop Blueprints

From the webpage:

This advanced example explains how to perform a general multiobjective search with Hipster over a property graph using the TinkerPop Blueprints API. In a multiobjective problem, instead of optimizing just a single objective function, there are many objective functions that can conflict each other. The goal then is to find all possible solutions that are nondominated, i.e., there is no other feasible solution better than the current one in some objective function without worsening some of the other objective functions.

If you don’t know Hipster:

The aim of Hipster is to provide an easy to use yet powerful and flexible type-safe Java library for heuristic search. Hipster relies on a flexible model with generic operators that allow you to reuse and change the behavior of the algorithms very easily. Algorithms are also implemented in an iterative way, avoiding recursion. This has many benefits: full control over the search, access to the internals at runtime or a better and clear scale-out for large search spaces using the heap memory.

You can use Hipster to solve from simple graph search problems to more advanced state-space search problems where the state space is complex and weights are not just double values but custom defined costs.

I can’t help but hear “multiobjective search” in the the context of a document search where documents may or may not match multiple terms in a search request.

But that hearing is wrong because a graph can be more granular than a document and possess multiple ways to satisfy a particular objective. My intuition is that documents satisfy search requests only in a binary sense, yes or not. Yes?

Good way to get involved with Tinkerpop Blueprints.

Gremlin and Visualization with Gephi [Death of Import/Export?]

Wednesday, June 25th, 2014

Gremlin and Visualization with Gephi by Stephen Mallette.

From the post:

We are often asked how to go about graph visualization in TinkerPop. We typically refer folks to Gephi or Cytoscape as the standard desktop data visualization tools. The process of using those tools involves: getting your graph instance, saving it to GraphML (or the like) then importing it to those tools

TinkerPop3 now does two things to help make that process easier:

  1. A while back we introduced the “subgraph” step which allows you to pop-off a Graph instance from a Traversal, which help greatly simplify the typical graph visualization process with Gremlin, where you are trying to get a much smaller piece of your large graph to focus the visualization effort.
  2. Today we introduce a new :remote command in the Console. Recall that :remote is used to configure a different context where Gremlin will be evaluated (e.g. Gremlin Server). For visualization, that remote is called “gephi” and it configures the :submit command to take any Graph instance and push it through to the Gephi Streaming API. No more having to import/export files!

This rocks!

How do you imagine processing your data when import/export goes away?

Of course, this doesn’t have anything on *nix pipes but it is nice to see good ideas come back around.

Bigdata and Blueprints

Tuesday, May 27th, 2014

Bigdata and Blueprints

From the webpage:

Blueprints is an open-source property graph model interface useful for writing applications on top of a graph database. Gremlin is a domain specific language for traversing property graphs that comes with an excellent REPL useful for interacting with a Blueprints database. Rexster exposes a Blueprints database as a web service and comes with a web-based workbench application called DogHouse.

To get started with bigdata via Blueprints, Gremlin, and Rexster, start by getting your bigdata server running per the instructions here.

Then, go and download some sample GraphML data. The Tinkerpop Property Graph is a good starting point.

Just in case you aren’t familiar with bigdata(R):

bigdata(R) is a scale-out storage and computing fabric supporting optional transactions, very high concurrency, and very high aggregate IO rates. The bigdata RDF/graph database can load 1B edges in under one hour on a 15 node cluster. Bigdata operates in both a single machine mode (Journal), highly available replication cluster mode (HAJournalServer), and a horizontally sharded cluster mode (BigdataFederation). The Journal provides fast scalable ACID indexed storage for very large data sets, up to 50 billion edges. The HAJournalServer adds replication, online backup, horizontal scaling of query, and high availability. The federation provides fast scalable shard-wise parallel indexed storage using dynamic sharding and shard-wise ACID updates and incremental cluster size growth. Both platforms support fully concurrent readers with snapshot isolation. (

So, this is a major event for Blueprints.

I first saw this in a tweet by Marko A. Rodriguez.

Using AWS to Build a Graph-based…

Friday, November 22nd, 2013

Using AWS to Build a Graph-based Product Recommendation System by Andre Fatala and Renato Pedigoni.

From the description:

Magazine Luiza, one of the largest retail chains in Brazil, developed an in-house product recommendation system, built on top of a large knowledge Graph. AWS resources like Amazon EC2, Amazon SQS, Amazon ElastiCache and others made it possible for them to scale from a very small dataset to a huge Cassandra cluster. By improving their big data processing algorithms on their in-house solution built on AWS, they improved their conversion rates on revenue by more than 25 percent compared to market solutions they had used in the past.

Not a lot of technical details but a good success story to repeat if you are pushing graph-based services.

I first saw this in a tweet by Marko A. Rodriguez.

3 Myths about graph query languages…

Tuesday, October 15th, 2013

3 Myths about graph query languages. Busted by Pixy. by Sridhar Ramachandran.

A very short slide deck that leaves you wanting more information.

It got my attention because I didn’t know there were any myths about graph query languages. 😉

I think the sense of “myth” here is more “misunderstanding” or simply “incorrect information.”

Some references that may be helpful while reading these slides:

I must confess that “Myth #3: GQLs [Graph Query Languages] can’t be relational” has always puzzled me.

In part because hypergraphs have been used to model databases for quite some time.

For example:

Making use of arguments from information theory it is shown that a boolean function can represent multivalued dependencies. A method is described by which a hypergraph can be constructed to represent dependencies in a relation. A new normal form called generalized Boyce-Codd normal form is defined. An explicit formula is derived for representing dependencies that would remain in a projection of a relation. A definition of join is given which makes the derivation of many theoretical results easy. Another definition given is that of information in a relation. The information gets conserved whenever lossless decompositions are involved. It is shown that the use of null elements is important in handling data.

Would you believe: Some analytic tools for the design of relational database systems by K. K. Nambiar in 1980?

So far as I know, hypergraphs are a form of graph so it isn’t true that “graphs can only express binary relations/predicates.”

One difference (there are others) is that a hypergraph database doesn’t require derivation of relationships because those relationships are already captured by a hyperedge.

Moreover, a vertex can (whether it “may” or not in a particular hypergraph is another issue) be a member of more than one hyperedge.

Determination of common members becomes a straight forward query as opposed to two or more derivations of associations and then calculation of any intersection.

For all of that, it remains important as notice of a new declarative graph query language (GQL).

TinkerPop 2.4.0 Released (Gremlin Without a Cause)

Friday, August 9th, 2013

TinkerPop 2.4.0 Released (Gremlin Without a Cause) by Marko A. Rodriguez.

From the post:

TinkerPop 2.4.0 has been released under the name “Gremlin without a Cause” (see attached logo). The last release was back in March of 2013, so there are lots of new features/bugfixes/optimizations in the latest 2.4.0 release. Here is the best-of-the-best of each project along with the full release notes.

NOTE: 2.4.0 jars have been deployed to Apache Central Repo and ready for inclusion.

Another offering for your summer holiday enjoyment!


Wednesday, December 26th, 2012

Titan-Android by David Wu.

From the webpage:

Titan-Android is a port/fork of Titan for the Android platform. It is meant to be a light-weight implementation of a graph database on mobile devices. The port removes HBase and Cassandra support as their usage make little sense on a mobile device (convince me otherwise!). Gremlin is only supported via the Java interface as I have not been able to port groovy successfully. Nevertheless, Titan-Android supports local storage backend via BerkeleyDB and supports the Tinkerpop stack natively.

Just in case there was an Android under the tree!

I first saw this in a tweet by Marko A. Rodriguez.