Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

April 8, 2013

Permission Resolution with Neo4j — Part 2

Filed under: Graphs,Neo4j — Patrick Durusau @ 3:07 pm

Permission Resolution with Neo4j — Part 2 by Max De Marzi.

From the post:

Let’s try tackling something a little bigger. In Part 1 we created a small graph to test our permission resolution graph algorithm and it worked like a charm on our dozen or so nodes and edges. I don’t have fast hands, so instead of typing out a million node graph, we’ll build a graph generator and use the batch importer to load it into Neo4j. What I want to create is a set of files to feed to the batch-importer.

Nice walk through on generating graphs and importing them into Neo4j.
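A minimal sketch of what such a generator might look like, in Python and not taken from Max's post. The node kinds, the relationship type names (`IS_MEMBER_OF`, `CAN_READ`) and the column headers are assumptions for illustration only; check the batch importer's documentation for the header layout it actually expects.

```python
import csv
import io
import random


def generate_graph(num_users, num_groups, num_docs, seed=42):
    """Return (nodes, rels) for a random permission-style graph.

    Node ids are 1-based and assigned in order: users, groups, documents.
    """
    rng = random.Random(seed)
    nodes = []
    for kind, count in (("user", num_users), ("group", num_groups), ("doc", num_docs)):
        for i in range(count):
            nodes.append((len(nodes) + 1, kind, f"{kind}{i}"))

    user_ids = [n[0] for n in nodes if n[1] == "user"]
    group_ids = [n[0] for n in nodes if n[1] == "group"]
    doc_ids = [n[0] for n in nodes if n[1] == "doc"]

    rels = []
    for user in user_ids:
        # Hypothetical relationship type: each user joins one random group.
        rels.append((user, rng.choice(group_ids), "IS_MEMBER_OF"))
    for doc in doc_ids:
        # Hypothetical relationship type: one random group may read each document.
        rels.append((rng.choice(group_ids), doc, "CAN_READ"))
    return nodes, rels


def write_tsv(handle, header, rows):
    """Write a header line and the rows as tab-separated values."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


nodes, rels = generate_graph(num_users=5, num_groups=2, num_docs=10)
nodes_file = io.StringIO()  # swap for open("nodes.csv", "w", newline="") to write to disk
rels_file = io.StringIO()
write_tsv(nodes_file, ["id", "kind", "name"], nodes)
write_tsv(rels_file, ["start", "end", "type"], rels)
```

Scaling the three counts up is all it takes to get to a million nodes without typing them.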

Curious, have you encountered a real world graph with three (3) relationship types?

How do you think a higher number of relationship types would impact performance?

April 6, 2013

Graphs for Gaming [Neo4j]

Filed under: Games,Graphs,Neo4j — Patrick Durusau @ 4:52 pm

Graphs for Gaming by Toby O’Rourke and Rik van Bruggen.

From the description:

Graph Databases have many use cases in many industries, but one of the most interesting ones that are emerging is in the Gaming industry. Because of its real-time nature, games are a perfect environment to make use of graph-based queries that are the basis for in-game recommendations. These recommendations make games more interesting for the users (they get to play cooler games with other people in their area, of their level, sharing their social network profile, etc) but also more profitable for the game providers, developers and publishers. After all: the latter want to be recommending specific games to specific target audiences, and thereby maximising their potential revenues.

Just in case tonight is movie night at your house and you forgot to pick up any videos. 😉

Or not.

Review comments:

Rik van Bruggen covers two centuries of math (Euler as the inventor of graphs), skips to Neo4j, then to NoSQL, criticisms of relational databases, new definition of complexity, and examples of complexity. Hits games at time mark 12:30, but discusses them very vaguely. Graphs in gaming, harnessing social networks. A demo of finding the games two people have played.

Nice demo of the Neo4j console.

Works for same company as the basis for a recommendation to play against Rik? Remember the perils of K-Nearest Neighbors: dangerously simple.

Query response in milliseconds? Think about the company size for the query.

Demonstrates querying but nothing to do with using graphs in gaming. (Mining networks of users, yes, but that’s a generic problem.)

At time mark 28:00, the Neo4j infomercial finally ends.

Toby O’Rourke takes over. Bingo business case was to obtain referrals of friends. Social network problem. General comments on future use of graphs for recommendations and fraud/collusion detection. (Yes, I know, friend referrals and recommendations sound a lot alike. Not to the presenter.)

There are informative and useful Neo4j videos so don’t judge them all by this one.

However, spend your forty-eight plus minutes somewhere other than on this video.

April 4, 2013

ETL into Neo4j

Filed under: Graphs,Neo4j — Patrick Durusau @ 5:37 am

ETL into Neo4j by Max De Marzi.

Max covers four different methods to load data into Neo4j.

Definitely worth a stop.

March 30, 2013

Permission Resolution with Neo4j – Part 1

Filed under: Cybersecurity,Graphs,Neo4j,Networks,Security — Patrick Durusau @ 2:17 pm

Permission Resolution with Neo4j – Part 1 by Max De Marzi.

From the post:

People produce a lot of content. Messages, text files, spreadsheets, presentations, reports, financials, etc, the list goes on. Usually organizations want to have a repository of all this content centralized somewhere (just in case a laptop breaks, gets lost or stolen for example). This leads to some kind of grouping and permission structure. You don’t want employees seeing each other’s HR records, unless they work for HR, same for Payroll, or unreleased quarterly numbers, etc. As this data grows it no longer becomes easy to simply navigate and a search engine is required to make sense of it all.

But what if your search engine returns 1000 results for a query and the user doing the search is supposed to only have access to see 4 things? How do you handle this? Check the user permissions on each file realtime? Slow. Pre-calculate all document permissions for a user on login? Slow and what if new documents are created or permissions change between logins? Does the system scale at 1M documents, 10M documents, 100M documents?

Search is one example of a need to restrict viewing results but browsing raises the same issues. Or display of information alongside other information.

As I recall, Netware 4.1 (other versions as well no doubt) had the capability for a sysadmin to create sub-sysadmins, say for accounting or HR, that could hide information from the sysadmin. That was prior to search being commonly available.

What other security for search result schemes are out there?

March 29, 2013

How NoSQL Paid Off for Telenor

Filed under: Lucene,Marketing,Neo4j,Solr — Patrick Durusau @ 4:07 am

How NoSQL Paid Off for Telenor by Sebastian Verheughe and Katrina Sponheim.

A presentation I encountered while searching for something else.

Makes a business case for Lucene/Solr and Neo4j solutions to improve customer access to data.

As opposed to the world being a better place case.

What information process/need have you encountered where you can make a business case for topic maps?

March 24, 2013

Cypher in Neo4j 2.0

Filed under: Cypher,Neo4j — Patrick Durusau @ 3:28 pm

Cypher in Neo4j 2.0

Previews new features in Neo4j.

Labels & Indexing

Labels group nodes into sets. Nodes can have multiple labels.

Can use labels to create indexes on subsets of nodes.

Labels will support schema constraints (future feature).
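To make the idea concrete, here is a toy in-memory model in Python. It is not Neo4j's API; the class and method names are invented for illustration. It shows labels grouping nodes into sets, a node carrying several labels, and an index that covers only the nodes with one label.

```python
from collections import defaultdict


class LabelledGraph:
    """Toy model: nodes carry labels, indexes cover one (label, property) pair."""

    def __init__(self):
        self.nodes = {}                    # node id -> (labels, properties)
        self.by_label = defaultdict(set)   # label -> node ids
        self.indexes = {}                  # (label, key) -> {value: node ids}

    def add_node(self, node_id, labels, **props):
        self.nodes[node_id] = (set(labels), props)
        for label in labels:
            self.by_label[label].add(node_id)
        # Keep any existing index that covers this node up to date.
        for (label, key), index in self.indexes.items():
            if label in labels and key in props:
                index.setdefault(props[key], set()).add(node_id)

    def create_index(self, label, key):
        """Index property `key`, but only for nodes carrying `label`."""
        index = {}
        for node_id in self.by_label[label]:
            props = self.nodes[node_id][1]
            if key in props:
                index.setdefault(props[key], set()).add(node_id)
        self.indexes[(label, key)] = index

    def find(self, label, key, value):
        return self.indexes[(label, key)].get(value, set())


g = LabelledGraph()
g.add_node(1, ["Person", "Player"], name="Rik")   # a node with two labels
g.add_node(2, ["Person"], name="Toby")
g.add_node(3, ["Game"], name="Rik")               # same name, different label
g.create_index("Person", "name")
g.add_node(4, ["Person"], name="Rik")             # added after the index exists
people_named_rik = g.find("Person", "name", "Rik")
```

The `Game` node never enters the `Person` index, which is the point of indexing a subset.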

I first saw this in a tweet by Michael Lappe.

March 22, 2013

How Sharehoods Created Neomodel Along The Way [London]

Filed under: Django,Graphs,Neo4j,Python — Patrick Durusau @ 12:53 pm

How Sharehoods Created Neomodel Along The Way

EVENT DETAILS

What: Neo4J User Group:CASE STUDY: How Sharehoods Created Neomodel Along The Way
Where: The Skills Matter eXchange, London
When: 27 Mar 2013 Starts at 18:30

From the description:

Sharehoods is a global online portal for foreigners and the first place where new-comers to a city can build their social relationships and network – online or from a mobile phone.

In this talk, Sharehoods Head of Technology Robin Edwards will explain why and how Neo4j is used at this exciting tech startup. Robin will also give a whirlwind tour of neomodel, a new Python framework for neo4j and its integration with the Django stack.

Join this talk if you’d like to learn how to get productive with Neo4j, Python and Django.

Entity disambiguation:

I don’t think they mean:

Jamie Foxx

I think they mean:

django software The Web framework for perfectionists with deadlines.

If you attend, drop me a note to confirm my suspicions. 😉

March 20, 2013

Neo4j.org 3.0 Launch

Filed under: Graphs,Neo4j — Patrick Durusau @ 4:43 pm

Neo4j.org 3.0 Launch

From the post:

One major goal is to make it easier for you to get up and running with Neo4j. We hope to achieve this by providing everything in one place, from the download, set-up screencasts and step by step instructions to the rich choice of language support and the appropriate drivers.

We also want to make it easier for people that never worked with graph databases before to learn about Neo4j. So we created the infrastructure and started to work on learning paths that will tell a consistent story around a use-case or technology involving Neo4j. Currently we feature a learning path for Java and for Cypher but there will be many more to come. Any input in how to structure the paths and present the material is highly welcome!

Not that I am a good judge of website design but I like it a lot better than the previous version.

That may be an artifact of liking graphs and so finding the presentation more intuitive.

There may be some confusion over “Learn” versus “Training” but finding better terms to distinguish self-learning versus instruction would not be easy.

If you really miss lists of links, those are at the bottom of the page.

I would mark this one as a win for the Neo4j team!

March 18, 2013

Permission Resolution With Neo4j – Part 1

Filed under: Graphs,Neo4j,Networks,Security — Patrick Durusau @ 4:32 pm

Permission Resolution With Neo4j – Part 1 by Max De Marzi.

From the post:

People produce a lot of content. Messages, text files, spreadsheets, presentations, reports, financials, etc, the list goes on. Usually organizations want to have a repository of all this content centralized somewhere (just in case a laptop breaks, gets lost or stolen for example). This leads to some kind of grouping and permission structure. You don’t want employees seeing each other’s HR records, unless they work for HR, same for Payroll, or unreleased quarterly numbers, etc. As this data grows it no longer becomes easy to simply navigate and a search engine is required to make sense of it all.

But what if your search engine returns 1000 results for a query and the user doing the search is supposed to only have access to see 4 things? How do you handle this? Check the user permissions on each file realtime? Slow. Pre-calculate all document permissions for a user on login? Slow and what if new documents are created or permissions change between logins? Does the system scale at 1M documents, 10M documents, 100M documents?

Max addresses the scaling issue by checking only the results from a search. So to that extent, the size of the document store becomes irrelevant.

At least if you have a smallish number of results from the search.
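A rough Python sketch of checking only the search results. This is not Max's algorithm; the graph layout (document to folder to group to user) and all names are assumptions for illustration. The work is proportional to the number of results, not to the size of the store.

```python
def can_access(parents, user, doc):
    """Walk upwards from `doc` and report whether `user` is reachable."""
    stack = [doc]
    seen = {doc}
    while stack:
        node = stack.pop()
        if node == user:
            return True
        for parent in parents.get(node, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


def filter_results(parents, user, results):
    """Keep only the search results the user is permitted to see, in order."""
    return [doc for doc in results if can_access(parents, user, doc)]


# Hypothetical permission graph, stored as child -> parents.
parents = {
    "doc1": ["hr_folder"],
    "doc2": ["hr_folder"],
    "doc3": ["payroll_folder"],
    "hr_folder": ["hr"],
    "payroll_folder": ["payroll"],
    "hr": ["alice"],
    "payroll": ["bob"],
}

search_results = ["doc3", "doc1", "doc9", "doc2"]
visible_to_alice = filter_results(parents, "alice", search_results)
visible_to_bob = filter_results(parents, "bob", search_results)
```

With thousands of results per query the per-result walk starts to add up, which is the caveat above.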

I haven’t seen part 2 but another scaling tactic would be to limit access to indexes by permissions. Segregating human resources, accounting, etc.

Looking forward to where Max takes this one.

March 17, 2013

Matching Traversal Patterns with MATCH

Filed under: Cypher,Graphs,Neo4j — Patrick Durusau @ 5:17 am

Cypher basics: Matching Traversal Patterns with MATCH by Wes Freeman.

From the post:

“Because friends don’t let friends write atrocious recursive joins in SQL.” –Max De Marzi

The match clause is one of the first things you learn with Cypher. Once you’ve figured out how to look up your starting bound identifiers with start, you usually (but not always) want to match a traversal pattern, which is one of Cypher’s most compelling features.

The goal of this post is not to go over the syntax for all of the different cases in match–for that the docs do a good job: Cypher MATCH docs. Rather, I hoped to explain more the how of how match works.

First, you need to understand the difference between bound and unbound identifiers (sometimes we call them variables, too, in case I slip up and forget to be consistent). Bound identifiers are the ones that you know the value(s) of–usually you set these in the start clause, but sometimes they’re passed through with with. Unbound identifiers are the ones you don’t know the values of: the part of the pattern you’re matching. If you don’t specify an identifier, and instead just do a-->(), or something of that sort, an implicit unbound identifier is created for you behind the scenes, so Cypher can keep track of the values it’s found. The goal of the match clause is to find real nodes and relationships that match the pattern specified (find the unbound identifiers), based on the bound identifiers you have from the start.
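A toy illustration in Python of bound versus unbound identifiers. It is a sketch of the idea only, not how Cypher is implemented, and the graph and function name are made up. The start node is bound; the matcher finds every combination of values for the unbound positions in the pattern.

```python
def match_path(graph, start_nodes, hops):
    """Return every path of `hops` outgoing relationships from each bound start node."""
    paths = [[node] for node in start_nodes]
    for _ in range(hops):
        # Extend each partial match by one relationship; dead ends drop out.
        paths = [path + [nxt] for path in paths for nxt in graph.get(path[-1], ())]
    return paths


# Hypothetical graph as an adjacency list of outgoing relationships.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}

# Roughly the pattern a-->x-->y, with `a` bound and x, y unbound.
bindings = match_path(graph, ["a"], hops=2)
```

Each returned path is one row of results: one set of values for the unbound identifiers.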

Wes is creating enough of these mini-tutorials that you will find his Cypher page a welcome collection point.

March 16, 2013

Cypher basics: it all starts with the START

Filed under: Cypher,Graphs,Neo4j — Patrick Durusau @ 2:27 pm

Cypher basics: it all starts with the START by Wes Freeman.

From the post:

“It all starts with the START” -Michael Hunger, Cypher webinar, Sep 2012

The start clause is one of those things that seems quite simple initially. You specify your start point(s) for the rest of the query. Typically, you use an index lookup, or if you’re just messing around, a node id (or list of node ids). This sets the stage for you to match a traversal pattern, or just filter your nodes with a where. Let’s start with a simple example–here we’re going to find a single node, and return it (later we’ll get into why start is sort of like a SQL from):

Wes continues his excellent introduction to Cypher.

March 15, 2013

A Peek Behind the Neo4j Lucene Index Curtain

Filed under: Indexing,Lucene,Neo4j — Patrick Durusau @ 4:02 pm

A Peek Behind the Neo4j Lucene Index Curtain by Max De Marzi.

Max suggests using a copy of your Neo4j database for this exercise.

Could be worth your while to go exploring.

And you will learn something about Lucene in the bargain.

Cypher for SQL Professionals

Filed under: Cypher,Graphs,Neo4j — Patrick Durusau @ 3:48 pm

Cypher for SQL Professionals by Michael Hunger. (video)

From the webpage:

Cypher is a graph query language used in Neo4j. Much like SQL, it’s a declarative language used for querying databases.

What does a join look like in Cypher? What about a left outer join? There are a lot of similarities between the two, Cypher is heavily influenced by SQL. We’ll talk about what these common concepts and differences are, what Cypher gives you that SQL leaves you wanting. Come along and see how Neo4j and Cypher can make your daily grind much easier and more fun.

You’ll learn how to use your current SQL knowledge to quickly get started using Neo4j and Cypher.

Just in case you get to pick the video for this weekend. 😉

This is work friendly so you might better save it for a lunch break next week.

March 12, 2013

The Mythical WITH (Neo4j’s Cypher query language)

Filed under: Cypher,Neo4j — Patrick Durusau @ 1:17 pm

The Mythical WITH (Neo4j’s Cypher query language) by Wes Freeman.

From the post:

Coming from SQL, I found Cypher a quick learn. The match was new, and patterns were new, but everything else seemed to fit well with SQL concepts. Except with, the way to build a sort of sub-query–it seemed hard to wrap my head around. So, what really happens behind the scenes with a with clause in your query? How does it work? It turns out, almost any complex query ends up needing a with in it, but let’s start with a simple example.

After reading this post, you will be waiting for part 2!

Very good introduction to with in Cypher!
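One way to picture with, as a Python analogy and not Wes's example: the first part of the query produces rows (often aggregated), and with pipes those rows into the next part, where they can be filtered or matched further. The data and names below are invented.

```python
from collections import Counter

# Hypothetical (person, friend) pairs standing in for matched relationships.
friendships = [("ann", "bob"), ("ann", "cy"), ("bob", "cy"), ("ann", "dee")]

# Stage 1 (before the `with`): match and aggregate, one row per person.
friend_counts = Counter(person for person, _ in friendships)

# Stage 2 (after the `with`): keep working on the piped-through rows,
# here filtering on the aggregate computed in stage 1.
popular = sorted(name for name, count in friend_counts.items() if count > 1)
```

SQL users may recognise the second stage as doing the job of a HAVING clause or an outer query over a sub-query.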

BTW, as an added bonus, Wes blogs about chess as well.

Sinking Data to Neo4j from Hadoop with Cascading

Filed under: Cascading,Hadoop,Neo4j — Patrick Durusau @ 10:18 am

Sinking Data to Neo4j from Hadoop with Cascading by Paul Ingles.

From the post:

Recently, I worked with a colleague (Paul Lam, aka @Quantisan) on building a connector library to let Cascading interoperate with Neo4j: cascading.neo4j. Paul had been experimenting with Neo4j and Cypher to explore our data through graphs and we wanted an easy way to flow our existing data on Hadoop into Neo4j.

The data processing pipeline we’ve been growing at uSwitch.com is built around Cascalog, Hive, Hadoop and Kafka.

Once the data has been aggregated and stored a lot of our ETL is performed upon Cascalog and, by extension, Cascading. Querying/analysis is a mix of Cascalog and Hive. This layer is built upon our long-term data storage system: Hadoop; this, all combined, lets us store high-resolution data immutably at a much lower cost than uSwitch’s previous platform.

As Paul notes later in his post, this isn’t a fast solution, about 20,000 nodes a second.

But if that fits your requirements, could be a good place to start.

March 11, 2013

Reco4j

Filed under: Graphs,Neo4j,Recommendation — Patrick Durusau @ 7:26 pm

Reco4j

Reco4j is an open source project that aims at developing a recommendation framework based on graph data sources. We choose graph databases for several reasons. They are NoSQL databases that are “schemaless”. This means that it is possible to extend the basic data structure with intermediate information, i.e. similarity value between item and so on. Moreover, since every information are expressed with some properties, nodes and relations, the recommendation process can be customized to work on every graph.
Indeed Reco4j can be used on every graph where “user” and “item” are represented by nodes and the preferences are modelled as relationship between them.

The current implementation leverages on Neo4j as first example of graph database integrated in our framework.

The main features of Reco4j are:

  1. Performance, leveraging on the graph database and storing information in it for future retrieving it produce fast recommendations also after a system restart;
  2. Use of Network structure, integrating the simple recommendation algorithms with (social) network analysis;
  3. General purpose, it can be used with preexisting databases;
  4. Customizability, editing the properties file the recommender framework can be adapted to the current graph structure and use several types of the recommendation algorithms;
  5. Ready for Cloud, leveraging on the graph database cloud features the recommendation process can be splitted on several nodes.

Just in case you don’t like the recommendations you get from Amazon. 😉

BTW, “splitted” is an archaic past tense form of split. (According to Merriam-Webster.)

Say rather “…the recommendation process can be split onto several nodes.”

March 8, 2013

Model Matters: Graphs, Neo4j and the Future

Filed under: Graphs,Modeling,Neo4j — Patrick Durusau @ 2:58 pm

Model Matters: Graphs, Neo4j and the Future by Tareq Abedrabbo.

From the post:

As part of our work, we often help our customers choose the right datastore for a project. There are usually a number of considerations involved in that process, such as performance, scalability, the expected size of the data set, and the suitability of the data model to the problem at hand.

This blog post is about my experience with graph database technologies, specifically Neo4j. I would like to share some thoughts on when Neo4j is a good fit but also what challenges Neo4j faces now and in the near future.

I would like to focus on the data model in this blog post, which for me is the crux of the matter. Why? Simply because if you don’t choose the appropriate data model, there are things you won’t be able to do efficiently and other things you won’t be able to do at all. Ultimately, all the considerations I mentioned earlier influence each other and it boils down to finding the most acceptable trade-off rather than picking a database technology for one specific feature one might fancy.

So when is a graph model suitable? In a nutshell when the domain consists of semi-structured, highly connected data. That being said, it is important to understand that semi-structured doesn’t imply an absence of structure; there needs to be some order in your data to make any domain model purposeful. What it actually means is that the database doesn’t enforce a schema explicitly at any given point in time. This makes it possible for entities of different types to cohabit – usually in different dimensions – in the same graph without the need to make them all fit into a single rigid structure. It also means that the domain can evolve and be enriched over time when new requirements are discovered, mostly with no fear of breaking the existing structure.

Effectively, you can start taking a more fluid view of your domain as a number of superimposed layers or dimensions, each one representing a slice of the domain, and each layer can potentially be connected to nodes in other layers.

More importantly, the graph becomes the single place where the full domain representation can be consolidated in a meaningful and coherent way. This is something I have experienced on several projects, because modeling for the graph gives developers the opportunity to think about the domain in a natural and holistic way. The alternative is often a data-centric approach, that usually results from integrating different data flows together into a rigidly structured form which is convenient for databases but not for the domain itself.

Interesting review of the current and some projected capabilities of Neo4j.

I am particularly sympathetic to starting with the data users have as opposed to starting with a model written in software and shoehorning the user’s data to fit the model.

Can be done, has been done (for decades), and works quite well in some cases.

But not all cases.

neo4j: Make properties relationships [Associations As First Class Citizens?]

Filed under: Graphs,Neo4j,Networks — Patrick Durusau @ 2:43 pm

neo4j: Make properties relationships by Mark Needham.

From the post:

I spent some of the weekend working my way through Jim, Ian & Emil‘s book ‘Graph Databases‘ and one of the things that they emphasise is that graphs allow us to make relationships first class citizens in our model.

Looking back on a couple of the graphs that I modelled last year I realise that I didn’t quite get this and although the graphs I modelled had some relationships a lot of the time I was defining things as properties on nodes.

While it’s fine to do this I think we lose some of the power of a graph and it’s not necessarily obvious what we’ve lost until we model a property as a relationship and see what possibilities open up.

For example in my football graph I wanted to record the date of matches and initially stored this as a property on the match before realising that modelling it as a relationship might open up some interesting queries.
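A small Python sketch of the difference, with made-up match ids and dates and a hypothetical `played_on` relationship; it is not Mark's model. With the date as a property, finding matches on the same day means scanning every match. With the date as its own node, it is a short hop through that node.

```python
# Date as a property: same-day matches require a scan over all matches.
matches_with_property = {
    "m1": {"date": "2013-03-02"},
    "m2": {"date": "2013-03-02"},
    "m3": {"date": "2013-03-09"},
}
same_day_scan = sorted(
    match
    for match, props in matches_with_property.items()
    if props["date"] == "2013-03-02"
)

# Date as a node: each match has a (hypothetical) played_on edge to a date node,
# and the date node knows which matches point at it.
played_on = {"m1": "2013-03-02", "m2": "2013-03-02", "m3": "2013-03-09"}
matches_on = {}
for match, day in played_on.items():
    matches_on.setdefault(day, []).append(match)


def same_day(match):
    """Hop match -> date -> other matches on that date."""
    day = played_on[match]
    return [other for other in matches_on[day] if other != match]


others_with_m1 = same_day("m1")
```

Once the date is a node, other things can attach to it as well, which is where the new queries come from.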

Reading Mark’s post illustrates the power of using associations to model “properties” in topic maps.

In Neo4j, relationships are first class citizens.

Unfortunately, we can’t say the same for associations in topic maps.

You may recall that associations in a topic map are restricted in the information they can carry.

If you want to add a name to an association, for example, you have to reify the association with a topic. Which means you have the association and a topic for the association, representing the same subject.

Not to mention a lot of machinery overhead for something fairly simple.

I am aware that the TMDM and XTM were fashioned to follow the original version of ISO 13250. The origin of reification in topic maps.

However, simply because all buggies had whips at one point is no reason to design cars with whip holders.

The time has come to revisit reification and in my view, revise both the TMDM and XTM to remove it.

And to make associations and occurrences first class citizens in both the TMDM and XTM.

Comments/suggestions?

March 7, 2013

Neo4j 1.9.M05 released – wrapping up

Filed under: Cypher,Graphs,Neo4j — Patrick Durusau @ 1:30 pm

Neo4j 1.9.M05 released – wrapping up by Peter Neubauer.

From the post:

We are very proud to announce the next milestone of the Neo4j 1.9 release cycle. This time, we have been trying to introduce as few big changes as possible and instead concentrate on things that make the production environment a more pleasant experience. That means Monitoring, Cypher profiling, Java7 and High Availability were the targets for this work.

Everyone likes improvements, new features, etc.

I am leaning towards profiling Cypher statements as my favorite in this release.

What’s yours?

March 6, 2013

Importing data into Neo4j – the spreadsheet way

Filed under: Graphs,Neo4j — Patrick Durusau @ 7:23 pm

Importing data into Neo4j – the spreadsheet way by Rik Van Bruggen.

From the post:

I am sure that many of you are very technical people, very knowledgeable about all things Java, Dr. Who and many other things – but in case you have ever met me, you would probably have noticed that I am not. And I don’t want to be. I love technology, but have never had the talent, inclination or education to program – so I don’t. But I still want to get data into Neo4j – so how do I do that?

There are many technical tools out there (definitely look here, here and here), but I needed something simple. So my friend and colleague Michael Hunger came to the rescue, and offered some help to create a spreadsheet to import into Neo4j.

You will find the spreadsheet here, and you will find two components:

  1. an instruction sheet. I will get to that later.
  2. a data import sheet. Let’s look at that first.

Getting Neo4j closer to the average business user.

Are spreadsheets becoming (are?) the bridge between “unstructured” data and more sophisticated data structures?

Thinking of tools like Data Explorer for example.

If so, focusing subject identity/mapping tools on spreadsheet tables might be a good move.

March 5, 2013

“Do Bees” / “Don’t Bees” and Neo4j

Filed under: Cypher,Graphs,Neo4j,Networks — Patrick Durusau @ 1:12 pm

According to Michael Hunger in a Neo4j Google Groups message, the Neo4j team is drowning in its own success!

Now there’s a problem to have!

“Do Bees” for Neo4j will:

…ask questions on Stack Overflow that relate to:

Please tag your questions with “neo4j” and “cypher”, “gremlin” or “spring data neo4j” accordingly. See the current list:

http://stackoverflow.com/questions/tagged/neo4j

Currently questions on SO are answered quickly by a group of very active people which we hope you will join. We try to chime in as often as possible (especially with unanswered questions).

So PLEASE post your questions there on Stack Overflow, we will start asking individuals to move their questions to that platform and if they don’t manage it, move them ourselves.

We will also monitor this badge: http://stackoverflow.com/badges/1785/neo4j and award cool stuff for people that make it there.

This google group shall return to its initial goals of having broader discussions about graph topics, modeling, architectures, roadmap, announcements, cypher evolution, open source etc. So we would love everyone who has questions or problems in these areas to reach out and start a conversation.

Hope for your understanding to make more breathing room in this group and more interesting discussions in the future while keeping an interactive FAQ around Neo4j going on SO with quick feedback loops and turnaround times.

The Neo4j community will be healthier if we are all “Do Bees” so I won’t cover the alternative.

If you don’t know “Do Bees” / “Don’t Bees,” see: Romper Room.

See you at Stack Overflow!

March 4, 2013

A New Representation of WordNet® using Graph Databases

Filed under: Graph Databases,Graphs,Neo4j,Networks,WordNet — Patrick Durusau @ 10:46 am

A New Representation of WordNet® using Graph Databases by Khaled Nagi.

Abstract:

WordNet® is one of the most important resources in computational linguistics. The semantically related database of English terms is widely used in text analysis and retrieval domains, which constitute typical features, employed by social networks and other modern Web 2.0 applications. Under the hood, WordNet® can be seen as a sort of read-only social network relating its language terms. In our work, we implement a new storage technique for WordNet® based on graph databases. Graph databases are a major pillar of the NoSQL movement with lots of emerging products, such as Neo4j. In this paper, we present two Neo4j graph storage representations for the WordNet® dictionary. We analyze their performance and compare them to other traditional storage models. With this contribution, we also validate the applicability of modern graph databases in new areas beside the typical large-scale social networks with several hundreds of millions of nodes.

Finally, a paper that covers “moderate size databases!”

Think about the average graph database you see on this blog. Not really in the “moderate” range, even though a majority of users work in the moderate range.

Compare the number of Facebook size enterprises with the number of enterprises generally.

Not dissing super-sized graph databases or research on same. I enjoy both a lot.

But for your average customer, experience with “moderate size databases” may be more immediately relevant.

I first saw this in a tweet from Peter Neubauer.

February 26, 2013

neo4j/cypher: Combining COUNT and COLLECT in one query

Filed under: Cypher,Neo4j — Patrick Durusau @ 1:52 pm

neo4j/cypher: Combining COUNT and COLLECT in one query by Mark Needham.

From the post:

In my continued playing around with football data I wanted to write a cypher query against neo4j which would show me which teams had missed the most penalties this season and who missed them.

Mark discovers that queries with two aggregation expressions have problems but goes on to solve them as well.
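For readers who want to see the shape of the result Mark is after, here is the same idea in plain Python, not his Cypher query. The team and player names are placeholders. For each team we want both a count of missed penalties and the collected list of who missed them.

```python
from collections import defaultdict

# Hypothetical (team, player) pairs, one per missed penalty.
missed_penalties = [
    ("Team A", "Player 1"),
    ("Team B", "Player 2"),
    ("Team A", "Player 3"),
    ("Team A", "Player 1"),
]

# Group once, then derive both aggregates from the same grouping.
players_by_team = defaultdict(list)
for team, player in missed_penalties:
    players_by_team[team].append(player)

# One row per team: (team, count, collected players), most misses first.
summary = sorted(
    ((team, len(players), players) for team, players in players_by_team.items()),
    key=lambda row: row[1],
    reverse=True,
)
```

Both aggregates come from one grouping by team, so the count and the list cannot disagree.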

February 21, 2013

Neo4j: A Developer’s Perspective

Filed under: Graphs,Neo4j — Patrick Durusau @ 8:47 pm

Neo4j: A Developer’s Perspective

From the post:

In the age of NoSQL databases, where a new database seems to pop up every week, it is not surprising that even a larger number of articles related to them are written everyday. So when I started writing this blog on Neo4j, instead of describing how freaking awesome it is, I aimed to address the most common issues that a “regular” developer faces. By regular, I mean that, a developer, who is familiar with databases in general and knows the basics for Neo4j, but is a novice when it comes to actually using it.

A brief overview for those not familiar with Neo4j. Neo4j is a graph database. A graph database uses the concept of graph theory to store data. Graph Theory is the study of graphs, which are structures containing vertices and edges or in other words nodes and relationships. So, in a graph database, data is modeled in terms of nodes and relationships. Neo4j, at a first glance seems pretty much similar to any other graph database model that we encountered before. It has nodes, it has relationships, they are interconnected to form a complex graph and you traverse the graph in a specific pattern to get desired results.

I don’t think you will see anything new here but it is a useful post if you are unfamiliar with Neo4j.

I mention it primarily because of a comment objecting to the AGPL licensing of Neo4j.

Err, if I am writing a web application to sell to a client, why would I object to paying for a commercial license for Neo4j?

Or is there some subtlety to profiting off of free software that I am missing?

I first saw this at: DZone.

February 14, 2013

Neo4j and Gatling Sitting in a Tree, Performance T-E-S-T-ING

Filed under: Gatling,Neo4j,Scala — Patrick Durusau @ 7:16 pm

Neo4j and Gatling Sitting in a Tree, Performance T-E-S-T-ING by Max De Marzi.

From the post:

I was introduced to the open-source performance testing tool Gatling a few months ago by Dustin Barnes and fell in love with it. It has an easy to use DSL, and even though I don’t know a lick of Scala, I was able to figure out how to use it. It creates pretty awesome graphics and takes care of a lot of work for you behind the scenes. They have great documentation and a pretty active google group where newbies and questions are welcomed.

It requires you to have Scala installed, but once you do all you need to do is create your tests and use a command line to execute it. I’ll show you how to do a few basic things, like test that you have everything working, then we’ll create nodes and relationships, and then query those nodes.

You did run performance tests on your semantic application. Yes?

February 10, 2013

How Neo4j beat Oracle Database

Filed under: Graphs,Neo4j,Networks,Oracle — Patrick Durusau @ 11:56 am

Neo Technology execs: How Neo4j beat Oracle Database by Paul Krill.

From the post:

Neo Technology, which was formed in 2007, offers Neo4j, a Java-based open source NoSQL graph database. With a graph database, which can search social network data, connections between data are explored. Neo4j can solve problems that require repeated network probing (the database is filled with nodes, which are then linked), and the company stresses Neo4j’s high performance. InfoWorld Editor at Large Paul Krill recently talked with Neo CEO Emil Eifrem and Philip Rathle, Neo senior director of products, about the importance of graph database technology as well as Neo4j’s potential in the mobile space. Eifrem also stressed his confidence in Java, despite recent security issues affecting the platform.

InfoWorld: Graph database technology is not the same as NoSQL, is it?

Eifrem: NoSQL is actually four different types of databases: There’s key value stores, like Amazon DynamoDB, for example. There’s column-family stores like Cassandra. There’s document databases like MongoDB. And then there’s graph databases like Neo4j. There are actually four pillars of NoSQL, and graph databases is one of them. Cisco is building a master data management system based on Neo4j, and this is actually our first Fortune 500 customer. They found us about two years ago when they tried to build this big, complex hierarchy inside of Oracle RAC. In Oracle RAC, they had response time in minutes, and then when they replaced it [with] Neo4j, they had response times in milliseconds. (emphasis added)

It is a great story and one I would repeat if I were marketing Neo4j (which I like a lot).

However, there are a couple of bits missing from the story that would make it more informative.

Such as what “…big, complex hierarchy…” was Cisco trying to build? Details please.

There are things that relational databases don’t do well.

Not realizing that up front is a design failure, not one of software or of relational databases.

Another question I would ask: What percentage of Cisco databases are relational vs. graph?

Fewer claims/stories and more data would go a long way towards informed IT decision making.

February 3, 2013

[Neo4j] FOSDEM 2013 summary

Filed under: Conferences,Graphs,Neo4j — Patrick Durusau @ 6:59 pm

FOSDEM 2013 summary by Peter Neubauer.

Peter mentions the following Neo4j related projects:

See his post for other details.

February 2, 2013

Neo4j – Social Networking – QA – Scientific Communication

Filed under: Graphs,Neo4j,Social Networks — Patrick Durusau @ 3:10 pm

René Pickhardt’s blog post title was: Slides of Related work application presented in the Graphdevroom at FOSDEM, which is unlikely to catch your eye. The paper title is: A neo4j powered social networking and Question & Answer application to enhance scientific communication.

I took the liberty of crafting a shorter title for this post. 😉

The problems René addresses are shared by all academics:

  1. Finding new relevant publications
  2. Connecting people interested in the same topic

This project is the result of the merger of the Open Citations and Related Work projects, on which see: Open Citations and Related Work projects merge.

The terminology for the project components:

  • Open Citations Corpus: data corpus
  • Open Citations Corpus Datastore (OCCD): infrastructure of the data corpus
  • Related Work: user-oriented services built on top of the citation data

Resources:

You need to take a long look at the project in general but the data in particular.

From the data webpage:

We downloaded the source files of all arxiv articles published until 2012-09-31, extracted the references and matched them against the metadata using these python scripts. The result is a 2.0Gb sized *.txt file with more than 16m lines representing the citation graph in the following format:

The linking is at the document level, so there is still topic map work to be done merging the same subjects identified differently, but this data set is certainly a “leg up” on that task.

We should all encourage if not actively contribute to the Related Work project.

January 31, 2013

Demining the “Join Bomb” with graph queries

Filed under: Graphs,Neo4j,Networks — Patrick Durusau @ 7:26 pm

Demining the “Join Bomb” with graph queries by Rik Van Bruggen.

From the post:

For the past couple of months, and even more so since the beer post, people have been asking me a question that I have been struggling to answer myself for quite some time: what is so nice about the graphs? What can you do with a graph database that you could not, or only at great pains, do in a traditional relational database system. Conceptually, everyone understands that this is because of the inherent query power in a graph traversal – but how to make this tangible? How to show this to people in a real and straightforward way?

And then Facebook Graph Search came along, along with its many crazy search examples – and it sort of hit me: we need to illustrate this with *queries*. Queries that you would not – or only with a substantial amount of effort – be able to do in a traditional database system – and that are trivial in a graph.

This is what I will be trying to do in this blog post, using an imaginary dataset that was inspired by the Telecommunications industry. You can download the dataset here, but really it is very simple: a number of “general” data elements (countries, languages, cities), a number of “customer” data elements (person, company) and a number of more telecom-related data elements (operators – I actually have the full list of all mobile operators in the countries in the dataset coming from here and here, phones and conference call service providers).

Great demonstration using simulated telecommunications data of the power of graph queries.

Highly recommended!
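To make the "join bomb" idea a little more concrete, here is a minimal sketch of my own, not taken from Rik's post or his dataset. The node names, the KNOWS and USES relationship types, and the helper functions are all hypothetical. It answers "which operators do the friends of Alice's friends use?" by following relationships hop by hop. In a relational schema, each hop would typically be another self-join on a relationships table.

```python
# Hypothetical toy data: (source, relationship type, target) triples.
EDGES = [
    ("alice", "KNOWS", "bob"),
    ("bob", "KNOWS", "carol"),
    ("carol", "KNOWS", "dave"),
    ("bob", "USES", "operator_x"),
    ("carol", "USES", "operator_y"),
    ("dave", "USES", "operator_x"),
]


def build_index(edges):
    """Index outgoing edges by (node, relationship type)."""
    index = {}
    for source, rel_type, target in edges:
        index.setdefault((source, rel_type), []).append(target)
    return index


def follow(index, start_nodes, rel_type):
    """Take one hop along rel_type from every node in start_nodes."""
    reached = set()
    for node in start_nodes:
        reached.update(index.get((node, rel_type), ()))
    return reached


def traverse(index, start, path):
    """Follow a sequence of relationship types, one hop per entry in path."""
    nodes = {start}
    for rel_type in path:
        nodes = follow(index, nodes, rel_type)
    return nodes


index = build_index(EDGES)

# Operators used by the friends of Alice's friends: two KNOWS hops, one USES hop.
print(sorted(traverse(index, "alice", ["KNOWS", "KNOWS", "USES"])))
```

Adding another hop is just one more entry in the path list, which is the sort of query that gets painful quickly when every hop is a join.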

Facebook Graph Search with Cypher and Neo4j

Filed under: Cypher,Facebook,Graphs,Neo4j — Patrick Durusau @ 7:24 pm

Facebook Graph Search with Cypher and Neo4j by Max De Marzi.

From the post:

Facebook Graph Search has given the Graph Database community a simpler way to explain what it is we do and why it matters. I wanted to drive the point home by building a proof of concept of how you could do this with Neo4j. However, I don’t have six months or much experience with NLP (natural language processing). What I do have is Cypher. Cypher is Neo4j’s graph language and it makes it easy to express what we are looking for in the graph. I needed a way to take “natural language” and create Cypher from it. This was going to be a problem.

If you think about “likes” as an association type with role players….

Of course, “like” paints with a broad brush but it is a place to start.
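A rough sketch of what I mean, assuming a bare-bones model of an association as a type plus a set of roles, each pairing a role type with a player. The role names "liker" and "liked", the sample players, and the helper function are my own placeholders, not anything from Max's post or from Facebook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Role:
    role_type: str  # the part played in the association, e.g. "liker"
    player: str     # the subject playing that part


@dataclass(frozen=True)
class Association:
    assoc_type: str  # e.g. "likes"
    roles: tuple     # tuple of Role instances


def players(associations, assoc_type, role_type, where=None):
    """Return players of role_type in associations of assoc_type.

    If where is given as (role_type, player), keep only associations
    in which that player also plays that role.
    """
    found = set()
    for assoc in associations:
        if assoc.assoc_type != assoc_type:
            continue
        if where is not None and Role(*where) not in assoc.roles:
            continue
        found.update(r.player for r in assoc.roles if r.role_type == role_type)
    return found


# Hypothetical sample data.
LIKES = [
    Association("likes", (Role("liker", "max"), Role("liked", "neo4j"))),
    Association("likes", (Role("liker", "rik"), Role("liked", "neo4j"))),
    Association("likes", (Role("liker", "rik"), Role("liked", "beer"))),
]

# Who likes neo4j?
print(sorted(players(LIKES, "likes", "liker", where=("liked", "neo4j"))))
```

Once "like" is an association type, narrower types (recommends, uses daily, tolerates) can sit alongside it without changing how the roles are queried.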
