Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

February 23, 2012

Neo4j Koans

Filed under: Neo4j — Patrick Durusau @ 4:52 pm

Neo4j Koans

From the webpage:

This set of Koans will provide practical guidance for getting to grips with graph data structures and operations using the Neo4j open source graph database. It’s part of a more comprehensive tutorial presented by the authors and others at conferences and tutorials. In fact anyone can take these materials freely and run their own tutorials.

What are Koans?

The Koan idea was borrowed from the Ruby Koans which provide a number of broken unit tests, and in fixing those tests increasingly advanced facets of Ruby are explored. The Koan model provides very rapid feedback and a structured learning path wrapped in a pre-configured environment that gets us up and running very quickly. These are very desirable characteristics when it comes to learning Neo4j, and so these Koans have adopted the same model – there are a set of (broken) unit tests, and in fixing each of them we learn some aspect of using Neo4j. As we work forwards through the Koans we’ll learn more sophisticated APIs, query languages and techniques and by the end of the Koans we’ll feel supremely confident about using Neo4j in production.

The next best thing to being at a tutorial with Neo4j authors!

Cloudera Manager | Log Management, Event Management and Alerting Demo Video

Filed under: Cloudera — Patrick Durusau @ 4:52 pm

Cloudera Manager | Log Management, Event Management and Alerting Demo Video by Jon Zuanich

From the post:

In this demo, Henry Robinson, a software engineer at Cloudera, discusses the Log Management, Event Management and Alerting features in Cloudera Manager that help make sense out of all the discrete events that take place across the Hadoop cluster. He demonstrates how to search the logs for valuable information, note important events that pertain to system health and create alerts to warn you when things go wrong.

As a once upon a time sysadmin, I know making a system transparent for users is the result of a lot of unseen work.

I think I have fewer than 50 (fifty) nodes, ;-), so I will have to take this for a spin. I need to find some cheap commodity boxes so I can set up a 3- or 4-node test bed. I can try it on a single-node system just to get my feet wet.

If you are using Cloudera Manager, or decide to try it out, point me to a blog post with your comments.

Finding the lowest common ancestor of a set of NCBI taxonomy nodes with Bio4j

Filed under: Bio4j,Bioinformatics,Common Ancestor,Taxonomy — Patrick Durusau @ 4:51 pm

Finding the lowest common ancestor of a set of NCBI taxonomy nodes with Bio4j

Pablo Pareja writes:

I don’t know if you have ever heard of the lowest common ancestor problem in graph theory and computer science but it’s actually pretty simple. As its name says, it consists of finding the common ancestor for two different nodes which has the lowest level possible in the tree/graph.

Even though it is normally defined for only two given nodes, it can easily be extended to a set of nodes of arbitrary size. This is a quite common scenario that can be found across multiple fields, and taxonomy is one of them.

The reason I’m talking about all this is because today I ran into the need to make use of such an algorithm as part of some improvements in our metagenomics MG7 method. After doing some research looking for existing solutions, I came to the conclusion that I should implement my own – I couldn’t find any applicable implementation designed for more than just two nodes.
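Bio4j is Java-based, but the idea itself is easy to prototype. Here is a minimal sketch in plain Python (not Pablo's implementation) of extending the lowest common ancestor from two nodes to an arbitrary set, assuming the taxonomy is available as a simple child-to-parent mapping; the toy taxonomy names are made up for illustration.

  # Minimal sketch: lowest common ancestor of an arbitrary set of nodes in a tree.
  # Not the Bio4j/MG7 implementation, just the idea, assuming child -> parent pointers.

  def path_from_root(node, parent):
      """Ancestors of node, listed root-first and ending with node itself."""
      path = [node]
      while node in parent:
          node = parent[node]
          path.append(node)
      return list(reversed(path))

  def lowest_common_ancestor(nodes, parent):
      """LCA of a set of nodes: the deepest node shared by every root path."""
      paths = [path_from_root(n, parent) for n in nodes]
      lca = None
      for level in zip(*paths):              # compare ancestors level by level
          if all(a == level[0] for a in level):
              lca = level[0]                 # still on the shared prefix
          else:
              break
      return lca

  if __name__ == "__main__":
      # Toy taxonomy: child -> parent (the root has no entry).
      parent = {
          "Primates": "Mammalia", "Rodentia": "Mammalia",
          "Mammalia": "Vertebrata", "Aves": "Vertebrata",
          "Vertebrata": "cellular organisms",
      }
      print(lowest_common_ancestor({"Primates", "Rodentia", "Aves"}, parent))
      # -> Vertebrata

The same walk works over a Neo4j/Bio4j taxonomy by following each node's parent relationship instead of a dictionary lookup.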

Important for its use with NCBI taxonomy nodes, but another use case comes readily to mind.

What about overlapping markup?

Traditionally we represent markup elements as single nodes, despite their composition of start and end events for each “well-formed” element in the text stream.

But what if we represent start and end events as nodes in a graph, with relationships both to each other and to other nodes in the markup stream?

Can we then ask the question: which pair of start/end nodes is the ancestor of a given start or end node?

If they have the same ancestor then we have the uninteresting case of well-formed markup.

But what if they don’t have the same ancestor? What can the common ancestor method tell us about the structure of the markup?
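Here is an equally hedged toy sketch of that thought experiment: treat start and end events as nodes, record which element is open when each event occurs (its nearest ancestor at that point), and compare the answers for an element's start and end. The element names and event stream are invented; well-formed elements report the same ancestor at both ends, overlapping ones do not.

  # Toy sketch: start/end events as nodes, with the enclosing element recorded
  # as each event's parent. Differing parents at start and end signal overlap.

  def event_parents(events):
      """events: list of ('start'|'end', name). Returns {(kind, name): parent}."""
      open_stack, parents = [], {}
      for kind, name in events:
          if kind == "start":
              parents[("start", name)] = open_stack[-1] if open_stack else None
              open_stack.append(name)
          else:
              if name in open_stack:
                  open_stack.remove(name)   # tolerate non-nested (overlapping) ends
              parents[("end", name)] = open_stack[-1] if open_stack else None
      return parents

  def overlaps(name, parents):
      return parents[("start", name)] != parents[("end", name)]

  if __name__ == "__main__":
      # <p> <b> <i> </b> </i> </p>  -- b and i overlap inside a well-formed p
      stream = [("start", "p"), ("start", "b"), ("start", "i"),
                ("end", "b"), ("end", "i"), ("end", "p")]
      ps = event_parents(stream)
      for name in ("p", "b", "i"):
          print(name, "well-formed" if not overlaps(name, ps) else "overlapping")
      # p is well-formed; b and i each report overlapping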

Definitely a research topic.

Neo4j module for Puppet

Filed under: Neo4j — Patrick Durusau @ 4:51 pm

Neo4j module for Puppet

From the readme:

This module installs Neo4j as a standalone server.

If you want more information about Puppet, see: PuppetLabs.

The Scale of the Universe 2

Filed under: Interface Research/Design,Visualization — Patrick Durusau @ 4:51 pm

The Scale of the Universe 2

This is a very effective scale of the universe visualization!

You can click on objects to learn more.

What scale do you want to visualize? As an interface for your topic map?

February 22, 2012

Gremlin vs Cypher Initial Thoughts @Neo4j

Filed under: Cypher,Graphs,Gremlin,Neo4j,Neo4jClient — Patrick Durusau @ 4:49 pm

Gremlin vs Cypher Initial Thoughts @Neo4j

Romiko Derbynew writes:

The Neo4jClient now supports Cypher as a query language with Neo4j. However I noticed the following:

  • Simple graph traversals are much more efficient when using Gremlin
  • Queries in Gremlin are 30-50% faster for simple traversals
  • Cypher is ideal for complex traversals where back tracking is required
  • Cypher is our choice of query language for reporting
  • Gremlin is our choice of query language for simple traversals where projections are not required
  • Cypher has an intrinsic table projection model, whereas Gremlin’s table projection model relies on AS steps which can be cumbersome when backtracking, e.g. Back(), As() and _CopySplit, where Cypher is just comma-separated matches
  • Cypher is much better suited for outer joins than Gremlin; achieving similar results in Gremlin requires parallel querying with CopySplit.
  • Gremlin is ideal when you need to retrieve very simple data structures
  • Table projection in Gremlin can be very powerful; however, outer joins can be very verbose

So in a nutshell, we like to use Cypher when we need tabular data back from Neo4j, and it is especially useful for outer joins.

Excellent comparison of Gremlin vs. Cypher. Both have their advantages.
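To make the comparison concrete, here is a hedged, side-by-side sketch of the same "friends of friends" traversal in both languages, written in the Gremlin and Cypher dialects of the time (node ids, START clauses); the relationship name and property are invented, and the exact syntax may differ across Neo4j/Gremlin versions.

  # Hedged sketch: the same traversal held as strings so the two styles can be
  # compared. Node ids, the KNOWS relationship and the name property are invented.

  # Gremlin: a chain of traversal steps, well suited to simple walks.
  gremlin_foaf = "g.v(1).out('KNOWS').out('KNOWS').dedup().name"

  # Cypher (circa Neo4j 1.6/1.7): pattern matching with an intrinsic table
  # projection, so multiple columns are just a comma-separated RETURN.
  cypher_foaf = """
  START me = node(1)
  MATCH me-[:KNOWS]->friend-[:KNOWS]->foaf
  WHERE foaf <> me
  RETURN foaf.name, count(friend) AS mutual_friends
  ORDER BY mutual_friends DESC
  """

The Gremlin line reads naturally for a quick walk, while the Cypher version shows why Romiko reaches for it when the answer needs to come back as a table.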

Max Flow with Gremlin and Transactions

Filed under: Graphs,Gremlin,Neo4j — Patrick Durusau @ 4:49 pm

Max Flow with Gremlin and Transactions

Max De Marzi writes:

The maximum flow problem was formulated by T.E. Harris as follows:

Consider a rail network connecting two cities by way of a number of intermediate cities, where each link of the network has a number assigned to it representing its capacity. Assuming a steady state condition, find a maximal flow from one given city to the other.

Back in the mid 1950s the US Military had an interest in finding out how much capacity the Soviet railway network had to move cargo from the Western Soviet Union to Eastern Europe. This led to the Maximum Flow problem and the Ford–Fulkerson algorithm to solve it.

If you’ve been reading the Neo4j Gremlin Plugin documentation, you’ll remember it has a section on Flow algorithms with Gremlin. Let’s add a couple of things and bring this example to life.

If that sounds like an out-dated Cold War problem, consider Max’s conclusion:

The max flow and related problems manifest in many ways. Water or sewage through underground pipes, passengers on a subway system, data through a network (the internet is just a series of tubes!), roads and highway planning, airline routes, even determining which sports teams have been eliminated from the playoffs.

What else can be modeled as max flow or related problems? Drug/weapons smuggling? Oil/gas/electricity transport? Others?
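For anyone who wants to experiment outside Gremlin, here is a minimal max-flow sketch using NetworkX in Python rather than Max's Gremlin code; the node names and capacities are invented for illustration.

  # Minimal max-flow sketch with NetworkX. Capacities are invented; the edge
  # "capacity" attributes drive the computation.
  import networkx as nx

  G = nx.DiGraph()
  G.add_edge("Western USSR", "Poland", capacity=10)
  G.add_edge("Western USSR", "Czechoslovakia", capacity=5)
  G.add_edge("Poland", "East Germany", capacity=7)
  G.add_edge("Czechoslovakia", "East Germany", capacity=6)

  flow_value, flow_dict = nx.maximum_flow(G, "Western USSR", "East Germany")
  print(flow_value)            # 12: 7 through Poland plus 5 through Czechoslovakia
  print(flow_dict["Poland"])   # how the flow leaves Poland, edge by edge

Swap in pipes, subway lines or network links for the rail links and the same call answers the modern versions of Harris's question.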

How you’re connected to Kevin Bacon

Filed under: Graphs,Neo4j — Patrick Durusau @ 4:49 pm

How you’re connected to Kevin Bacon

Max De Marzi writes:

Previously I showed you how to get Neo4j up and running with Ruby and how to find recommended friends on a social network. What about finding out how you are connected to someone outside of your friends of friends network? Do you remember the concept of six degrees of separation? No, how about six degrees of Kevin Bacon?

Another good example of using Neo4j.

I would ask a different question of law enforcement and/or national security agencies:

Who is within 2 degrees of separation of any current crime/terror suspect? Are you sure you have all the connections?

Graphs + topic maps can’t produce data that isn’t there, but they can help you manage data you have effectively.
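The two-degree question itself is cheap to ask once the connections are in a graph. A hedged NetworkX sketch (names and edges invented; in Neo4j the equivalent would be a variable-length path query from the suspect node):

  # Hedged sketch: everyone within two degrees of a suspect. Names and edges
  # are invented for illustration.
  import networkx as nx

  G = nx.Graph()
  G.add_edges_from([
      ("suspect", "alice"), ("alice", "bob"),
      ("bob", "carol"), ("suspect", "dave"), ("dave", "erin"),
  ])

  # Shortest-path length from the suspect, cut off at two hops.
  within_two = nx.single_source_shortest_path_length(G, "suspect", cutoff=2)
  print(sorted(n for n, d in within_two.items() if 0 < d <= 2))
  # ['alice', 'bob', 'dave', 'erin']  -- carol is three hops out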

The Data Hub

Filed under: Data,Dataset — Patrick Durusau @ 4:48 pm

The Data Hub

From the about page:

What was the average price of a house in the UK in 1935? When will India’s projected population overtake that of China? Where can you see publicly-funded art in Seattle? Data to answer many, many questions like these is out there on the Internet somewhere – but it is not always easy to find.

The Data Hub is a community-run catalogue of useful sets of data on the Internet. You can collect links here to data from around the web for yourself and others to use, or search for data that others have collected. Depending on the type of data (and its conditions of use), the Data Hub may also be able to store a copy of the data or host it in a database, and provide some basic visualisation tools.

I covered the underlying software in CKAN – the Data Hub Software.

If your goal is to simply make data sets available with a minimal amount of metadata, this may be the software for you.

If your goal is to make data sets available with enough metadata to make robust use of them, you need to think again.

There is an impressive number of data sets at this site.

But junk yards have an impressive number of wrecked cars.

Doesn’t help you find the car with the part you need. (Think data formats, semantics, etc.)

Look But Don’t Touch

Filed under: Data,Geographic Data,Government Data,Transparency — Patrick Durusau @ 4:48 pm

I would describe the Atlanta GIS Data Catalog as a Look But Don’t Touch system. A contrast to the efforts of DC at transparency.

From the webpage:

GIS Data Catalog

Atlanta GIS creates and maintains many GIS data sets (also known as “layers” because of the way they are layered one on top of another to create a map) and collects others from external sources, mostly other government agencies. Each layer represents some class of geographic feature. The features represented can be physical, such as roads, buildings and streams, or they can be conceptual, such as neighborhood boundaries, property lines and the locations of crimes.

The GIS Data Catalog is an on-line compilation of information on GIS layers used by the City. The catalog allows you to quickly locate GIS data by searching by keyword. You can also view metadata for each data layer in the catalog. All data in the catalog represent the best and most current GIS data maintained or used by the city. The city’s GIS metadata is maintained in conformance with a standard defined by the Federal Geographic Data Committee (FGDC).

The data layers themselves are not available for download from the catalog. Data can be requested by contacting the originating department or agency. More specific contact information is available within the metadata for many data layers. (emphasis added)

I am sure most agencies would supply the data on request, but why require the request?

To add a request processing position to the agency payroll and to have procedures for processing requests, along with meetings on request granting, plus an appeals process if the request is rejected, with record keeping for all of the foregoing plus more?

That doesn’t sound like transparent government or effective use of tax dollars to me.

District of Columbia – Data Catalog

Filed under: Data,Government Data,Open Data,Transparency — Patrick Durusau @ 4:48 pm

District of Columbia – Data Catalog

This is an example of a city moving towards transparency.

A large number of data sets to access (485 as of today), with live feeds to some data streams.

Eurostat

Filed under: Data,Dataset,Government Data,Statistics — Patrick Durusau @ 4:48 pm

Eurostat

From the “about” page:

Eurostat’s mission: to be the leading provider of high quality statistics on Europe.

Eurostat is the statistical office of the European Union situated in Luxembourg. Its task is to provide the European Union with statistics at European level that enable comparisons between countries and regions.

This is a key task. Democratic societies do not function properly without a solid basis of reliable and objective statistics. On one hand, decision-makers at EU level, in Member States, in local government and in business need statistics to make those decisions. On the other hand, the public and media need statistics for an accurate picture of contemporary society and to evaluate the performance of politicians and others. Of course, national statistics are still important for national purposes in Member States whereas EU statistics are essential for decisions and evaluation at European level.

Statistics can answer many questions. Is society heading in the direction promised by politicians? Is unemployment up or down? Are there more CO2 emissions compared to ten years ago? How many women go to work? How is your country’s economy performing compared to other EU Member States?

International statistics are a way of getting to know your neighbours in Member States and countries outside the EU. They are an important, objective and down-to-earth way of measuring how we all live.

I have seen Eurostat mentioned, usually negatively, by data aggregation services. I visited Eurostat today and found it quite useful.

For the non-data professional, there are graphs and other visualizations of popular data.

For the data professional, there are bulk downloads of data and other technical information.

I am sure there is room for improvement, but specific feedback is required to make that happen. (It has been my experience that positive, specific feedback works best. Find something nice to say and then suggest a change to improve the outcome.)

The Open Data Handbook

Filed under: Open Data — Patrick Durusau @ 4:47 pm

The Open Data Handbook

From the website:

This handbook discusses the legal, social and technical aspects of open data. It can be used by anyone but is especially designed for those seeking to open up data. It discusses the why, what and how of open data – why to go open, what open is, and the how to ‘open’ data.

To get started, you may wish to look at the Introduction. You can navigate through the report using the Table of Contents (see sidebar or below).

We warmly welcome comments on the text and will incorporate feedback as we go forward. We also welcome contributions or suggestions for additional sections and areas to examine.

The handbook provides more legal and social than technical guidance but there are other resources for the technical side of open data.

The Open Data Handbook provides a much needed comfort level for government and other data holders.

Open data isn’t something odd or to be feared. It will empower new services, even job creation. The more data that is available, the more connections creative people will find in that data.

Meronymy SPARQL Database Server To Debut With Emphasis on High Performance

Filed under: Linked Data,Meronymy,SPARQL — Patrick Durusau @ 4:47 pm

Meronymy SPARQL Database Server To Debut With Emphasis on High Performance

From the post:

Coming in June from start-up Meronymy is a new RDF enterprise database management system, the Meronymy SPARQL Database Server. The company, founded by Inge Henriksen, began life because of the need he saw for a high-performance and more scalable RDF database server.

The idea to focus on a database server exclusively oriented to Linked Data and the Semantic Web came as a result of Henriksen’s work over the last decade as an IT consultant implementing many semantic solutions for customers in sectors such as government and education. “One issue that always came up was performance,” he explains, especially when performing more advanced SPARQL queries against triple stores using filters, for example.

“Once the data reached a certain size, which it often did very quickly, the size of the data became unmanageable and we had to fall back on caching and the like to resolve these performance issues.” The problem there is that caching isn’t compatible with situations where there is a need for real-time data.
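The kind of query Henriksen describes is easy to reproduce from Python; here is a hedged sketch of a FILTERed SPARQL query run with SPARQLWrapper against the public DBpedia endpoint (the endpoint, class and property are only illustrative, and nothing here is specific to Meronymy's server).

  # Hedged sketch: the sort of FILTERed SPARQL query that strains triple stores
  # as data grows. Endpoint and predicates are illustrative (DBpedia), not
  # anything specific to the Meronymy server.
  from SPARQLWrapper import SPARQLWrapper, JSON

  sparql = SPARQLWrapper("http://dbpedia.org/sparql")
  sparql.setQuery("""
      SELECT ?city ?population WHERE {
          ?city a <http://dbpedia.org/ontology/City> ;
                <http://dbpedia.org/ontology/populationTotal> ?population .
          FILTER (?population > 5000000)
      }
      LIMIT 20
  """)
  sparql.setReturnFormat(JSON)
  results = sparql.query().convert()

  for row in results["results"]["bindings"]:
      print(row["city"]["value"], row["population"]["value"])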

A closed beta is due out soon. Register at Meronymy.

Economist Style Guide

Filed under: Documentation — Patrick Durusau @ 4:47 pm

Economist Style Guide

The Economist’s style guide is back online!

Following this style guide will make your documentation successful. Whether your software is as well is up to you.

Example of brevity from the style guide:

Passive

Be direct. Use the active tense. A hit B describes the event more concisely than B was hit by A.

Compare The Chicago Manual of Style (15th edition) at a third of a page or The Hodges Harbrace Handbook (17th edition) at 3 pages. Both have amusing examples and details, but sometimes you just want the bare rule.

February 21, 2012

Magic Elephants, Data Psychics, and Invisible Gorillas

Filed under: BigData,Data Mining — Patrick Durusau @ 8:02 pm

Magic Elephants, Data Psychics, and Invisible Gorillas

Jim Harris writes:

A recent Forbes article predicts Big Data will be a $50 billion market by 2017, and Michael Friedenberg recently blogged how the rise of big data is generating buzz about Hadoop (which I call the Magic Elephant): “It certainly looks like the Holy Grail for organizing unstructured data, so it’s no wonder everyone is jumping on this bandwagon. So get ready for Hadoopalooza 2012.”

John Burke recently blogged about the role of big data helping CIOs “figure out how to handle the new, the unusual, and the unexpected as an opportunity to focus more clearly on how to bring new levels of order to their traditional structured data.”

As I have previously blogged, many big data proponents (especially the Big Data Lebowski vendors selling Hadoop solutions) extol its virtues as if big data provides clairvoyant business insight, as if big data was the Data Psychic of the Information Age.

But a recent New York Times article opened with the story of a statistician working for a large retail chain being asked by his marketing colleagues: “If we wanted to figure out if a customer is pregnant, even if she didn’t want us to know, can you do that?” As Eric Siegel of Predictive Analytics World is quoted in the article, “we’re living through a golden age of behavioral research. It’s amazing how much we can figure out about how people think now.”

There are funny moments in this post but the main lesson isn’t humorous.

When reading it, think about how much money your clients are leaving on the table by seeing only what they expect to see from search analysis. Could be enough money to make the difference between success and failure. Or, perhaps more importantly, the difference in being able to continue with your services.

There is a lot of room (I think) for improvement on the technological side of things, but there is just as much, if not more, to be improved on the human engineering side.

The book Jim mentions, Daniel Kahneman’s Thinking, Fast and Slow, is just the emerging tip of the iceberg in terms of research that is directly relevant to both marketing and interfaces.

Suggest you get a copy. Not to read and accept uncritically (it may or may not be right in the details), but its focus is one you cannot afford to ignore.

Google Pregel vs Signal Collect for distributed Graph Processing – pros and cons

Filed under: Pregel,Signal/Collect — Patrick Durusau @ 8:02 pm

Google Pregel vs Signal Collect for distributed Graph Processing – pros and cons

René Pickhardt summarizes two of the papers for tomorrow’s meeting on graph databases:

One of the reading club assignments was to read the paper about Google Pregel and Signal Collect, compare them and point out pros and cons of both approaches.

So after I read both papers, as well as Claudio’s overview on Pregel clones, and took some notes, here are my thoughts, but first a short summary of both papers.

What are your thoughts on these or some of the other readings for tomorrow?

R for Quants, Part III (A)

Filed under: R — Patrick Durusau @ 8:01 pm

R for Quants, Part III (A)

From the post:

This is the third part in a three part series on teaching R to MFE students at CUNY Baruch. The focus of this lesson is on programming methods and application development in R.

Sometimes we all skip references, footnotes, etc., but this is one time to not skip the references!

B. Rowe. A Beautiful Paradigm: Functional Programming in Finance. R/Finance 2011, 2011.

Follow the link to “A Beautiful Paradigm:…” and tell me what you think. Thought-provoking if nothing else.

Maps with R

Filed under: Mapping,Maps,R — Patrick Durusau @ 8:01 pm

Maps with R (I)

From the post:

This is the first post of a short series to show some code I have learnt to produce maps with R.

Some time ago I found this infographic from The New York Times (via this page) and I wondered how a multivariate choropleth map could be produced with R. Here is the code I have arranged to show the results of the last Spanish general elections in a similar fashion.

Which was followed by:

Maps with R (II)

In my last post I described how to produce a multivariate choropleth map with R. Now I will show how to create a map from raster files. One of them is a factor which will group the values of the other one. Thus, once again, I will superpose several groups in the same map.

What do you want to map today?

After DuPont bans Teflon from WordNet, the world is their non-sticky oyster

Filed under: Dictionary,Humor — Patrick Durusau @ 8:01 pm

After DuPont bans Teflon from WordNet, the world is their non-sticky oyster

Toma Tasovac reports on DuPont banning the term Teflon from WordNet, but not before observing:

I lived in the United States for more than a decade — long enough to know that litigation is not just a judiciary battle about enforcing legal rights: it’s a way of life. I have also over the years watched with amusement how dictionaries get used in American courtrooms, from Martha Nussbaum’s unfortunate reading of the Liddell-Scott on τόλμημα in Romer vs. Evans in 1993 to a recent case in which Chief Justice John G. Roberts Jr. parsed the meaning of a federal law by consulting no less than five dictionaries: one of the words he focused on was the preposition of. While Martha Nussbaum’s court drama about moral philosophy, scholarly integrity, homosexual desire and the nature of shame would make a great movie (starring, inevitably, as pretty much every other movie out there – Meryl Streep), Chief Justice Roberts’ dreadful, ho-hum lexicographic exercise would barely pass the Judge Judy test of how-low-can-we-go: he discovered that the meaning of of had something to do with belonging or possession. Pass the remote, please!

Who rules/owns our vocabularies?

There are serious issues at stake but take a few minutes to enjoy this post.

UMD CMSC 723: Computational Linguistics I

Filed under: Computational Linguistics — Patrick Durusau @ 8:00 pm

UMD CMSC 723: Computational Linguistics I

Twenty-five (25) posts by Hal Daume III as part of his course on computational linguistics. References, pointers, examples, explanations.

I haven’t read these in detail. As always, I welcome your comments/suggestions.

It would be interesting to take the major university computational linguistics courses and create a topic map of the topics covered and recommended resources. Could be useful for students with different learning styles to find an approach that works for them.

Anyone care to hazard a list of say the top twenty (20) schools in computational linguistics? (without ranking, just in the top 20)

PS: The course homepage.

Making sense of Wikipedia categories

Filed under: Annotation,Classification,Wikipedia — Patrick Durusau @ 8:00 pm

Making sense of Wikipedia categories

Hal Daume III writes:

Wikipedia’s category hierarchy forms a graph. It’s definitely cyclic (Category:Ethology belongs to Category:Behavior, which in turn belongs to Category:Ethology).

At any rate, did you know that “Chicago Stags coaches” are a subcategory of “Natural sciences”? If you don’t believe me, go to the Wikipedia entry for the Natural sciences category, and expand the following list of subcategories:

(subcategories omitted)

I guess it kind of makes sense. There are some other fun ones, like “Rhaeto-Romance languages”, “American World War I flying aces” and “1911 films”. Of course, these are all quite deep in the “hierarchy” (all of those are at depth 15 or higher).

Hal examines several strategies and concludes asking:

Has anyone else tried and succeed at using the Wikipedia category structure?

Some other questions:

Is Hal right that hand annotation doesn’t “scale?”

I have heard that more times than I can count but never seen any studies cited to support it.

After all, Wikipedia was manually edited and produced. Yes? No automated process created its content. So, what is the barrier to hand annotation?

If you think about it, the same could be said about email: most email (yes?) is written by hand, not produced by automated processes (well, except for spam), so why can’t it be hand annotated? Or at least, why can’t we capture the semantics of email at the point of composition and annotate it there by automated means?

Hand annotation may not scale for sensor data or financial data streams but is hand annotation needed for such sources?

Hand annotation may not scale for, say, Twitter posts by non-English speakers. But only for agencies with very short-sighted, if not actively bigoted, hiring/contracting practices.

Has anyone loaded the Wikipedia categories into a graph database? What sort of interface would you suggest for trial arrangement of the categories?
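Short of a full graph database load, the category graph is easy to poke at with NetworkX; a hedged sketch, with the edge list typed in by hand from Hal's examples (a real run would parse the categorylinks dump or query the MediaWiki API instead):

  # Hedged sketch: a few Wikipedia category -> subcategory edges as a digraph.
  # Hand-typed from Hal's examples; a real run would use the category dumps.
  import networkx as nx

  edges = [
      ("Behavior", "Ethology"),
      ("Ethology", "Behavior"),        # the cycle Hal mentions
      ("Natural sciences", "Behavior"),
  ]
  G = nx.DiGraph(edges)

  print(nx.find_cycle(G, "Ethology"))
  # [('Ethology', 'Behavior'), ('Behavior', 'Ethology')]

  # Depth of a category below a chosen "root":
  print(nx.shortest_path_length(G, "Natural sciences", "Ethology"))   # 2

Loading the full dump the same way would make it straightforward to measure how common such cycles are and how deep oddities like “Chicago Stags coaches” really sit.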

PS: If you are interested in discussing how to establish assisted annotation for Twitter, email or other data streams, with or without user awareness, send me a post.

data modelling and FRBR WEMI ontology

Filed under: FRBR,RDF,Sets — Patrick Durusau @ 8:00 pm

data modelling and FRBR WEMI ontology

Jonathan Rochkind writes to defend the FRBR WEMI ontology:

Karen Coyle writes on the RDA listserv:

FRBR claims to be based on a “relational” model, as in “relational database.” That is not tomorrow’s data model; it is yesterday’s, although it is a step toward tomorrow’s model. The difficulty is that FRBR was conceived of in the early 1990s, and completed in the late 1990s. That makes it about 15 years old.

I think it would have been just as much a mistake to tie the FRBR model to an RDF model as it would have/was to tie it to a relational database model. Whatever we come up with is going to last us more than 15 years, and things will change again. Now, I’ll admit that I’m heretically still suspicious that an RDF data model will in fact be ‘the future’. But even if it is, there will be another future (or simultaneous futures plural).

And concludes:

I tend to think they should have just gone with ‘set theory’ oriented language, because it is, I think, the most clear, while still being abstract enough to make it harder to think the WEMI ontology is tied to some particular technology like relational databases OR linked data. I think WEMI gets it right regardless of whether you speak in the language of ‘relational’, ‘set theory’, ‘object orientation’ or ‘linked data’/RDF.

Leaving my qualms about RDF to one side, I write to point out that choosing “set theory” is a choice of a particular technology or, if you like, tradition.

If that sounds odd, consider how many times you have used set theory in the last week, month, or year. Unless you are a logician or an introductory mathematics professor, the odds are that the number is zero (0) (or the empty set, {}, for any logicians reading this post).

Choosing “set theory” is to choose a methodology that very few people use in practice. The vast majority of people make choices, evaluate outcomes, live complex lives innocent of the use of set theory.

I don’t object to FRBR or other efforts choosing to use “set theory” but recognize it is a minority practice.

One that elevates a minority over the majority of users.

InfiniteGraph – “…Create, Define, Repeat, and Visualize Results in Minutes”

Filed under: Graphs,InfiniteGraph,NoSQL — Patrick Durusau @ 7:59 pm

Objectivity Adds New Plugin Framework, Integrated Visualizer And Support For Tinkerpop Blueprints To InfiniteGraph

From the post:

“Of the numerous varieties of NoSQL databases, graph databases have the potential to significantly alter the analytics sector by enabling companies to unlock value based on understanding and analyzing the relationships between data,” said Matt Aslett, research management, data management and analytics, 451 Research. “The new additions to Objectivity’s InfiniteGraph enable developers to achieve results in real time and also realize additional value by making the queries repeatable.”

Plugin Framework:
InfiniteGraph’s Plugin Framework provides developers with the ultimate in flexibility and supports the creation, import, and repeated use of plugins that modularize useful functionality. Developers can leverage successful queries, adjust parameters when appropriate, test queries and gain real-time results. A Navigator plugin bundles components that assist in navigation queries, e.g. result qualifiers, path qualifiers, and guides. The Formatter plugin formats and outputs results of graph queries. These plugins can be loaded and used in the InfiniteGraph Visualizer, and reused in InfiniteGraph applications.

Enhanced IG Visualizer:
The Visualizer is now tightly integrated with InfiniteGraph’s Plugin Framework allowing indexing queries for edges and export of GraphML and JSON (built-in) or other user-defined plugin formats. The Visualizer allows users to easily load plugins with enhanced control and navigation. Developers can parameterize plugins to control runtime behavior. Now every part of the graph is fully customizable and delivers a sophisticated result display for each query.

Support for Tinkerpop Blueprints:
InfiniteGraph provides a clean integration with Tinkerpop Blueprints, a popular property graph model interface with provided implementations, and is well-suited for applications that want to traverse and query graph databases using Gremlin.

That’s a bundle of news at one time for sure! The plugin architecture sounds particularly interesting.

Curious if anyone has developed a JDBC-based connector that enables access to data in a relational database as a graph?
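While waiting for an answer, the basic move is simple enough to sketch. A hedged Python example, with sqlite3 and NetworkX standing in for JDBC and a graph API, and an invented two-table schema: rows become vertices, a foreign key becomes an edge.

  # Hedged sketch of the relational-as-graph idea: rows become vertices and a
  # foreign key becomes an edge. sqlite3 + NetworkX stand in for JDBC + a graph
  # API; the two-table schema is invented for illustration.
  import sqlite3
  import networkx as nx

  conn = sqlite3.connect(":memory:")
  conn.executescript("""
      CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
      CREATE TABLE employment (person_id INTEGER REFERENCES person(id),
                               employer TEXT);
      INSERT INTO person VALUES (1, 'Ada'), (2, 'Grace');
      INSERT INTO employment VALUES (1, 'Analytical Engines Ltd'), (2, 'US Navy');
  """)

  G = nx.Graph()
  for pid, name in conn.execute("SELECT id, name FROM person"):
      G.add_node(("person", pid), name=name)
  for pid, employer in conn.execute("SELECT person_id, employer FROM employment"):
      G.add_node(("employer", employer))
      G.add_edge(("person", pid), ("employer", employer), rel="EMPLOYED_BY")

  print(list(G.edges(data=True)))

A JDBC-backed version would do the same thing against a live schema and its metadata.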

Riak 1.1 + Webinar

Filed under: NoSQL,Riak — Patrick Durusau @ 7:59 pm

Riak 1.1 Release + Webinar

This post almost didn’t happen. I got an email notice about this release, but when I went to the web page “version,” every link, whether its text was “webinar” or “Riak 1.1,” pointed to the 29 February 2012 webinar on Riak 1.1, multiple times.

So I went to the Basho website; this is big news, after all. Nothing on the blog. There is an image on the homepage, if you know which one to choose.

Finally, I went to company -> news -> “Basho Unveils New Graphical Operations Dashboard, Diagnostics with Release of Riak 1.1.”

OK, not the best headline but at least you know you have arrived at the right place.

Tip: Don’t make news about your product or company hard to find. (KISS4S – Keep it simple stupid for stupids)

After getting there I find:

Riak 1.1 boosts data synchronization performance for multi-data center deployments, provides operating system and installation diagnostics and improves operational control for very large clusters. Riak 1.1 delivers a range of new features and improvements including:

  • Riak Control, a completely open source and intuitive administrative console for managing, monitoring and interfacing with Riak clusters
  • Riaknostic, an open source, proactive diagnostic suite for detecting common configuration and runtime problems
  • Enhanced error logging and reporting
  • Improved resiliency for large clusters
  • Automatic data compression using the Snappy compression library

Additionally, Riak EDS (Enterprise Data Store), Basho’s commercial distribution based on Riak, features major enhancements, primarily for multi-data center replication:

  • Introduction of bucket-level replication, adding more granularity and robustness
  • Various distinct data center synchronization options are now available, each optimized for different use cases
  • Significant improvement of data synchronization across multiple data centers

“The 1.1 release is focused on simplifying life for developers and administrators. Basho’s new Riak Control and Riaknostic components move Riak open source forward, providing an easy and intuitive way to diagnose, manage and monitor Riak platforms,” said Don Rippert, CEO, Basho. “While Riak Control was originally part of Basho’s commercial offering, we decided to release the code as part of Riak 1.1 to reinforce our commitment to the open source community.”

The notice was worth hunting for and the release looks very interesting.

As an added incentive, you can get free Riak/Basho stickers. I think sew-on patches would be good as well. Instead of a biker jacket you could have a developer jacket. 😉

February 20, 2012

A beautiful algorithm

Filed under: Algorithms — Patrick Durusau @ 8:36 pm

A beautiful algorithm

Breakthroughs are possible, even for something as well studied as Conway’s Game of Life.

Read the post to be inspired to keep looking for better solutions.

Attention-enhancing information retrieval

Filed under: Information Retrieval,Interface Research/Design,Users — Patrick Durusau @ 8:36 pm

Attention-enhancing information retrieval

William Webber writes:

Last week I was at SWIRL, the occasional talkshop on the future of information retrieval. To me the most important of the presentations was Dianne Kelly’s “Rage against the Machine Learning”, in which she observed the way information retrieval currently works has changed the way people think. In particular, she proposed that the combination of short query with snippet response has reworked peoples’ plastic brains to focus on working memory, and forgo the processing of information required for it to lay its tracks down in our long term memory. In short, it makes us transactionally adept, but stops us from learning.

This is as important as Bret Victor’s presentation.

I particularly liked the line:

Various fanciful scenarios were given, but the ultimate end-point of such a research direction is that you walk into the shopping mall, and then your mobile phone leads you round telling you what to buy.

Reminds me of a line I remember imperfectly: judging from advertising, we are all “…insecure, sex-starved neurotics with 15-second attention spans.”

I always thought that was being generous on the attention span but opinions differ on that point. 😉

How do you envision your users? Serious question but not one you have to answer here. Ask yourself.

Migrating from Oracle to PostgreSQL

Filed under: Oracle,PostgreSQL — Patrick Durusau @ 8:36 pm

Migrating from Oracle to PostgreSQL by Kevin Kempter

From the post:

This video presents Ora2Pg, a free tool that you can use to migrate an Oracle database to a PostgreSQL compatible schema. Ora2Pg connects to your Oracle database, scans it automatically and extracts its structure or data; it then generates SQL scripts that you can load into your PostgreSQL database.

Ora2Pg can be used for reverse engineering an Oracle database for database migration or to replicate Oracle data into a PostgreSQL database. The video shows where to download it and talks about the prerequisites. It explains how to install Ora2Pg and configure it. At the end, it presents some examples of Ora2Pg being used.

Like the man says, useful for migration or replication.

What I wonder about is the day in the not too distant future when “migration” isn’t a meaningful term. Either because the data is too large or dynamic for “migration” to be meaningful. Not to mention the inevitable dangers of corruption during “migration.”

And if you think about it, isn’t the database engine, Oracle or PostgreSQL, simply a way to access data already stored? If I want to use a different engine to access the same data, what is the difficulty?

I would much rather design a topic map that queries “Oracle” data in place, either using an Oracle interface or even directly, than “migrate” the data with all the hazards and dangers that brings.

Will be interesting if the “cloud” results in data storage separate from application interfaces. Much like we all use TCP/IP for network traffic, although the packets are put to different purposes by different applications.

Inventing on Principle

Filed under: Authoring Topic Maps,Graphics,Visualization — Patrick Durusau @ 8:36 pm

Inventing on Principle by Bret Victor.

Nathan Yau at Flowing Data writes:

This talk by Bret Victor caught fire a few days ago, but I just got a chance to watch it in its entirety. It’s worth the one hour. Victor demos some great looking software that connects code to the visual, making the creation process more visceral, and he finishes up with worthwhile thoughts on the invention process.

Think about authoring a graph or topic map with the sort of immediate feedback that Bret demonstrates.

Social Media Application (FBI RFI)

Filed under: Data Mining,RFI-RFP,Social Media — Patrick Durusau @ 8:35 pm

Social Media Application (FBI RFI)

Current Due Date: 11:00 AM, March 13, 2012

You have to read the Social Media Application.pdf document to prepare a response.

Be aware that as of 20 February 2012, that document has a blank page every other page. I suspect it is the complete document but have written to confirm and to request a corrected document be posted.

Out-Hoover Hoover: FBI wants massive data-mining capability for social media does mention:

Nowhere in this detailed RFI, however, does the FBI ask industry to comment on the privacy implications of such massive data collection and storage of social media sites. Nor does the FBI say how it would define the “bad actors” who would be subjected this type of scrutiny.

I take that to mean that the FBI is not seeking your comments on privacy implications or possible definitions of “bad actors.”

I won’t be able to prepare an official response because I don’t meet the contractor suitability requirements, which include a cost estimate for an offsite server as a solution to the requirements.

I will be going over the requirements and publishing my response here as though I meet the contractor suitability requirements. Could be an interesting exercise.

