Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

December 9, 2010

Basho Riak: An Open Source Scalable Data Store

Filed under: MapReduce,NoSQL,Riak — Patrick Durusau @ 5:45 pm

Basho Riak: An Open Source Scalable Data Store

From the website:

Riak is a Dynamo-inspired key/value store that scales predictably and easily. Riak also simplifies development by giving developers the ability to quickly prototype, test, and deploy their applications.

A truly fault-tolerant system, Riak has no single point of failure. No machines are special or central in Riak, so developers and operations professionals can decide exactly how fault-tolerant they want and need their applications to be.

The video from the Georgia Tech NoSQL conference in 2009 is worth watching.

Their implementation of MapReduce is targeted (it doesn’t have to be run against the entire data set), can be set up as a stream (store and send through MapReduce), or can be used with the representation of relationships as links.
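To make "targeted" concrete, here is a minimal sketch in plain Python. It is not Riak's API: the in-memory `store`, the key names, and the `targeted_mapreduce` helper are all made up for illustration. The only point is that the map phase runs over a named list of keys instead of scanning everything.

```python
from collections import Counter

# Toy in-memory key/value store standing in for a bucket (hypothetical data).
store = {
    "doc1": "riak is a key value store",
    "doc2": "riak scales predictably",
    "doc3": "topic maps and riak",
}

def map_words(value):
    """Map phase: emit one (word, 1) pair per word in a stored value."""
    return [(word, 1) for word in value.split()]

def reduce_counts(pairs):
    """Reduce phase: sum the counts emitted for each word."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

def targeted_mapreduce(store, keys, map_fn, reduce_fn):
    """Run map/reduce over the named keys only, not the whole store."""
    emitted = []
    for key in keys:
        if key in store:
            emitted.extend(map_fn(store[key]))
    return reduce_fn(emitted)

# Only doc1 and doc2 are read; doc3 is never touched.
result = targeted_mapreduce(store, ["doc1", "doc2"], map_words, reduce_counts)
```

Because `doc3` is not in the key list, nothing from it shows up in `result`.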

December 6, 2010

GT.M High end TP database engine

Filed under: Data Structures,GT.M,node-js,NoSQL,Software — Patrick Durusau @ 4:55 am

GT.M High end TP database engine (Sourceforge)

Description from the commercial version:

The GT.M data model is a hierarchical associative memory (i.e., multi-dimensional array) that imposes no restrictions on the data types of the indexes and the content – the application logic can impose any schema, dictionary or data organization suited to its problem domain.* GT.M’s compiler for the standard M (also known as MUMPS) scripting language implements full support for ACID (Atomic, Consistent, Isolated, Durable) transactions, using optimistic concurrency control and software transactional memory (STM) that resolves the common mismatch between databases and programming languages. Its unique ability to create and deploy logical multi-site configurations of applications provides unrivaled continuity of business in the face of not just unplanned events, but also planned events, including planned events that include changes to application logic and schema.
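A rough way to picture a "hierarchical associative memory" is a sparse array whose subscripts can be any mix of values. The sketch below is plain Python, not M and not GT.M's interface; the `Global` class and its method names are invented for illustration, and the child ordering (by string form) is a simplification of my own.

```python
class Global:
    """Sparse multi-dimensional array keyed by tuples of subscripts."""

    def __init__(self):
        self._nodes = {}

    def set(self, subscripts, value):
        """Store a value at the node named by the subscript path."""
        self._nodes[tuple(subscripts)] = value

    def get(self, subscripts, default=None):
        """Fetch the value at a subscript path, or a default if unset."""
        return self._nodes.get(tuple(subscripts), default)

    def children(self, subscripts):
        """Return the distinct next-level subscripts under a prefix."""
        prefix = tuple(subscripts)
        depth = len(prefix)
        found = {
            key[depth]
            for key in self._nodes
            if len(key) > depth and key[:depth] == prefix
        }
        # Ordered by string form so mixed subscript types can be compared.
        return sorted(found, key=str)

# Subscripts mix integers and strings; no schema is declared anywhere.
topics = Global()
topics.set(("topic", 1, "name"), "Riak")
topics.set(("topic", 1, "type"), "key/value store")
topics.set(("topic", "gtm", "name"), "GT.M")
```

Any schema lives in how the application chooses its subscripts, which is the point of the vendor's description. Transactions and replication are outside this sketch.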

There are clients for node-js:

http://github.com/robtweed/node-mdbm
http://github.com/robtweed/node-mwire

Local topic map software is interesting and useful but for scaling to the enterprise level, something different is going to be required.

Reports of implementing the TMDM or other topic map legends with a GT.M based system are welcome.

December 1, 2010

*Sparsity technologies – vendor

Filed under: Graphs,NoSQL — Patrick Durusau @ 2:43 pm

*Sparsity technologies

I encountered this site on a NoSQL database mailing list.

Of particular interest is their *dex graph database.

From the website:

A DEX graph is a Labeled Directed Attributed Multigraph. Labeled because nodes and edges in a graph belong to types. Directed because it supports directed edges as well as undirected. An attributed graph allows a variable list of attributes for each node and edge, where an attribute is a value associated to a name, simplifying the graph structure. A multigraph allows multiple edges between two nodes. This means that two nodes can be connected several times by different edges, even if two edges have the same tail, head and label.
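The four adjectives in that description map fairly directly onto a data structure. What follows is a small Python sketch, not the DEX API; `MultiGraph`, `Node`, `Edge`, and the sample people are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from itertools import count

@dataclass
class Node:
    node_id: int
    label: str                                       # labeled: every node has a type
    attributes: dict = field(default_factory=dict)   # attributed: name/value pairs

@dataclass
class Edge:
    edge_id: int
    label: str
    tail: int
    head: int
    directed: bool = True                            # directed or undirected
    attributes: dict = field(default_factory=dict)

class MultiGraph:
    def __init__(self):
        self.nodes = {}
        self.edges = {}
        self._ids = count(1)

    def add_node(self, label, **attributes):
        node = Node(next(self._ids), label, attributes)
        self.nodes[node.node_id] = node
        return node.node_id

    def add_edge(self, label, tail, head, directed=True, **attributes):
        # Edges are keyed by their own id, so parallel edges never collide.
        edge = Edge(next(self._ids), label, tail, head, directed, attributes)
        self.edges[edge.edge_id] = edge
        return edge.edge_id

    def edges_between(self, tail, head):
        """Edges from tail to head; undirected edges match either way."""
        return [
            e for e in self.edges.values()
            if (e.tail, e.head) == (tail, head)
            or (not e.directed and (e.tail, e.head) == (head, tail))
        ]

g = MultiGraph()
ann = g.add_node("person", name="Ann")
bob = g.add_node("person", name="Bob")
g.add_edge("knows", ann, bob, since=2009)
g.add_edge("knows", ann, bob, since=2010)   # same tail, head and label
g.add_edge("colleague", ann, bob, directed=False)
```

Giving each edge its own identifier is what allows two edges with the same tail, head and label to coexist.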

There is a free non-commercial use version that allows up to a million nodes and unlimited edges.

I haven’t looked at it yet nor do I have any relationship with the company. I mention it as an FYI item for the moment.

I will be suggesting to them that topic maps would allow them to be a little more specific than: “Allows to bring together content from multiple sources.”

November 14, 2010

Orient: The Database For The Web – Presentation

Filed under: NoSQL,OrientDB,Software — Patrick Durusau @ 9:02 am

Orient: The Database For The Web

Nice slide deck if you need something for the company CTO.

Perhaps to justify a NoSQL conference or further investigation into NoSQL as an option.

I was deeply amused by slide 19’s claim of “Ø Config.”

Maybe true if I am running it on my laptop during a conference presentation.

A bit more thought required for use in or with a topic map system.

Orient is an impressive bit of software and is likely to be used or encountered by topic mappers.

Questions:

  1. Uses of OrientDB in library contexts? (3-5 pages, citations/links)
  2. Download and install OrientDB. How do you evaluate its claim of “Ø Config?” (3-5 pages, no citations)
  3. Extra credit: As librarians you will be asked to evaluate vendor claims about software. Develop a finding aid on software evaluation for librarians faced with that task. (3-5 pages, citations)

November 10, 2010

MongoDB Indexes and Indexing – Post

Filed under: Indexing,MongoDB,NoSQL — Patrick Durusau @ 2:25 pm

MongoDB Indexes and Indexing and MongoDB Indexing: An Optimization Primer from Alex Popescu provide great coverage of indexing and indexing issues.

Funny how topic maps started with indexing, revolve around the semantic issues of indexes/indexing and have to rely on indexing for reasonable performance.

Will have to see what other indexing resources I can dig up.

Enjoy the videos!

OpenTSDB

Filed under: HBase,NoSQL — Patrick Durusau @ 12:28 pm

OpenTSDB

From the website:

OpenTSDB is a distributed, scalable Time Series Database (TSDB) written on top of HBase. OpenTSDB was written to address a common need: store, index and serve metrics collected from computer systems (network gear, operating systems, applications) at a large scale, and make this data easily accessible and graphable.

Thanks to HBase’s scalability, OpenTSDB allows you to collect many thousands of metrics from thousands of hosts and applications, at a high rate (every few seconds). OpenTSDB will never delete or downsample data and can easily store billions of data points. As a matter of fact, StumbleUpon uses it to keep track of hundreds of thousands of time series and collects over 100 million data points per day in their main production cluster.

Imagine having the ability to quickly plot a graph showing the number of active worker threads in your web servers, the number of threads used by your database, and correlate this with your service’s latency (example below). OpenTSDB makes generating such graphs on the fly a trivial operation, while manipulating millions of data points for very fine grained, real-time monitoring.
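For readers who have not worked with a time series store, here is a toy sketch of the store/index/serve idea in Python. It is not OpenTSDB's API and has nothing to do with HBase: the `TimeSeriesStore` class, the metric names, and the `host` tag are assumptions made purely for illustration.

```python
from collections import defaultdict

class TimeSeriesStore:
    """Keeps every data point, indexed by metric name plus tags."""

    def __init__(self):
        self._series = defaultdict(list)

    def put(self, metric, timestamp, value, **tags):
        """Append one data point to the series named by metric and tags."""
        key = (metric, tuple(sorted(tags.items())))
        self._series[key].append((timestamp, value))

    def query(self, metric, start, end, **tags):
        """Return points in [start, end] from series matching all given tags."""
        wanted = set(tags.items())
        points = []
        for (name, series_tags), data in self._series.items():
            if name == metric and wanted <= set(series_tags):
                points.extend(p for p in data if start <= p[0] <= end)
        return sorted(points)

tsdb = TimeSeriesStore()
tsdb.put("web.threads", 100, 12, host="web01")
tsdb.put("web.threads", 110, 15, host="web01")
tsdb.put("web.threads", 100, 7, host="web02")
tsdb.put("db.threads", 100, 4, host="db01")

# One host's series, then all hosts for a single instant.
web01 = tsdb.query("web.threads", 100, 110, host="web01")
all_hosts = tsdb.query("web.threads", 100, 100)
```

Plotting web threads against database threads is then a matter of running two queries over the same time range.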

Imagine how a busy sysadmin would react if those metrics were endowed with subject identity and participated in associations with system documentation.

Or if the metrics of a power distribution center had subject identity so they could tie into multiple emergency/maintenance networks?

Subjects are cheap, subject identity is useful.
(maybe I should make that my tag line, comments?)

***
I first saw this at OpenTSDB: A HBase Scalable Time Series Database by Alex Popescu

November 7, 2010

NoSQL Solution: Evaluation Guide [CHART]

Filed under: NoSQL — Patrick Durusau @ 8:24 pm

NoSQL Solution: Evaluation Guide [CHART]

As the post says, this is hype, but it may be useful hype to read.

What caught my eye was one of the contenders being described as “extremely fast on small data sets (below 20 million rows)….”

OK, that’s not suitable for enterprise purposes but there are a lot of applications that can fit in under a 20 million row limit.

It’s a fun read so let me know what you think about it.

October 9, 2010

BigTable Model with Cassandra and HBase – Post

Filed under: Cassandra,HBase,NoSQL — Patrick Durusau @ 6:29 am

BigTable Model with Cassandra and HBase: a non-hand-waving explanation of Cassandra and HBase.

Has anyone tried the column of values approach where subjectIdentifier or subjectLocator is a set of values?
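Here is one way the question could be read, sketched in Python with plain dictionaries standing in for rows. Nothing below touches Cassandra or HBase; the row layout, the `names` column, and the example.org identifiers are all hypothetical, and the merge rule is a simplified version of merging on shared subject identifiers or subject locators.

```python
def merge_rows(rows):
    """Merge rows whose subjectIdentifier or subjectLocator sets overlap."""
    merged = []
    for row in rows:
        current = {
            "subjectIdentifier": set(row.get("subjectIdentifier", ())),
            "subjectLocator": set(row.get("subjectLocator", ())),
            "names": set(row.get("names", ())),
        }
        remaining = []
        for other in merged:
            if (current["subjectIdentifier"] & other["subjectIdentifier"]
                    or current["subjectLocator"] & other["subjectLocator"]):
                # Shared identity: fold the earlier row into this one.
                for column in current:
                    current[column] |= other[column]
            else:
                remaining.append(other)
        merged = remaining + [current]
    return merged

rows = [
    {"subjectIdentifier": {"http://example.org/id/riak"}, "names": {"Riak"}},
    {"subjectIdentifier": {"http://example.org/id/basho-riak"},
     "subjectLocator": {"http://example.org/riak-home"},
     "names": {"Basho Riak"}},
    # This row carries both identifiers, so it bridges the two rows above.
    {"subjectIdentifier": {"http://example.org/id/riak",
                           "http://example.org/id/basho-riak"}},
    {"subjectIdentifier": {"http://example.org/id/hbase"}, "names": {"HBase"}},
]

topics = merge_rows(rows)
```

Keeping the identifier column as a set is what lets the third row pull the first two together.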

October 8, 2010

Inside Neo4j: Intro and roadmap

Filed under: Graphs,Neo4j,NoSQL,Software — Patrick Durusau @ 6:07 am

Inside Neo4j: Intro and roadmap

Chris Gioran has started a series of posts at A Digital Stain covering the internals of Neo4j.

Whether you are interested in Neo4j in particular or graph databases in general, this is a series of posts to watch closely.

September 17, 2010

Tutorial: Getting Started With Cassandra – Post

Filed under: Cassandra,NoSQL — Patrick Durusau @ 4:42 am

Tutorial: Getting Started With Cassandra via Alex Popescu.

Jack Park says I should read about super columns and key/value pairs in Cassandra. This looks like a good starting place.

September 14, 2010

Redis Snippet for Storing the Social Graph – Post

Filed under: NoSQL,Subject Identity — Patrick Durusau @ 3:47 am

Redis Snippet for Storing the Social Graph from Alex Popescu, a snippet on storing relationships for a social graph using Redis.

Relationships are just a step away (representationally speaking) from associations. Worth a look.
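To show how short that step is, here is a sketch using plain Python sets in place of a store's set type. It is not the linked snippet and makes no Redis calls; `SocialGraph`, the user names, and the association layout with its role names are assumptions for illustration only.

```python
from collections import defaultdict

class SocialGraph:
    """Each relationship is stored twice, once per direction, as set members."""

    def __init__(self):
        self.following = defaultdict(set)
        self.followers = defaultdict(set)

    def follow(self, user, target):
        self.following[user].add(target)
        self.followers[target].add(user)

    def mutual(self, user):
        """Users who follow this user and are followed back."""
        return self.following[user] & self.followers[user]

    def as_association(self, user, target):
        """Restate one stored relationship as a typed association with roles."""
        if target not in self.following[user]:
            return None
        return {"type": "follows",
                "roles": {"follower": user, "followed": target}}

graph = SocialGraph()
graph.follow("ann", "bob")
graph.follow("bob", "ann")
graph.follow("ann", "carol")
```

The stored pair already says who relates to whom. The association adds only the type and the role each player has.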

September 13, 2010

Key-Value Pairs

Filed under: NoSQL,Subject Identity,TMRM — Patrick Durusau @ 7:33 am

The Topic Map Reference Model can’t claim to have invented the key/value view of the world.

But it is interesting how much traction key/value pair approaches have been getting of late. From NoSQL in general to Neo4j and Redis in particular. (no offense to other NoSQL contenders, those are the two that came to mind)

Declare which key/value pairs identify a subject and you are on your way towards a subject-centric view of computing.

OK, there are some details but declaring how you identify a subject is the first step in enabling others to reliably identify the same subject.
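A minimal sketch of that first step, in Python. The choice of `isbn` as the identifying key, the record contents, and the function names are all hypothetical; the Topic Maps Reference Model does not prescribe any of them.

```python
# Declaration: these keys, and only these, identify the subject of a record.
IDENTIFYING_KEYS = ("isbn",)

def identity(record, identifying_keys=IDENTIFYING_KEYS):
    """Return the identifying key/value pairs, or None if any are missing."""
    try:
        return frozenset((key, record[key]) for key in identifying_keys)
    except KeyError:
        return None

def same_subject(a, b, identifying_keys=IDENTIFYING_KEYS):
    """Two records share a subject when their identifying pairs are equal."""
    id_a = identity(a, identifying_keys)
    return id_a is not None and id_a == identity(b, identifying_keys)

library = {"isbn": "978-0-00-000000-2", "title": "Example Title", "shelf": "QA76"}
vendor = {"isbn": "978-0-00-000000-2", "price": "25.00"}
other = {"isbn": "978-0-00-000001-9", "title": "Example Title"}
untagged = {"title": "Example Title"}
```

The library and vendor records disagree on every other key and still name the same subject, while a matching title alone identifies nothing.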

September 8, 2010

CouchDB: Sell it to Your Boss – Post

Filed under: Graphs,NoSQL,Software — Patrick Durusau @ 8:49 am


CouchDB: Sell it to Your Boss from Alex Popescu.

CouchDB is one of the many options in the NoSQL world. As a distributed document repository, it is of interest to users of topic maps with document stores. It is written in Erlang, a language for distributed applications, including topic maps.

July 30, 2010

Neo4j 1.1 Released!

Filed under: NoSQL,Software — Patrick Durusau @ 1:39 pm

Neo4j 1.1 has arrived!

From Peter Neubauer’s blog entry:

The Neo4j graph database release 1.1 has just arrived, so here’s some information on the new things that have been included. The main points are the additions of monitoring support, an event framework and a new traversal framework to the kernel. Then two useful components have been added to the default distribution (called “Apoc”): graph algorithms and online backup.

Peter’s post has pointers to other Neo4j resources.

July 28, 2010

Django and Neo4j – Domain modeling that kicks ass – Post

Filed under: NoSQL — Patrick Durusau @ 8:25 pm

Django and Neo4j – Domain modeling that kicks ass.

Derek Stainer covers some licensing and performance numbers for Neo4j before turning it over to a presentation by Tobias Ivarsson.

High marks as a great introduction to Neo4j!

July 26, 2010

Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Recommendation

Filed under: Graphs,Mapping,NoSQL — Patrick Durusau @ 7:10 am

Problem-Solving using Graph Traversals: Searching, Scoring, Ranking, and Recommendation is a must see!

Set Theory Symbols, if you need help with the symbols.

Looking forward to seeing topic map operations illustrated on a graph with the Gremlin character. (see the slides)

By Marko A. Rodriguez.


Edited to add author’s name. Post did not appear in a search as expected.

July 22, 2010

Introduction to Cassandra – Post

Filed under: NoSQL,Software — Patrick Durusau @ 3:26 pm

Introduction to Cassandra showed up on myNoSQL today with a nice set of further reading links on Cassandra.

Would a listing of resources on graph query languages be helpful to anyone preparing to discuss TMQL in Leipzig?

Lily – the Scalable NoSQL Content Repository

Filed under: HBase,NoSQL,Solr — Patrick Durusau @ 7:02 am

Lily – the Scalable NoSQL Content Repository

A product prior to customers. What a marketing concept!

Sarcasm to one side, this is a significant development for scalable content storage using NoSQL and for topic maps.

The more data stored in Lily the less findable it will be, particularly across vocabularies.

Traditional blind mappings will work but they will also remain impervious to reliable sharing/scaling.

Topic maps need not be embedded in data storage applications but that could be a key marketing point for some customers. Something to keep in mind while evaluating Lily.

July 19, 2010

myNoSQL

Filed under: NoSQL — Patrick Durusau @ 1:47 pm

myNoSQL, maintained by Alex Popescu, bills itself as

The Hello magazine of the NoSQL World

That may well be true. A wealth of useful resources and current news on NoSQL.

July 18, 2010

Graph Traversal Programming Pattern (Part 1) – Graph Structures – Post

Filed under: Graphs,NoSQL — Patrick Durusau @ 7:26 am

Graph Traversal Programming Pattern (Part 1) – Graph Structures, by Derek Stainer is the start of a discussion of the Graph Traversal Programming Pattern presentation by Marko A. Rodriguez.

Starts with a primer on graphs. Next installment is on graph databases. Worth following.

You may also like the explanation of property graphs that is part of the Gremlin documentation.

June 4, 2010

Tinkerpop

Filed under: Graphs,NoSQL,Semantic Web,Software — Patrick Durusau @ 3:58 pm

Tinkerpop is worth a visit, whether you are into graph software (its focus) or not.

Home for:

Pipes: A Data Flow Framework Using Process Graphs

reXster: A Graph Based Ranking Engine

Blueprints (…collection of interfaces and implementations to common, complex data structures.)

Project Gargamel: Distributed Graph Computing

Gremlin: A Graph Based Programming Language

Twitlogic: Real Time #SemanticWeb in <= 140 Chars

Ripple: Semantic Web Scripting Language

LoPSideD: Implementing The Linked Process Protocol

May 27, 2010

OrientDB

Filed under: NoSQL — Patrick Durusau @ 6:44 pm

OrientDB is a NoSQL database.

The performance and scaling numbers are nothing short of amazing.

A couple of early comments:

Getting Oriented with OrientDB.

OrientDB: A new Open Source NoSQL DBMS (google.com)

Caution: While I favor exploration of new data structures and technologies, only a limited amount of data will ever be available in any one structure. Even the reputed 102 billion items in the Amazon servers represent only part of the information available about those items.

I remain a fan of Barta’s virtual topic maps that are composed from disparate data sources.

May 26, 2010

NoSQL Summer

Filed under: NoSQL — Patrick Durusau @ 12:41 pm

NoSQL Summer

If you enjoyed summer reading club at the library as a child, this is the summer reading program for you!

Nine cities are already forming reading clubs for papers that range from “Access Path Selection in an RDBMS” by P. Griffiths Selinger & al., to “Google’s BigTable” by Fay Chang & al.

April 16, 2010

Thesis – Sharding the Neo4J Graph DB

Filed under: NoSQL,Topic Map Software — Patrick Durusau @ 12:23 pm

Sharding the Neo4J Graph DB thesis bears watching.

As the size of topic maps increases, so will the performance demands made upon them.
