Another Word For It – Patrick Durusau on Topic Maps and Semantic Diversity

January 31, 2011

Tutorial: Developing in Erlang with Webmachine, ErlyDTL, and Riak

Filed under: Erlang,NoSQL,Riak — Patrick Durusau @ 7:08 am

Tutorial: Developing in Erlang with Webmachine, ErlyDTL, and Riak

From Alex Popescu’s MyNoSQL blog:

  • Part 1
    • In Part 1 of the series we covered the basics of getting the development environment up and running. We also looked at how to get a really simple ErlyDTL template rendering.
  • Part 2
    • There are a few reasons this series is targeting this technology stack. One of them is uptime. We’re aiming to build a site that stays up as much as possible. Given that, one of the things that I missed in the previous post was setting up a load balancer. Hence this post will attempt to fill that gap.
  • Part 3 In this post we’re going to cover:
    • A slight refactor of code structure to support the “standard” approach to building applications in Erlang using OTP.
    • Building a small set of modules to talk to Riak.
    • Creation of some JSON helper functions for reading and writing data.
    • Calling all the way from the Webmachine front-end to Riak to extract data and display it in a browser using ErlyDTL templates.

Erlang is important for anyone building high availability (think telecommunications) systems that can be dynamically reconfigured without taking the systems offline.
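
If you want to poke at the Riak piece without standing up the whole Erlang stack first, here is a rough sketch of the store-and-fetch-JSON step the series builds toward. It is Python rather than the tutorial's Erlang, and it assumes a local Riak node listening on the classic HTTP interface at port 8098; the bucket and key names are made up.

# Rough sketch (not the tutorial's Erlang code): store and fetch a JSON
# object using Riak's classic HTTP interface. Assumes a local Riak node
# at 127.0.0.1:8098; bucket and key names are invented.
import json
import urllib.request

BASE = "http://127.0.0.1:8098/riak"

def put_json(bucket, key, value):
    """Store a Python dict in Riak as a JSON object."""
    req = urllib.request.Request(
        "%s/%s/%s" % (BASE, bucket, key),
        data=json.dumps(value).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    urllib.request.urlopen(req)

def get_json(bucket, key):
    """Fetch a JSON object from Riak and decode it into a dict."""
    with urllib.request.urlopen("%s/%s/%s" % (BASE, bucket, key)) as resp:
        return json.loads(resp.read().decode("utf-8"))

put_json("posts", "hello", {"title": "Hello", "body": "First post"})
print(get_json("posts", "hello"))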

January 28, 2011

CouchDB 1.0.2: 3rd is Lucky – Post

Filed under: CouchDB,NoSQL — Patrick Durusau @ 9:45 am

CouchDB 1.0.2: 3rd is Lucky

Alex Popescu covers the release of CouchDB 1.0.2.

A point release with new features.

Alchemy Database: A Hybrid Relational-Database/NOSQL-Datastore

Filed under: Alchemy Database,NoSQL — Patrick Durusau @ 7:59 am

Alchemy Database: A Hybrid Relational-Database/NOSQL-Datastore

From the website:

Alchemy Database is a lightweight SQL server that is built on top of the NOSQL datastore redis. It supports redis data-structures and redis commands and supports (de)normalisation of these data structures (lists,sets,hash-tables) to/from SQL tables. Lua is deeply embedded and lua scripts can be run internally on Alchemy’s data objects. Alchemy Database is not only a data storage Swiss Army Knife, it is also blazingly fast and extremely memory efficient.

  • Speed is achieved by being an event driven network server that stores ALL data in RAM and achieves disk persistence by using a spare cpu-core to periodically log data changes (i.e. no threads, no locks, no undo-logs, no disk-seeks, serving data over a network at RAM speed)
  • Storage data structures w/ very low memory overhead and data compression, via algorithms w/ insignificant performance hits, greatly increase the amount of data you can fit in RAM
  • Optimising to the SQL statements most commonly used in OLTP workloads yields a lightweight SQL server designed for low latency at high concurrency (i.e. mindblowing speed).

The Philosophy of Alchemy Database is that RAM is now affordable enough to be able to store ENTIRE OLTP Databases in a single machine’s RAM (e.g. Wikipedia’s DB was 50GB in 2009 and a Dell PowerEdge R415 w/ 64GB RAM costs $4000), as long as the data is made persistent to disk. So Alchemy Database provides a non-blocking event-driven network-I/O-based relational-database, with very little memory overhead, that does the most common OLTP SQL statements amazingly fast and then throws in the NOSQL Data-store redis to create fantastic optimisation possibilities.

Leaving words/phrases like blazingly fast, amazingly fast, fantastic optimisation, and mindblowing speed to one side, one does wonder how it would perform for a topic map.

Reports welcome!

January 27, 2011

Comet – An Example of the New Key-Code Databases – Post

Filed under: NoSQL — Patrick Durusau @ 2:38 pm

Comet – An Example of the New Key-Code Databases

Another NoSQL database.

The post summarizes the goals of Comet, which is described as: … an extensible storage service that allows clients to inject snippets of code that control their data’s behavior inside the storage service.
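
To make the "inject snippets of code" idea concrete, here is a toy sketch, not Comet's actual interface, of a key-value store where a stored value can carry a small handler that runs whenever the value is read:

# Toy sketch of an "active" key-value store: a value may carry a handler
# (here a plain Python callable) that runs on reads and can rewrite or
# suppress what is returned. Illustrative only, not Comet's API.
import time

class ActiveStore:
    def __init__(self):
        self._data = {}          # key -> (value, optional on_get handler)

    def put(self, key, value, on_get=None):
        self._data[key] = (value, on_get)

    def get(self, key):
        value, on_get = self._data[key]
        return on_get(value) if on_get else value

store = ActiveStore()

# A value that expires ten seconds after it is stored.
expires_at = time.time() + 10
store.put("session:42", {"user": "ada"},
          on_get=lambda v: v if time.time() < expires_at else None)

print(store.get("session:42"))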

One thing you will notice fairly quickly when reading Comet: An active distributed key-value store is that the authors were not trying to build a fully generalized solution.

They had specific requirements in mind to be met and if your needs fall outside those requirements, you need to look elsewhere.

Rather refreshing to find a project that expressly isn’t trying to replace MS Office or Facebook. 😉

That still leaves a lot of interesting and commercially successful work to be done.

How Sharding Works – Presentation – 4 Feb. 2011

Filed under: NoSQL,Sharding,Topic Maps — Patrick Durusau @ 7:54 am

How Sharding Works, a presentation by Kristina Chodorow, author of MongoDB: The Definitive Guide.

Date: 4 Feb. 2011

Register

Of interest to topic maps that partition topics.

I thought last night after I wrote a draft of this post that sharding would interfere with arbitrary merging of any topic with any other topic.

OK, so what was the question?

True, sharding will make merging of arbitrary topics in a topic map more costly (if possible at all) but how often is completely unconstrained merging an actual requirement?

I suspect that most topic map projects, other than theoretical ones, already know what merging they are interested in and how those subjects are going to be identified.

Allowances for additional identifications of subjects should be made but that is a matter of careful design of your topic map.

Suggestion: Have merging specified just like any other requirement. What is expected? What are the criteria for success? What allowances need to be made for future expansion?
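
To put a little flesh on "more costly," here is a toy sketch of topics partitioned by hashing a subject identifier: merging is local when both identifiers land on the same shard and requires cross-shard traffic when they do not. The shard count, identifiers, and hashing scheme are all invented for illustration.

# Toy sketch: topics partitioned across shards by hashing a subject
# identifier. A merge is local when both identifiers hash to the same
# shard; otherwise it needs cross-shard coordination.
from hashlib import md5

NUM_SHARDS = 4   # invented shard count

def shard_for(identifier):
    """Stable shard assignment for a subject identifier string."""
    return int(md5(identifier.encode()).hexdigest(), 16) % NUM_SHARDS

def merge_cost(id_a, id_b):
    a, b = shard_for(id_a), shard_for(id_b)
    return "local merge" if a == b else "cross-shard merge (%d and %d)" % (a, b)

print(merge_cost("http://example.org/si/cassandra",
                 "http://example.org/si/apache-cassandra"))
print(merge_cost("http://example.org/si/erlang",
                 "http://example.org/si/riak"))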

January 26, 2011

Dimensions to use to compare NoSQL data stores – Queries to Produce Topic Maps

Filed under: Merging,NoSQL,TMDM,Topic Map Software,Topic Maps — Patrick Durusau @ 9:08 am

Dimensions to use to compare NoSQL data stores

A post by Huan Liu to read after Billy Newport’s Enterprise NoSQL: Silver Bullet or Poison Pill? – (Unique Questions?)

A very good quick summary of the dimensions to consider. As Liu makes clear, choosing the right data store is a complex issue.

I would use this as an overview article to get everyone on a common ground for a discussion of NoSQL data stores.

At least that way, misunderstandings will be on some other topic of discussion.

BTW, if you think about Newport’s point (however correct/incorrect) that NoSQL databases enable only one query, doesn’t that fit the production of a topic map?

That is, there is a defined set of constructs with defined conditions of equivalence, so the only query in that regard has already been fixed.

Questions remain about querying the data that a topic map holds, but here I mean the query that results in merged topics, associations, etc.

In some processing models, that query is performed and a merged artifact is produced.

Following the same data model rules, I would prefer to allow those queries to be made on an ad hoc basis, so that users are always presented with the latest merged results.

Same rules as the TMDM, just a question of when they fire.
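
As a minimal sketch of that ad hoc alternative: keep topics unmerged in storage and compute the merge when a query runs, grouping on shared subject identifiers in TMDM fashion, so the result always reflects the latest data. The topics and identifiers below are invented for illustration.

# Minimal sketch: topics stay unmerged in storage; the "merge query" is
# run on demand by grouping topics that share a subject identifier.
topics = [
    {"identifiers": {"http://ex.org/si/neo4j"}, "names": {"Neo4j"}},
    {"identifiers": {"http://ex.org/si/neo4j", "http://ex.org/si/neo4j-graphdb"},
     "names": {"Neo4J graph database"}},
    {"identifiers": {"http://ex.org/si/riak"}, "names": {"Riak"}},
]

def merged_view(topics):
    """Union-merge topics that share at least one subject identifier."""
    groups = [{"identifiers": set(t["identifiers"]), "names": set(t["names"])}
              for t in topics]
    changed = True
    while changed:                        # repeat until no two groups overlap
        changed = False
        for i, a in enumerate(groups):
            for b in groups[i + 1:]:
                if a["identifiers"] & b["identifiers"]:
                    a["identifiers"] |= b["identifiers"]
                    a["names"] |= b["names"]
                    groups.remove(b)
                    changed = True
                    break
            if changed:
                break
    return groups

for g in merged_view(topics):             # nothing is materialized in advance
    print(sorted(g["names"]), sorted(g["identifiers"]))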

Questions:

  1. NoSQL – What other general compare/dimension articles would you recommend as common ground builders? (1-3 citations)
  2. Topic maps as artifacts – What other data processing approaches produce static artifacts for querying? (3-5 pages, citations)
  3. Topic maps as query results – What are the concerns and benefits of topic maps as query results? (3-5 pages, citations)

SQLShell. A Cross-Database SQL Tool With NoSQL Potential

Filed under: NoSQL,SQL — Patrick Durusau @ 7:20 am

SQLShell. A Cross-Database SQL Tool With NoSQL Potential

From the website:

In this blog post I will introduce SQLShell and demonstrate, step-by-step, how to install it and start using it with MySQL. I will also reflect on the possibilities of using this with NoSQL technologies, such as HBase, MongoDB, Hive, CouchDB, Redis and Google BigQuery.

SQLShell is a cross-platform, cross-database command-line tool for SQL, much like psql for PostgreSQL or the mysql command-line tool for MySQL.

The author discovers that JDBC drivers have not yet developed to the point where a common interface can be demonstrated.

It is only a matter of time until they improve, and tools such as SQLShell will be important for data exploration and harvesting.

Enterprise NoSQL: Silver Bullet or Poison Pill? – (Unique Questions?)

Filed under: NoSQL,SQL — Patrick Durusau @ 7:03 am

Enterprise NoSQL: Silver Bullet or Poison Pill? a presentation by Billy Newport (IBM).

Very informative comparison between SQL and NoSQL mindsets and what considerations lead to one or the other.

The “ah-ha” point in the presentation was Newport saying that for NoSQL, one has to ask: what question do you want answered?

I am not entirely convinced by Newport’s argument that SQL supports arbitrary queries and that NoSQL design of necessity supports only a single query robustly.

Granted, there are design choices that can paint a NoSQL designer into a corner, but I don’t think it is fair to assume all NoSQL designers will make the same mistakes.

Or even that all NoSQL solutions have such limitations.

I don’t know of anything inherently query limiting about a graph database or even a hypergraph database architecture.

If you quickly point out that sharding drives design toward answering a particular question, my response is: and your question is?

How many arbitrary questions do you think there are for any given data set?

That would be an interesting research question.

How many unique questions (not queries) are asked of the average data set?

That is: unique queries != unique questions.

Application designers can design queries to match their application logic but that isn’t the same thing as a unique question.

Is that Newport’s concern (or at least part of it)? That NoSQL may put limits on the design of application logic? That could be good or bad.
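
To make the queries-vs-questions distinction concrete, here is a toy example: one question, "how many check-ins does each venue have?", answered by two differently shaped queries over the same (invented) data, one scanning and aggregating on demand, the other reading counters maintained at write time, the kind of single pre-designed query a key-value layout is often built around.

# One question, two query shapes, same answer. Data is invented.
from collections import Counter

checkins = [
    {"user": "u1", "venue": "cafe"},
    {"user": "u2", "venue": "cafe"},
    {"user": "u1", "venue": "library"},
]

# Query shape 1: scan the records and aggregate on demand
# (what a GROUP BY or a map/reduce job would do).
by_scan = Counter(c["venue"] for c in checkins)

# Query shape 2: read counters maintained on the write path
# (the single query a sharded key-value layout often optimizes for).
counters = {}
for c in checkins:
    counters[c["venue"]] = counters.get(c["venue"], 0) + 1

assert by_scan == Counter(counters)   # different queries, one question
print(dict(by_scan))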

January 25, 2011

Yet another MongoDB Map Reduce tutorial – Post

Filed under: MapReduce,MongoDB,NoSQL — Patrick Durusau @ 6:32 am

Yet another MongoDB Map Reduce tutorial

From the post:

As the title says, this is yet-another-tutorial on Map Reduce using MongoDB. But two things that are different here:

1. A problem solving approach is used, so we’ll take a problem, solve it in SQL first and then discuss Map Reduce.

2. Lots of diagrams, so you’ll hopefully better understand how Map Reduce works.

First noticed on Alex Popescu’s myNoSQL blog.
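
In the same problem-first spirit, here is a minimal sketch of the pattern: state the problem in SQL, then solve it with map/reduce in MongoDB. It assumes a local mongod and a pymongo release that still ships the map_reduce helper (later versions drop it in favor of the aggregation pipeline); the collection and data are invented.

# Problem: comments per author, i.e. in SQL:
#   SELECT author, COUNT(*) FROM comments GROUP BY author;
# Solved with MongoDB map/reduce via pymongo. Assumes a local mongod and a
# pymongo version that still provides Collection.map_reduce.
from pymongo import MongoClient
from bson.code import Code

db = MongoClient().blog                      # invented database name
db.comments.insert_many([
    {"author": "alice", "text": "nice post"},
    {"author": "bob",   "text": "+1"},
    {"author": "alice", "text": "thanks"},
])

mapper = Code("function () { emit(this.author, 1); }")
reducer = Code("function (key, values) { return Array.sum(values); }")

result = db.comments.map_reduce(mapper, reducer, "comment_counts")
for doc in result.find():
    print(doc["_id"], doc["value"])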

January 24, 2011

Cassandra – New Release

Filed under: Cassandra,NoSQL — Patrick Durusau @ 6:28 am

Cassandra – 0.7.0 released 2011-01-09.

The homepage reports the largest production cluster has 100 terabytes of data on over 150 machines.

Sounds like a candidate for topic maps. Yes? 😉

January 23, 2011

Rogue – Some Details

Filed under: MongoDB,NoSQL — Patrick Durusau @ 1:41 pm

Rogue: A Type-Safe Scala DSL for querying MongoDB is a blog post from Foursquare that gives some examples and details on Rogue. More to follow.

January 22, 2011

Advanced HBase – Post

Filed under: HBase,NoSQL — Patrick Durusau @ 7:14 pm

Advanced HBase by Lars George from Alex Popescu’s MyNoSQL blog.

January 20, 2011

HBase 0.90.0 Released: Over 1000 Fixes and Improvements – Post

Filed under: HBase,NoSQL — Patrick Durusau @ 6:21 am

HBase 0.90.0 Released: Over 1000 Fixes and Improvements

From Alex Popescu, news that HBase 0.90.0 has been released!

HBase homepage

January 15, 2011

Membase and Erlang with Matt Ingenthron

Filed under: Erlang,Membase,NoSQL — Patrick Durusau @ 5:39 pm

Membase and Erlang with Matt Ingenthron

From Alex Popescu’s MyNoSQL.

The video is fairly poor in terms of seeing the slides. The presentation is worthwhile but be aware that it is more audio than video.

Recommend that you catch Matt Ingenthron’s blog, or other Membase blogs for more information.

Erlang is important for topic maps due to its built-in support for concurrency and for live patching of systems in operation.

For further information see Erlang.

How to Choose a Shard Key: The Card Game

Filed under: MongoDB,NoSQL — Patrick Durusau @ 2:38 pm

How to Choose a Shard Key: The Card Game Kristina Chodorow’s highly entertaining post on how to evaluate sharding strategies.

I mention this because sharding is likely to become an issue as topic map applications grow in size. Evaluating strategies before significant development time and effort are invested is always a good idea.

I also have a weakness for clever explanations that capture the essence of a complex problem in an accessible form.

If you are at all interested in MongoDB, see the rest of her blog entries.

Or, MongoDB: The Definitive Guide by Kristina and Michael Dirolf.

Scaling with MongoDB Video

Filed under: MongoDB,NoSQL — Patrick Durusau @ 2:27 pm

Scaling with MongoDB Video

Kristina Chodorow covers scaling with MongoDB. Mentioned on Alexander Popescu’s MyNoSQL blog.

Alexander is concerned about the complexity of the autosharding solution.

But high availability requires more than understanding the capabilities of a single database solution.

A firm understanding of the concerns in Philip A Bernstein and Eric Newcomer’s Principles of Transaction Processing and Jim Gray and Andreas Reuter’s Transaction Processing: Concepts and Techniques is a good starting point.

Whether you are planning high availability for a topic map or another application.

January 14, 2011

MongoSV 2010

Filed under: MongoDB,NoSQL — Patrick Durusau @ 5:43 pm

MongoSV 2010

A one-day event on the MongoDB database and its uses.

Reported by Alexander Popescu.

I report it here so you can start working your own way through the four tracks for items of interest.

I am going to do the same and pull out items that strike me as particularly relevant to topic maps.

Feel free to post your own suggestions for must see items.

MongoDB and Eventbrite’s Social Graph – Post

Filed under: MongoDB,NoSQL — Patrick Durusau @ 5:34 pm

MongoDB and Eventbrite’s Social Graph

Via Alexander Popescu.

Read the post, then grab the slides:

Eventbrite Social Graph slides

This is very cool!

Redis Under the Hood

Filed under: NoSQL,Redis,Topic Maps — Patrick Durusau @ 5:25 pm

Redis Under the Hood

Via Alexander Popescu’s MyNoSQL blog.

Compelling examination of how Redis works. Some of the diagrams look a lot like diagrams of subjects and the relationships between them.

Take a look and judge for yourself.

Suggests 1,002 uses for topic maps, doesn’t it?

Anyway, if you want to get into the internals of Redis, either for the sheer discipline of it or because you think it may figure in your topic map future, here is a good place to start.
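
To see the resemblance for yourself, here is a small sketch that treats a Redis hash as a subject's properties and a Redis set as a relationship between subjects. It assumes a local Redis server and the redis-py client; the key names are mine, not from the post.

# Small sketch: a Redis hash as a subject's properties, a Redis set as a
# relationship between subjects. Assumes a local Redis server and redis-py;
# key names are invented.
import redis

r = redis.Redis()

# Two "subjects" stored as hashes.
r.hset("subject:redis", mapping={"name": "Redis", "type": "key-value store"})
r.hset("subject:antirez", mapping={"name": "Salvatore Sanfilippo"})

# A relationship (authored-by) stored as a set of related subject keys.
r.sadd("rel:authored-by:subject:redis", "subject:antirez")

# Walk from the subject to its related subjects.
for member in r.smembers("rel:authored-by:subject:redis"):
    print(r.hgetall(member))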

NoSQL benchmarks and performance evaluations – Post

Filed under: NoSQL — Patrick Durusau @ 6:52 am

NoSQL benchmarks and performance evaluations

From Alex Popescu’s MyNoSQL blog, a gathering of NoSQL evaluations.

Used with caution, this could be useful information.

January 13, 2011

Introduction to MongoDB

Filed under: MongoDB,NoSQL — Patrick Durusau @ 1:12 pm

Introduction to MongoDB by Justin Jenkins.

Maybe I am just getting jaded or tired, perhaps both, but the hello world examples in introductions have worn thin.

It can be used to illustrate elementary operations, very elementary operations, but once you have seen one elementary operation, you have seen them all.

Just once I would like to see code for cracking the passwords protecting the White House switchboard, or a script for real-time TCP/IP packet replacement. Maybe not exactly those, but you get the drift.*

Something with some bite to it.

Perhaps in addition to the hello world examples.

The introduction by Jenkins is serviceable enough, but for real details, see: MongoDB, the MongoDB homesite.

*****
* The equivalent for topic maps would be an example of how to make leaked information dangerous rather than simply annoying.

For example, a topic map could merge currently secret (or public) information about an individual to assist in the evaluation of a leak. Or to decide on how to exploit it. Without every analyst having to dig up the same information.

January 10, 2011

NoSQL Tapes

Filed under: Cassandra,CouchDB,Graphs,MongoDB,Neo4j,Networks,NoSQL,OrientDB,Social Networks — Patrick Durusau @ 1:33 pm

NoSQL Tapes: A filmed compilation of interviews, explanations & case studies

From the email announcement by Tim Anglade:

Late last year, as the NOSQL Summer drew to a close, I got the itch to start another NOSQL community project. So, with the help of vendors Scality and InfiniteGraph, I toured around the world for 77 days to meet and record video interviews with 40+ NOSQL vendors, users and dudes-you-can-trust.

….

My original goals were to attempt to map a comprehensive view of the NOSQL world, its origins, its current trends and potential future. NOSQL knowledge seemed to me to be heavily fragmented and hard to reconcile across projects, vendors & opinions. I wanted to try to foster more sharing in our community and figure out what people thought ‘NOSQL’ meant. As it happens, I ended up learning quite a lot in the process (as I’m sure even seasoned NOSQLers on this list will too).

I’d like to take this opportunity to thank everybody who agreed to participate in this series: 10gen, Basho, Cloudant, CouchOne, FourSquare, Ben Black, RethinkDB, MarkLogic, Cloudera, SimpleGeo, LinkedIn, Membase, Ryan Rawson, Cliff Moon, Gemini Mobile, Furuhashi-san, Luca Garulli, Sergio Bossa, Mathias Meyer, Wooga, Neo4J, Acunu (and a few other special guests I’m keeping under wraps for now); I couldn’t have done it without them and learned by leaps & bounds for every hour I spent with each of them.

I’d also like to thank my two sponsors, Scality & InfiniteGraph, from the bottom of my heart. They were supportive in a way I didn’t think companies could be and left me total control of the shape & content of the project. I’d encourage you to check them out if you haven’t done so already.

As always, I’ll be glad to take any comments or suggestions you may have either by email (tim@nosqltapes.com) or on Twitter (@timanglade).

Simply awesome!

December 31, 2010

Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase comparison – Post

Filed under: Cassandra,CouchDB,HBase,NoSQL,Redis,Riak — Patrick Durusau @ 11:01 am

Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase comparison

Not enough detail for decision making but a useful overview nonetheless.

December 30, 2010

Neo4J 1.2 – Released!

Filed under: Associations,Neo4j,NoSQL — Patrick Durusau @ 7:13 pm

Neo4J 1.2 Released!

New features:

  • The Neo4j Server

    The Neo4j standalone server builds upon the RESTful API that was pre-released for Neo4j 1.1. The server provides a complete stand alone Neo4j graph database experience, making it easy to access Neo4j from any programming language or platform. Some of you have already provided great client libraries for languages such as Python, Ruby, PHP, the .Net stack and more. Links and further information about client libraries can be found at: http://www.delicious.com/neo4j/drivers

  • Neo4j High Availability

    The High Availability feature of Neo4j provides an easy way to set up a cluster of graph databases. This allows for read scalability and tolerates faults in any of the participating machines. Writes are allowed to any machine, but are synchronized with a slight delay across all of them.

    High Availability in Neo4j is still in quite an early stage of its evolution and thus still has a few limitations. While it provides scalability for read load, write operations are slightly slower. Adding new machines to a cluster still requires some manual work, and very large transactions cannot be transmitted across machines. These limitations will be addressed in the next version of Neo4j.

  • Some other noteworthy changes include:
    • Additional services for the Neo4j kernel can now be loaded during startup, or injected into a running instance. Examples of such additional services are the Neo4j shell server and the Neo4j management interface.
    • Memory footprint and read performance has been improved.
    • A new cache implementation has been added for high load, low latency workloads.
    • A new index API has been added that is more tightly integrated with the database. This new index API supports indexing relationships as well as nodes, and also supports indexing and querying multiple properties for each node or relationship. The old index API has been deprecated but remains available and will continue to receive bug fixes for a while.
    • The Neo4j shell supports performing path algorithm queries.
    • Built in automatic feedback to improve future versions of Neo4j. See: http://blog.neo4j.org/2010/10/announcing-neo4j-12m01-and-focus-on.html

Let me repeat part of that:

This new index API supports indexing relationships as well as nodes, and also supports indexing and querying multiple properties for each node or relationship.

Will be looking at the details on the indexing, more comments to follow.
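
Since the release notes highlight the standalone server's RESTful API, here is a rough sketch of talking to it from any language, Python in this case, against the /db/data endpoints of the 1.x server: create two nodes and a relationship between them. It assumes a local server on the default port 7474; the property names and relationship type are invented.

# Rough sketch against the Neo4j 1.x server's REST API: create two nodes
# and a relationship. Assumes a local server at http://localhost:7474;
# properties and relationship type are invented.
import json
import urllib.request

BASE = "http://localhost:7474/db/data"

def post_json(url, payload):
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

alice = post_json(BASE + "/node", {"name": "Alice"})
bob = post_json(BASE + "/node", {"name": "Bob"})

# The create-node response includes the node's URI in "self"; relationships
# are created by POSTing to that node's /relationships endpoint.
post_json(alice["self"] + "/relationships",
          {"to": bob["self"], "type": "KNOWS"})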

December 22, 2010

HyperGraphDB – Data Management for Complex Systems

Filed under: Hypergraphs,NoSQL — Patrick Durusau @ 1:36 pm

HyperGraphDB – Data Management for Complex Systems Author: Borislav Iordanov

Presentation on the architecture of HyperGraphDB.

Slides and MP3 file are available at the presentation link.

Covers the architecture of HyperGraphDB in just under 20 minutes.

Good for an overview but I would suggest looking at the documentation, etc. for a more detailed view.

The documentation describes its topic map component in part as:

In HGTM, all topic maps constructs are represented as HGDB atoms. The Java classes implementing those atoms are in the package org.hypergraphdb.apps.tm. The API is an almost complete implementation of the 1.0 specification. Everything except merging is implemented. Merging wouldn’t be hard, but I haven’t found the need for it yet.

I will be following up with the HyperGraphDB project on how merging was understood.

Will report back on what comes of that discussion.

December 13, 2010

OrientDB 0.9.24

Filed under: NoSQL,OrientDB — Patrick Durusau @ 7:10 am

OrientDB 0.9.24 has been released! Direct download: http://orient.googlecode.com/files/orientdb-0.9.24.zip

Issues fixed: http://code.google.com/p/orient/issues/list?can=1&q=label:v0.9.24

Features for 0.9.25 (Jan. 2011): http://code.google.com/p/orient/issues/list?q=label:v0.9.25

To suggest a new feature: http://code.google.com/p/orient/issues/entry?template=New%20feature

December 12, 2010

Krati – A persistent high-performance data store

Filed under: Data Structures,NoSQL — Patrick Durusau @ 5:53 pm

Krati – A persistent high-performance data store

From the website:

Krati is a simple persistent data store with very low latency and high throughput. It is designed for easy integration with read-write-intensive applications with little effort in tuning configuration, performance and JVM garbage collection….

Simply put, Krati

  • supports varying-length data array
  • supports key-value data store access
  • performs append-only writes in batches
  • has write-ahead redo logs and periodic checkpointing
  • has automatic data compaction (i.e. garbage collection)
  • is memory-resident (or OS page cache resident) yet persistent
  • allows single-writer and multiple readers

Or you can think of Krati as

  • Berkeley DB JE backed by hash-based indexing rather than B-tree
  • A hashtable with disk persistency at the granularity of update batch

If you use Krati as part of a topic map application, please share your experience.

December 11, 2010

Project Voldemort

Filed under: NoSQL — Patrick Durusau @ 7:45 pm

Project Voldemort

From the website:

Voldemort is not a relational database, it does not attempt to satisfy arbitrary relations while satisfying ACID properties. Nor is it an object database that attempts to transparently map object reference graphs. Nor does it introduce a new abstraction such as document-orientation. It is basically just a big, distributed, persistent, fault-tolerant hash table.

Depending upon your requirements, this could be a useful component.

Sensei

Filed under: Indexing,Lucene,NoSQL — Patrick Durusau @ 3:35 pm

Sensei

From the website:

Sensei is a distributed database that is designed to handle the following type of query:


SELECT f1,f2…fn FROM members
WHERE c1 AND c2 AND c3.. GROUP BY fx,fy,fz…
ORDER BY fa,fb…
LIMIT offset,count

Relies on zoie and hence Lucene for indexing.

Another comparison for the development of TMQL, which of course will need to address semantic sameness.

December 9, 2010

Schema Design for Riak (Take 2)

Filed under: NoSQL,Riak,Schema — Patrick Durusau @ 5:48 pm

Schema Design for Riak (Take 2)

Useful exercise in schema design in a NoSQL context.

No great surprise that a focus on data and application requirements is the key (sorry) to a successful deployment.

Amazing how often that gets repeated, at least in presentations.

Equally amazing how often that gets ignored in implementations (at least to judge from how often it is repeated in presentations).

Still, we all need reminders so it is worth the time to review the slides.
