Another Word For It
Patrick Durusau on Topic Maps and Semantic Diversity

March 10, 2011

MongoDB Development Prize!

Filed under: MongoDB,NoSQL,Topic Maps — Patrick Durusau @ 11:39 am

MongoDB Development Prize!

From the website:

10gen is pleased to announce the first MongoDB developer blog contest of 2011. We’ll be announcing the contest categories at the beginning of each month, and the 10gen engineering team will pick the winner. We hope that these contests will be a way for developers to share their experiences with MongoDB and learn from one another. And we are giving away some pretty cool prizes 🙂 This month we’re teaming up with (mt) Media Temple, who is offering free hosting to the winner.

Check out the blog for the March contest.

Could be a good way to get some PR for topic maps and your project, not to mention yourself.

March 5, 2011

Cassandra Data Model – Semantic Impedance

Filed under: Cassandra,NoSQL — Patrick Durusau @ 3:13 pm

WTF is a SuperColumn? An Intro to the Cassandra Data Model

A bit dated now, but I thought some readers might find it useful.

From the posting:

If you’re coming from an RDBMS background (which is almost everyone) you’ll probably trip over some of the naming conventions while learning about Cassandra’s data model. It took me and my team members at Digg a couple days of talking things out before we “got it”. In recent weeks a bikeshed went down in the dev mailing list proposing a completely new naming scheme to alleviate some of the confusion. Throughout this discussion I kept thinking: “maybe if there were some decent examples out there people wouldn’t get so confused by the naming.” So, this is my stab at explaining Cassandra’s data model; It’s intended to help you get your feet wet & doesn’t go into every single detail but, hopefully, it helps clarify a few things.
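To make the vocabulary a little more concrete, here is a minimal sketch of the (pre-CQL) Cassandra data model as nested Python dictionaries. It is not Cassandra's API, and the column family and row names (`Users`, `AddressBook`, `jsmith`) are hypothetical. The point is only the nesting: a column is a name/value/timestamp triple, a row maps column names to columns, and a super column adds one more level of named grouping.

```python
from dataclasses import dataclass


@dataclass
class Column:
    name: str
    value: str
    timestamp: int  # Cassandra uses timestamps to resolve conflicting writes


# Standard column family: row key -> {column name -> Column}
users = {
    "jsmith": {
        "first": Column("first", "John", 1),
        "last": Column("last", "Smith", 1),
    },
}

# Super column family: row key -> {super column name -> {column name -> Column}}
address_book = {
    "jsmith": {
        "bob": {"phone": Column("phone", "555-0100", 1)},
        "alice": {"email": Column("email", "alice@example.com", 1)},
    },
}

# Keyspace: column family name -> column family (all names hypothetical)
keyspace = {"Users": users, "AddressBook": address_book}


def get(column_family, row_key, *path):
    """Follow the row key, then an optional super column, then a column name."""
    node = column_family[row_key]
    for part in path:
        node = node[part]
    return node.value


def names_in_row(column_family, row_key):
    """Cassandra keeps names sorted by a comparator; plain string order here."""
    return sorted(column_family[row_key])
```

Reading Alice's email means naming three keys in turn (row, super column, column), and that extra level is what trips people up.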

Seems like I have heard about grouping sets of key/value pairs before, but I will have to look for it. 😉

More seriously, the current wave of data sets only aggravates the well-known semantic impedance problem.

And it is a wave of data sets that promises only to grow.

So semantic impedance is going to increase as well.

Semantic impedance can be:

  • ignored – most current stove-piped information systems
  • save-the-world semantic solutions – poor adoption rates
  • broken by self-interested mapping that is reusable – the topic maps solution

March 4, 2011

Berlin Buzzwords 2011

Filed under: Conferences,NoSQL,Topic Maps — Patrick Durusau @ 7:02 am

Berlin Buzzwords 2011

What a great name for a conference!

Extended Deadline: Sunday, March 6th at midnight MST

From the website:

Berlin Buzzwords 2011 is a conference for developers and users of open source software projects, focussing on the issues of scalable search, data-analysis in the cloud and NoSQL-databases. Berlin Buzzwords presents more than 30 talks and presentations of international speakers specific to the three tags “search”, “store” and “scale”.

It would be nice to have at least one or two topic map submissions under “search,” if not under one of the other tags.

OrientDB v0.9.25 & beyond!

Filed under: NoSQL,OrientDB — Patrick Durusau @ 5:53 am

OrientDB v0.9.25 has been released!

Features include:

  • Brand new memory model with level-1 and level-2 caches (Issue #242)
  • SQL prepared statement (Issue #49)
  • SQL Projections with the support of links (Issue #15)
  • Graphical editor for documents in OrientDB Studio app (Issue #217)
  • Graph representation in OrientDB Studio app
  • Support for JPA annotation by the Object Database interface (Issue #102)
  • Smart Console under bash: history, auto completion, etc. (Issue #228)
  • Operations to work with GEO-spatial points (Issue #182)
  • @rid support in SQL UPDATE statement (Issue #72)
  • Range queries against Indexes (Issue #231)
  • 100% support of TinkerPop Blueprints 0.5

Even more good news: 1.0RC1 is planned for April 2011.

March 3, 2011

MongoDB 1.8 Released!

Filed under: MongoDB,NoSQL — Patrick Durusau @ 10:12 am

MongoDB 1.8 Released

Release notes for MongoDB 1.8.

Incremental map/reduce is now supported, enabling incremental updating of collections.

Reminds me to ask about incremental updating of topic maps.
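The idea behind incremental map/reduce can be sketched in plain Python (this is not MongoDB's API): map and reduce only the newly arrived documents, then re-reduce those partial results with the output already stored. That only works if the reduce function can accept its own output as input, which is why the hypothetical tag-counting `reduce_fn` below simply sums.

```python
from collections import defaultdict


def map_fn(doc):
    # Emit (key, value) pairs: one count per tag on the document.
    for tag in doc.get("tags", []):
        yield tag, 1


def reduce_fn(key, values):
    # Associative and commutative, so its output can be fed back into it.
    return sum(values)


def map_reduce(docs):
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}


def incremental_update(previous, new_docs):
    """Map/reduce only the new documents, then re-reduce against the stored output."""
    fresh = map_reduce(new_docs)
    merged = dict(previous)
    for key, value in fresh.items():
        merged[key] = reduce_fn(key, [merged[key], value]) if key in merged else value
    return merged
```

The check worth making is that the incremental result equals a full rerun over all the documents.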

Introduction to MongoDB – Post

Filed under: MongoDB,NoSQL — Patrick Durusau @ 10:06 am

Introduction to MongoDB

If you want a quick introduction to MongoDB with some light examples, this post is for you.

Real-Time Log Processing System based on Flume and Cassandra – Post

Filed under: Cassandra,Flume,NoSQL — Patrick Durusau @ 10:01 am

Real-Time Log Processing System based on Flume and Cassandra

Very cool!

What would be even cooler would be real-time associations with subjects that have information from outside the data set.

Or better yet, real-time on-demand associations with subjects that have information from outside the data set.

I suppose the classic use case would be running stats on all the sports events on a Saturday or Sunday, including individual stats, and merging in the latest doping, paternity and similar tests.

Other applications?
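One way to picture on-demand associations is a stream processor that looks up outside information about a subject only the first time that subject shows up, then caches it. A minimal sketch, assuming a hypothetical `EXTERNAL_FACTS` table that stands in for whatever outside source (a web service, another database) would supply the information; `functools.lru_cache` does the caching.

```python
from functools import lru_cache

# Hypothetical outside source; in practice a web service or another data store.
EXTERNAL_FACTS = {
    "player:42": {"team": "Reds", "latest_test": "negative"},
}

lookups = []  # records which subjects actually triggered an outside lookup


@lru_cache(maxsize=1024)
def external_info(subject_id):
    """Fetch outside information about a subject the first time it is needed."""
    lookups.append(subject_id)
    return EXTERNAL_FACTS.get(subject_id, {})


def enrich(events):
    """Attach outside information to each log event as it streams past."""
    for event in events:
        # Copy the cached dict so callers cannot mutate the cache.
        yield {**event, "about": dict(external_info(event["subject"]))}
```

A second event about the same player is served from the cache, so the outside source is consulted once per subject rather than once per event.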

March 2, 2011

ektorp – Java API for CouchDB

Filed under: CouchDB,NoSQL — Patrick Durusau @ 7:05 am

ektorp – Java API for CouchDB

From the website:

Ektorp is a persistence API that uses CouchDB as storage engine. The goal of Ektorp is to combine JPA like functionality with the simplicity and flexibility that CouchDB provides.

Features

Here are some good reasons why you should consider to use Ektorp in your project:

  • Rich domain models. With powerful JSON-object mapping provided by Jackson it is easy to create rich domain models.
  • Schemaless comfort. As CouchDB is schemaless, the database gets out of the way during application development. With a schemaless database, most adjustments to the database become transparent and automatic.
  • Out-of-the-Box CRUD. The generic repository support makes it trivial to create persistence classes.
  • Simple and fluent API.
  • Spring Support. Ektorp features an optional spring support module.
  • Active development. Ektorp is actively developed and has a growing community.
  • Choice of abstraction level. From full object-document mapping to raw streams, Ektorp will never stop you if you need to step down an abstraction level.

I am going to be looking at this project more closely, but it would be interesting to see a project that said:

Here are some reasons to not use this project and/or things it doesn’t do well…

I can’t recall ever seeing a project that had such a disclaimer.

Not that it would have to be long or detailed, but showing an awareness that, whatever the project, it isn’t the universal solution would be nice.

March 1, 2011

NoSQL Databases: Why, what and when

NoSQL Databases: Why, what and when by Lorenzo Alberton.

When I posted RDBMS in the Social Networks Age I did not anticipate returning the very next day with another slide deck from Lorenzo. But, after viewing this slide deck, I just had to post it.

It is a very good overview of NoSQL databases and their underlying principles, with useful graphics as well (as opposed to the other kind).

I am going to have to study his graphic technique in hopes of applying it to the semantic issues that are at the core of topic maps.

February 28, 2011

Why Topic Maps? (or schema version n.0)

Filed under: NoSQL,Topic Maps — Patrick Durusau @ 8:59 am

A friend of mine forwarded one of the nay-sayer screeds bashing NoSQL implementations in favor of one of the current SQL offerings.

Why that sort of thing is popular remains a mystery to me. I freely grant that some of the NoSQL efforts may be unsuccessful, but the effort overall is an interesting one.

And not unlike topic maps when you think about it.

In order to do normalization for a relational database, you have to both know all the subjects you are going to talk about in the database in advance and, more importantly, know how you are going to identify them.

(Yes, there are other, deeper semantic issues with relational databases but this post is for newbies so I won’t cover those here.)

So, what happens if you don’t know all the subjects in advance, how you are going to identify them or even what you want to say about them?

Well, with a relational database, I suppose that is what you call version 2.0 of your database schema, to which you migrate all your data.

And the same is true for relationships between subjects in your database.

Should you decide to add tables for those relationships, well, now you are at version 3.0 of your database schema.
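The “version n.0” treadmill can be sketched with SQLite from Python's standard library. This assumes a hypothetical `person`/`employs` schema and uses SQLite's `user_version` pragma as the schema version counter, which is one common convention rather than the only one. Each newly discovered property or relationship becomes another numbered migration.

```python
import sqlite3

# Version number -> statements that move the schema to that version.
MIGRATIONS = {
    1: ["CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"],
    # Version 2.0: a property we did not know about in advance.
    2: ["ALTER TABLE person ADD COLUMN email TEXT"],
    # Version 3.0: a relationship we did not know about in advance.
    3: ["CREATE TABLE employs (employer INTEGER REFERENCES person(id),"
        " employee INTEGER REFERENCES person(id))"],
}


def migrate(conn, target):
    """Apply every migration between the stored version and target, in order."""
    current = conn.execute("PRAGMA user_version").fetchone()[0]
    for version in range(current + 1, target + 1):
        for statement in MIGRATIONS[version]:
            conn.execute(statement)
        # Pragmas do not accept bound parameters; version is a trusted int.
        conn.execute(f"PRAGMA user_version = {version}")
    conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]
```

Existing rows survive each step, but every new kind of subject or relationship still means another schema version to write, test and document.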

Database versioning, or “evolution” as I think it is sometimes called, is an entire area of research and software in the database world. I really need to pull some of that together for a piece on how topic maps can help with the documentation aspects of that process.

I started to say that this illustrates an advantage of topic maps over relational databases: the schema does not have to be altered to add new relationships.

And from a certain point of view, it certainly is an advantage.

But, assume we do add a relationship type to a topic map, how do we then version the topic map?

Should we create topics that exist in associations with other topics to add versioning information as part of associations?

Or are there other mechanisms we should consider?
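One mechanism worth sketching is scope, from the Topic Maps Data Model: an association can be qualified by a set of topics, so a topic standing for a version can scope the associations introduced in that version. A minimal sketch, assuming hypothetical topic names (`authorship`, `employment`, `version-2`) and treating an unscoped association as valid in every version. It also shows that a new relationship type is simply another topic, not a schema change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    type: str                       # the association type is itself a topic
    roles: tuple                    # (role type, player) pairs
    scope: frozenset = frozenset()  # qualifying topics; empty means unconstrained


class TopicMap:
    def __init__(self):
        self.topics = set()
        self.associations = []

    def add_association(self, assoc):
        # No migration: a new relationship type is just one more topic.
        self.topics.add(assoc.type)
        for role_type, player in assoc.roles:
            self.topics.update((role_type, player))
        self.topics.update(assoc.scope)
        self.associations.append(assoc)

    def visible_in(self, version):
        """Associations that hold in a given version (unscoped ones hold everywhere)."""
        return [a for a in self.associations if not a.scope or version in a.scope]
```

Whether scope is the right tool for versioning, as opposed to, say, separate versioning associations, is exactly the open question above; this only shows that it can be expressed without leaving the model.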

Sorry, I did not mean to get sidetracked into versioning, but it is something that production-quality topic maps should take into account.

I don’t think the choices are nearly as stark as SQL vs. NoSQL vs. Topic Maps vs. whatever.

Most information systems have needs that can be met only with a healthy mixture of solutions.

People who advocate “myStack” solutions are selling just that: “myStack” solutions.

As a user/consumer, I prefer “mySolution” stacks. Not exactly the same thing.

February 24, 2011

Cassandra’s data model as records and lists – Post

Filed under: Cassandra,NoSQL — Patrick Durusau @ 3:23 pm

Cassandra’s data model as records and lists

From the post:

I have to admit I’ve never really been happy with Cassandra’s data model, or to be more precise, I’ve never really been [happy] with my understanding of the model. However I’ve realized that if we think of two use cases for column families then things may become a bit clearer. For me, Column families can be used in one of two ways, either as a record or an ordered list.

I thought it was helpful. What do you think?
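The two uses of a column family described in the quoted post can be sketched in a few lines of Python. This is not Cassandra's API: the record is a plain dictionary with a small, known set of column names, and the hypothetical `OrderedRow` keeps column names sorted so a contiguous range can be sliced out, roughly what a slice over a time-ordered row gives you. ISO 8601 timestamp strings serve as column names because they sort chronologically as plain strings.

```python
import bisect

# Record-style row: a small, known set of column names.
user_record = {"name": "Patrick", "blog": "Another Word For It"}


class OrderedRow:
    """List-style row: many columns, kept sorted by column name."""

    def __init__(self):
        self._names = []   # sorted column names
        self._values = {}  # column name -> value

    def insert(self, name, value):
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, start, finish):
        """Return (name, value) pairs with start <= name <= finish, in order."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, finish)
        return [(name, self._values[name]) for name in self._names[lo:hi]]
```

Columns can arrive in any order; the slice always comes back sorted by name.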

February 22, 2011

elasticsearch

Filed under: Lucene,NoSQL,Search Engines — Patrick Durusau @ 1:18 pm

elasticsearch

From the website:

So, we build a web site or an application and want to add search to it, and then it hits us: getting search working is hard. We want our search solution to be fast, we want a painless setup and a completely free search schema, we want to be able to index data simply using JSON over HTTP, we want our search server to be always available, we want to be able to start with one machine and scale to hundreds, we want real-time search, we want simple multi-tenancy, and we want a solution that is built for the cloud.

“This should be easier”, we declared, “and cool, bonsai cool”.

elasticsearch aims to solve all these problems and more. It is an Open Source (Apache 2), Distributed, RESTful, Search Engine built on top of Lucene.
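Lucene, and therefore elasticsearch, is built around an inverted index: a map from each term to the documents containing it. Here is a toy version in Python, offered only to illustrate the idea. It is not elasticsearch's API, and real engines add analyzers, relevance scoring and much more. It indexes the string fields of JSON-like documents and answers AND queries.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(set)  # term -> ids of documents containing it
        self.docs = {}

    def index(self, doc_id, doc):
        """Index every string field of a JSON-like document."""
        self.docs[doc_id] = doc
        for value in doc.values():
            if isinstance(value, str):
                for term in tokenize(value):
                    self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing all query terms (AND semantics)."""
        terms = tokenize(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result
```

A query costs a few set intersections instead of a scan of every document, which is the basic reason search engines index first and query later.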

Another contender in the search engine space.

Do you have a favorite search engine? If so, what about it makes it your favorite?

February 20, 2011

On Distributed Consistency — Part 1 (MongoDB)

Filed under: Consistency,Distributed Consistency,MongoDB,NoSQL — Patrick Durusau @ 1:04 pm

On Distributed Consistency — Part 1 (MongoDB)

The first of a six-part series on consistency in distributed databases.

From the website:

See also:

  • Part 2 – Eventual Consistency
  • Part 3 – Network Partitions
  • Part 4 – Multi Data Center
  • Part 5 – Multi Writer Eventual Consistency
  • Part 6 – Consistency Chart

For distributed databases, consistency models are a topic of huge importance. We’d like to delve a bit deeper on this topic with a series of articles, discussing subjects such as what model is right for a particular use case. Please jump in and help us in the comments.

Consistency is an issue that will confront distributed topic maps, so it is best to start learning the options now.
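One common way to reason about these trade-offs, used by Dynamo-style stores such as Riak and Cassandra, is the quorum rule: with N replicas, a read of R replicas is guaranteed to overlap a write to W replicas when R + W > N. A minimal sketch in Python, with replica selection fixed by hand through hypothetical `targets` lists so the overlap is easy to see. Real systems also have to handle failures, repair and conflicting versions.

```python
def quorum_overlaps(n, r, w):
    """Every read set of size r meets every write set of size w iff r + w > n."""
    return r + w > n


class Replicas:
    def __init__(self, n):
        self.store = [(0, None)] * n  # (version, value) held by each replica

    def write(self, value, version, targets):
        for i in targets:
            self.store[i] = (version, value)

    def read(self, targets):
        # Return the value with the highest version among the replicas consulted.
        version, value = max((self.store[i] for i in targets), key=lambda vv: vv[0])
        return value
```

With N = 3, writing to two replicas and reading from two always touches at least one fresh copy; with R = W = 1, a read can miss the write entirely, which is where eventual consistency comes in.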

Riak Search

Filed under: NoSQL,Riak — Patrick Durusau @ 11:05 am

Riak Search

From the website:

Riak Search is a distributed, easily-scalable, failure-tolerant, real-time, full-text search engine built around Riak Core and tightly integrated with Riak KV.

Riak Search allows you to find and retrieve your Riak objects using the objects’ values. When a Riak KV bucket has been enabled for Search integration (by installing the Search pre-commit hook), any objects stored in that bucket are also indexed seamlessly in Riak Search.

The Riak Client API can then be used to perform Search queries that return a list of bucket/key pairs matching the query. Alternatively, the query results can be used as the input to a Riak map/reduce operation. Currently the PHP, Python, Ruby, and Erlang APIs support integration with Riak Search.

The indexing of XML data (which uses the path/element name as the key) is plausible enough. It made me wonder about a slightly different operation.

What if, as part of the indexing operation, additional properties were added to the key?

Could be as simple as the DTD/Schema that defines the element or more complex information about the field.
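The idea can be sketched with Python's standard `xml.etree.ElementTree`: index each leaf element's text under its path, and optionally fold extra properties into the key. This does not reproduce Riak Search's actual key format, and the `extra_key_props` mapping (here naming a hypothetical `catalog.xsd`) is an assumption standing in for whatever schema or field information one would add.

```python
import xml.etree.ElementTree as ET


def index_xml(xml_text, extra_key_props=None):
    """Index leaf-element text by path; keys are (path, extra properties) pairs.

    extra_key_props maps a path to additional properties (hypothetical, e.g. the
    schema defining the element) that become part of the key.
    """
    extra_key_props = extra_key_props or {}
    index = {}

    def walk(elem, parent_path):
        path = f"{parent_path}/{elem.tag}" if parent_path else elem.tag
        children = list(elem)
        if not children and elem.text and elem.text.strip():
            props = tuple(sorted(extra_key_props.get(path, {}).items()))
            # Repeated elements (e.g. several authors) share a key, so keep a list.
            index.setdefault((path, props), []).append(elem.text.strip())
        for child in children:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return index
```

With the extra property in place, two `title` elements defined by different schemas would no longer collide under the same key.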

HSearch NoSQL Search Engine Built on HBase

Filed under: HBase,HSearch,NoSQL — Patrick Durusau @ 10:51 am

HSearch NoSQL Search Engine Built on HBase

HSearch features include:

  • Multi-XML formats
  • Record and document level search access control
  • Continuous index updation
  • Parallel indexing using multi-machines
  • Embeddable inside application
  • A REST-ful Web service gateway that supports XML
  • Auto sharding
  • Auto replication

Original title and link: HSearch: NoSQL Search Engine Built on HBase (NoSQL databases © myNoSQL)

Another entry in the NoSQL arena.

I don’t recall: was parallel querying discussed for TMQL?

February 18, 2011

The Next Generation of Apache Hadoop MapReduce

Filed under: Algorithms,Hadoop,MapReduce,NoSQL,Topic Maps — Patrick Durusau @ 5:02 am

The Next Generation of Apache Hadoop MapReduce by Arun C Murthy (@acmurthy)

From the webpage:

In the Big Data business running fewer larger clusters is cheaper than running more small clusters. Larger clusters also process larger data sets and support more jobs and users.

The Apache Hadoop MapReduce framework has hit a scalability limit around 4,000 machines. We are developing the next generation of Apache Hadoop MapReduce that factors the framework into a generic resource scheduler and a per-job, user-defined component that manages the application execution. Since downtime is more expensive at scale high-availability is built-in from the beginning; as are security and multi-tenancy to support many users on the larger clusters. The new architecture will also increase innovation, agility and hardware utilization.

Since I posted the note about OpenStack and it is Friday, it seemed like a natural. Something to read over the weekend!

Saw this first at Alex Popescu’s myNoSQL – The Next Generation of Apache Hadoop MapReduce, which is sporting a new look!

February 16, 2011

PYGR: Python Graph Database for Bioinformatics

Filed under: Bioinformatics,Graphs,NoSQL — Patrick Durusau @ 12:57 pm

PYGR: Python Graph Database for Bioinformatics

From the website:

Pygr is open source software designed to make it easy to do powerful sequence and comparative genomics analyses, even with extremely large multi-genome alignments.

  • Bioinformatics tools for sequence analysis and comparative genomics such as sequence databases, search methods such as BLAST, repeat-masking, megablast, etc., sequence annotation databases and annotation query, and sequence alignment datasets.
  • Data namespace for accessing a given resource with seamless data relationship management. Easy data sharing that includes transparent access over network protocols.
  • High performance graph representation of interval data

If anyone has any spare grant money just lying around for graduate student work, it would be really interesting to see an interdisciplinary history of graph databases.

We may not be able to avoid re-inventing the wheel but perhaps we could re-invent stronger wheels more quickly.

Announcing Neo4j on Windows Azure – Post

Filed under: Graphs,Neo4j,NoSQL — Patrick Durusau @ 12:54 pm

Announcing Neo4j on Windows Azure

Peter Neubauer and Magnus Mårtensson write:

Neo4j has a ‘j’ appended to the name. And now it is available on Windows Azure? This proves that in the most unlikely of circumstances sometimes beautiful things can emerge. Microsoft has promised Java to be a valued “first class citizen” on Windows Azure. In this blog post we will show that it is no problem at all to host a sophisticated and complex server product such as the Neo4j graph database server on Windows Azure. Since Neo4j has a REST API over HTTP you can speak to this server from your regular .NET (or Java) applications, inside or outside of the cloud just as easily as you speak to Windows Azure Storage.

Would be interesting if the cloud proves to be the impetus for the next step in interoperability.

The more opportunities there are for interoperability, the greater the need I see for topic maps to govern semantic interoperability.

So long as data interchange/interoperability is largely theoretical, there isn’t much sense in being concerned.

When your nearest competitor is gaining ground or pulling away because of data interchange/interoperability, it is an entirely different matter.

February 15, 2011

Webmail for Millions, Powered by Erlang

Filed under: Erlang,Hibari,NoSQL — Patrick Durusau @ 11:38 am

Webmail for Millions, Powered by Erlang

From the website:

Scott Lystig Fritchie presents the architecture and lessons learned implementing a webmail system in Erlang, using UBF and Hibari, a distributed key-value store, to accommodate a large user base.

UBF? (new to me)

From http://norton.github.com/ubf:

UBF is the “Universal Binary Format”, designed and implemented by Joe Armstrong. UBF is a language for transporting and describing complex data structures across a network. It has three components:

  • UBF(A) is a “language neutral” data transport format, roughly equivalent to well-formed XML.
  • UBF(B) is a programming language for describing types in UBF(A) and protocols between clients and servers. This layer is typically called the “protocol contract”. UBF(B) is roughly equivalent to Verified XML, XML-schemas, SOAP and WSDL.
  • UBF(C) is a meta-level protocol used between a UBF client and a UBF server.

Potential lessons for those developing scalable topic map applications.

WinCouch

Filed under: CouchDB,NoSQL — Patrick Durusau @ 11:25 am

WinCouch

From the website:

The one-click CouchDB package for Windows like the Jan’s CouchDBX for Mac OSX.

  • Based on CouchDB-1.0.2 binaries from Dave Cottlehuber.
  • Used the GeckoFX to embed Mozilla Gecko (Firefox) into the application.

A Couch implementation for Windows.

I tried to access the www.geckofx.org website on several days but was unable to connect. I was able to reach http://code.google.com/p/geckofx/, which points back to www.geckofx.org. I was checking on it thinking GeckoFX could be of interest to topic map application developers.

If someone knows the status of the www.geckofx.org site, please post a note here. Thanks!

Cassandra 0.7.1 Release

Filed under: Cassandra,NoSQL — Patrick Durusau @ 11:06 am

Cassandra 0.7.1

Largest production cluster reported to be 100 TB spread over 150 machines.

It occurs to me that most topic map engines support SQL backends.

I will be checking in on the SQL world for recent developments that are relevant to topic maps.

February 13, 2011

Elliptics in production

Filed under: Elliptics,NoSQL — Patrick Durusau @ 1:51 pm

Elliptics in production

From the project website:

Elliptics network is a fault tolerant distributed hash table object storage.

The network does not use dedicated servers to maintain the metadata information, it supports redundant objects storage and implements transactional data update. Small to medium sized write benchmarks can be found (its the latest to date, other presented earlier) in the appropriate blog section.

Distributed hash table design allows not to use dedicated metadata servers which frequently become points of failure in the classical storages, instead user can connect to any server in the network and all requests will be forwarded to the needed nodes, one can also lookup the needed server and connect there directly. It can really be called a cloud of loosely connected equivalent nodes. Joining node will automatically connect to the needed servers according to the network topology, it can store data in different configurable backends like file IO storage, eblob backend or using own IO storage backend.

Protocol allows to implement own data storage using specific features for the deploying project and generally extend data communication with infinite number of the extensions. One of the implemented examples is remote command execution, which can be used as a load balancing job manager.
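The “connect to any server and the request gets forwarded” property of a distributed hash table can be illustrated with consistent hashing, a common DHT technique. This is a generic sketch, not Elliptics's actual routing: node names and keys are hypothetical, and ring positions come from the first four bytes of a SHA-1 digest.

```python
import bisect
import hashlib


def ring_position(key):
    # Map any string onto a 32-bit hash ring.
    return int.from_bytes(hashlib.sha1(key.encode()).digest()[:4], "big")


class HashRing:
    def __init__(self, nodes):
        self._ring = sorted((ring_position(node), node) for node in nodes)
        self._positions = [pos for pos, _ in self._ring]

    def owner(self, key):
        """Node responsible for key: the first node clockwise from the key's position."""
        i = bisect.bisect_right(self._positions, ring_position(key)) % len(self._ring)
        return self._ring[i][1]

    def route(self, entry_node, key):
        """Any node accepts the request and forwards it to the owner; no metadata server."""
        target = self.owner(key)
        return [entry_node] if entry_node == target else [entry_node, target]
```

A nice side effect: when a node joins, only the keys that now fall to the new node change owners; every other key stays where it was.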

Hard to say which of the NoSQL solutions will make useful backends or other components in a topic map system.

But I would rather err on the side of being inclusive.

O’Reilly Book Sale

Filed under: Books,MongoDB,NoSQL,Topic Maps — Patrick Durusau @ 7:14 am

OK, ok, one more not-strictly topic map item and I promise no more for today!

Buy 2, Get 1 Free at O’Reilly caught my eye this morning.

I think it is justified to appear here for two reasons:

1) It has a lot of books, such as those on databases, that are relevant to implementing topic map systems.

But, just as importantly:

2) The O’Reilly online catalog illustrates the need for topic maps.

Look at the catalog listings under Other Databases for MongoDB (you may have heard about it). Now look under Database Design and Analysis. Oops! There you will find: MongoDB: The Definitive Guide (at least as of 13 February 2011, 7:01 AM Eastern time).

One way (not the only way) to implement a topic map here would result in a single source of updates across the catalog. And the catalog could also act as a resource pointer to other materials. The Other Resources section for the MongoDB book isn’t terribly inspiring.
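A minimal sketch of the “single source of updates” point, assuming a hypothetical subject identifier for the book: the book is one topic, category membership is a set of associations pointing at that topic, so correcting the topic once corrects every category listing. This is an illustration only, not how O’Reilly’s catalog actually works.

```python
from collections import defaultdict


class Catalog:
    def __init__(self):
        self.topics = {}                    # subject identifier -> properties
        self.categories = defaultdict(set)  # category -> subject identifiers

    def add_book(self, subject_id, title):
        self.topics[subject_id] = {"title": title}

    def classify(self, subject_id, category):
        # An association between the book topic and a category, not a copy of the book.
        self.categories[category].add(subject_id)

    def rename(self, subject_id, title):
        self.topics[subject_id]["title"] = title

    def listing(self, category):
        return sorted(self.topics[s]["title"] for s in self.categories[category])
```

Usage: add the book once (say, with a deliberately misspelled title), classify it under both Other Databases and Database Design and Analysis, then call `rename` once. Both listings show the corrected title, because neither listing ever held its own copy.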

*****
PS: I am hopeful the interest in NoSQL will drive greater exploration of MySQL, PostgreSQL and Oracle databases in general and as part of topic map systems in particular.

February 12, 2011

BigCouch

Filed under: BigCouch,Clustering (servers),CouchDB,NoSQL — Patrick Durusau @ 5:25 pm

BigCouch 0.3 release.

From the website:

BigCouch is our open-source flavor of CouchDB with built-in clustering capability.

The main difference between BigCouch and standalone Couch is the inclusion of an OTP application that ‘clusters’ CouchDB across multiple servers.

For now, BigCouch is a stand-alone fork of CouchDB. In the future, we believe (and hope!) that many of the upgrades we’ve made will be incorporated back into CouchDB proper.

Many worthwhile topic map applications can be written without clustering, but “clustering” is one of those buzzwords to include in your response to an RFP, grant proposal, etc.

It is good to have some background on what clustering means and requires in general, and beating on several of the clustering solutions will develop that background.

Not to mention that you will know when it makes sense to actually implement clustering.

February 8, 2011

Redis Tutorial – April 2010

Filed under: NoSQL,Redis — Patrick Durusau @ 5:52 am

Redis Tutorial – April 2010

If you are just getting started with Redis or simply want to explore it a bit, Simon Willison’s tutorial is a good place to start.

Try Redis – Try Topic Maps?

Filed under: Examples,Marketing,NoSQL,Redis — Patrick Durusau @ 5:36 am

Try Redis is a clever introduction to Redis.

I recommend it to you as an introduction to Redis and NoSQL in general.

It also makes me wonder: would it be possible to create a similar resource for topic maps?

Granting that it would have to make prior choices about subject matter, data sets, etc. but still, it could be an effective marketing tool for topic maps.

I suspect so even if the range of choices to be made to effect merging were limited.

If I were a left-wing political blogger in the US, I would create a topic map that includes donations to Republican PACs and white collar crime convictions by family members.

Or for the right-wing, a mapping between the provisions of ObamaCare and various specific groups and agencies.

Users could choose additional information and have it show up in some visually pleasing way, making the case for what the user already thinks is true.

Will have to give this some thought in terms of a framework with a smallish number of framework topics and the ability to quickly add in additional topics for a particular demonstration.

Such that it would be possible to quickly create a topic map demo for some breaking news story.

It could provide useful additional content, but the main purpose would be to demonstrate the technology.

Useful content is fairly rare so no need to tax a topic map demo with always providing useful content. Sometimes, content is simply content. 😉
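The merging such a demo would show off can be sketched in a few lines. It follows, in simplified form, the Topic Maps rule that topics sharing a subject identifier represent the same subject and are merged. Topics here are plain dictionaries with hypothetical `example.org` identifiers, so a small set of framework topics can absorb topics added quickly for a particular story.

```python
def merge_topic_maps(*maps):
    """Merge topics that share a subject identifier, unioning their properties.

    Each map is a list of topics shaped like
    {"ids": set of subject identifiers, "names": set, "occurrences": set}.
    Merging is transitive: a topic carrying identifiers a and b joins
    the topics previously known only by a and only by b.
    """
    merged = []  # invariant: no two merged topics share an identifier
    for topic in (t for m in maps for t in m):
        current = {key: set(values) for key, values in topic.items()}
        keep = []
        for other in merged:
            if other["ids"] & current["ids"]:
                for key in current:
                    current[key] |= other[key]  # absorb the overlapping topic
            else:
                keep.append(other)
        merged = keep + [current]
    return merged
```

Usage: a framework map might already hold a topic named “Acme”; a breaking-news topic that carries the same identifier plus a new one merges into it, picking up the new name and the story as an occurrence, while unrelated framework topics are left alone.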

February 2, 2011

Interview with Salvatore Sanfilippo on Redis – Podcast

Filed under: NoSQL,Redis — Patrick Durusau @ 9:14 am

Redis with Salvatore Sanfilippo Podcast from MyNoSQL by Alex Popescu.

Sanfilippo says that Redis is a key/value database, to be sure, but from another point of view it is also a store of specific values that have data models. (See A fifteen minutes introduction to Redis data types.)

See the Redis homepage

Which reminds me of a post on the nature of keys that I have been meaning to finish. More on that topic soon.

February 1, 2011

A Short eBook on Scaling MongoDB

Filed under: MongoDB,NoSQL — Patrick Durusau @ 7:52 am

A Short eBook on Scaling MongoDB

Kristina Chodorow’s blog, Snail in a Turtleneck, announced a short eBook by Kristina.

I haven’t read it yet, but I will be doing so in the near future.

Comments on the same are welcome!

January 31, 2011

Introduction to Riak Video with Rusty Klophaus – Post

Filed under: NoSQL,Riak — Patrick Durusau @ 1:58 pm

Introduction to Riak Video with Rusty Klophaus from MyNoSQL by Alex Popescu. Viewable online or downloadable in a couple of formats.

Starts with the observation that there are 47 different NoSQL projects. Doesn’t list them. 😉

I would watch this at the PivotLabs link because of the related talks.

Oh, and the Riak homepage.

While I like the video, it is also an example showing that you don’t need high-end video production or editing to produce useful video of presentations.

I mention it as an answer to conferences that protest they need expensive equipment to record presentations.

That is simply not the case and anyone who says otherwise, to be generous, is mis-informed.

MongoVUE

Filed under: MongoDB,NoSQL — Patrick Durusau @ 1:49 pm

MongoVUE

From the website:

What Is MongoVUE?

  • MongoVUE is a GUI (graphical user interface) application that helps you administer, develop and learn MongoDB.
  • MongoVUE is FREE to use.
  • To run properly, MongoVUE requires Microsoft .NET Framework 2.0 SP1 installed on your computer.

Tools for working with NoSQL databases are starting to appear.

Any thoughts on this one that you would like to share?
