Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

April 25, 2012

Faster Apache CouchDB

Filed under: CouchDB,NoSQL — Patrick Durusau @ 6:26 pm

Faster Apache CouchDB.

Kay Ewbank reports:

Apache has announced the release of CouchDB 1.2.0. It brings lots of improvements, some of which mean apps written for older versions of CouchDB will no longer work.

According to the blog post from its developers, the changes start with improved performance and security. The performance is better because the developers have added a native JSON parser whose performance-critical portions are implemented in C, so latency and throughput for all database and view operations are improved. JSON (JavaScript Object Notation) is a lightweight data-interchange format that is easy for humans to read and write and for machines to parse and generate. The CouchDB team is using the yajl library for its JSON parser.

The new version of CouchDB also has optional file compression for database and view index files, with all storage operations being passed through Google’s snappy compressor. This means less data has to be transferred, so access is faster.

Alongside these headline changes for performance, the team has also made other changes that take the Erlang runtime system into account to improve concurrency when writing data to databases and view index files.

Grab a copy here, or see Kay’s post for more details.
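
If you do grab it, a minimal round trip over CouchDB’s HTTP/JSON interface is a fair way to kick the tires. A sketch in Python using the requests library, assuming a local server on the default port 5984 and an invented database name:

```python
# Minimal CouchDB HTTP/JSON round trip; assumes a local server on port 5984
# and an invented database name. Every request and response body is JSON,
# which is why a faster JSON parser shows up in every operation.
import requests

BASE = "http://localhost:5984"

# Create a database (PUT /dbname).
print(requests.put(f"{BASE}/albums").json())  # {'ok': True}

# Store a document (PUT /dbname/docid with a JSON body).
doc = {"title": "Blue Train", "artist": "John Coltrane", "year": 1957}
print(requests.put(f"{BASE}/albums/blue-train", json=doc).json())

# Read it back (GET /dbname/docid).
print(requests.get(f"{BASE}/albums/blue-train").json())
```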

April 21, 2012

On multi-form data

Filed under: MongoDB,NoSQL — Patrick Durusau @ 4:35 pm

On multi-form data

From the post:

I read an excellent debrief on a startup’s experience with MongoDB, called “A Year with MongoDB”.

It was excellent due to its level of detail. Some of its points are important — particularly global write lock and uncompressed field names, both issues that needlessly afflict large MongoDB clusters and will likely be fixed eventually.

However, it’s also pretty clear from this post that they were not using MongoDB in the best way.

An interesting take on when and, just as importantly, when not to use MongoDB.

As NoSQL offerings mature, are we going to see more of this sort of treatment, or will more treatments like this drive the maturity of NoSQL offerings?

Pointers to “more like this”? (not just on MongoDB but on other NoSQL offerings as well)

Neo4j 1.7 GA “Bastuträsk Bänk” released

Filed under: Neo4j,NoSQL — Patrick Durusau @ 4:35 pm

Neo4j 1.7 GA “Bastuträsk Bänk” released

We’re very pleased to announce that Neo4j 1.7 GA, codenamed “Bastuträsk Bänk” is now generally available. The many improvements ushered in through milestones have been properly QA’d and documented, making 1.7 the preferred stable version for all production deployments. Let’s review the highlights.

The release includes a number of features but I was surprised by:

With 1.7, Cypher now has a full range of common math functions for use in the RETURN and WHERE clause.

Because the “full range of common math functions” turned out to be ABS, ROUND, SQRT, and SIGN. That doesn’t look like a “full range of common math functions” to me. How about you?

Math operators are documented at: Operators
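
For the record, here is what those four functions look like in a 1.7-era Cypher query, sent from Python over Neo4j’s REST API. The /db/data/cypher endpoint and the sample properties are assumptions, so adjust both for your installation:

```python
# The four math functions (ABS, ROUND, SQRT, SIGN) in a Cypher query, POSTed
# to Neo4j's REST API. The /db/data/cypher endpoint and the sample node
# properties are assumptions; adjust both for your installation.
import requests

query = """
START n = node(*)
WHERE has(n.name) AND has(n.score) AND abs(n.score) > 2
RETURN n.name, round(n.score), sqrt(abs(n.score)), sign(n.score)
"""

resp = requests.post(
    "http://localhost:7474/db/data/cypher",
    json={"query": query, "params": {}},
)
print(resp.json())  # {'columns': [...], 'data': [...]}
```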

April 19, 2012

SciDB Version 12.3

Filed under: NoSQL,SciDB — Patrick Durusau @ 7:18 pm

SciDB Version 12.3

From the email notice:

Highlights of this release include:

  • more compact storage
  • vectorized expression evaluation
  • improvements to grand, grouped and window aggregates
  • support for non-integer dimensions within most major operators, including joins
  • transactional storage engine with error detection and rollback

Internal benchmarks comparing this release with the prior releases show disk usage reduced by 25%-50% and queries that use vectorized expression evaluation sped up by 4-10X.

Hyperdex: Documentation

Filed under: HyperDex,NoSQL — Patrick Durusau @ 7:18 pm

Posting on the HyperDex documentation separately from its latest release; it may be of more lasting interest.

Current Documentation:

Hyperdex Documentation (web based)

Hyperdex Documentation (PDF)

Mailing lists:

hyperdex-announce

hyperdex-discuss

Other:

Hyperdex: A New Era in High Performance Data Stores for the Cloud (presentation, 13 April 2012)

Hyperdex Tutorial

Hyperdex (homepage)

AvocadoDB Query Language

Filed under: AvocadoDB,NoSQL — Patrick Durusau @ 6:52 pm

AvocadoDB Query Language

This just in, resources on the proposed AvocadoDB query language.

There are slides, a presentation, a “visualization” (railroad diagram).

Apparently not set in stone (yet) so take the time to review and make comments.

BTW, blog comments are a good idea but a mailing list might be better?

April 18, 2012

The Little MongoDB Book

Filed under: MongoDB,NoSQL — Patrick Durusau @ 6:08 pm

The Little MongoDB Book

Karl Seguin has written a short (thirty-two pages) guide to MongoDB.

It won’t make you a hairy-chested terror at big data conferences but it will get you started with MongoDB.

I would bookmark http://mongly.com/, also by Karl, to consult along with the Little MongoDB book.

Finally, as you learn MongoDB, contribute to these and other resources with examples, tutorials, data sets.

Particularly tutorials on analysis of data sets. It is one thing to know schema X works in general with data sets of type Y. It is quite another to understand why.

April 17, 2012

Accumulo

Filed under: Accumulo,NoSQL — Patrick Durusau @ 7:12 pm

Accumulo

From the webpage:

The Apache Accumulo™ sorted, distributed key/value store is a robust, scalable, high performance data storage and retrieval system. Apache Accumulo is based on Google’s BigTable design and is built on top of Apache Hadoop, Zookeeper, and Thrift. Apache Accumulo features a few novel improvements on the BigTable design in the form of cell-based access control and a server-side programming mechanism that can modify key/value pairs at various points in the data management process. Other notable improvements and features are outlined here.

We mentioned Accumulo here but missed its graduation from the incubator. Apologies.
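
By way of penance, an illustration of the cell-based access control mentioned above. The sketch below is not Accumulo’s API, just a toy Python model of the idea: each key/value pair carries a visibility expression, and a scan returns only the cells whose expression is satisfied by the reader’s authorizations.

```python
# Toy model of cell-level visibility; this is NOT Accumulo's API, just the
# idea. A visibility expression in disjunctive normal form is a list of
# clauses; a clause is a set of labels the reader must hold in full.
def visible(clauses, authorizations):
    return any(clause <= authorizations for clause in clauses)

cells = [
    ("row1", "name",   "Alice", [{"public"}]),
    ("row1", "salary", 90000,   [{"hr"}, {"admin", "audit"}]),  # hr OR (admin AND audit)
]

reader_auths = {"public", "admin", "audit"}
for row, col, value, vis in cells:
    if visible(vis, reader_auths):
        print(row, col, value)  # this reader sees both cells
```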

HBaseCon 2012: A Glimpse into the Development Track

Filed under: Conferences,HBase,NoSQL — Patrick Durusau @ 7:11 pm

HBaseCon 2012: A Glimpse into the Development Track by Jon Zuanich.

Jon posted a reminder about the development track at HBaseCon 2012:

  • Learning HBase Internals – Lars Hofhansl, Salesforce.com
  • Lessons learned from OpenTSDB – Benoit Sigoure, StumbleUpon
  • HBase Schema Design – Ian Varley, Salesforce.com
  • HBase and HDFS: Past, Present, and Future – Todd Lipcon, Cloudera
  • Lightning Talk | Relaxed Transactions for HBase – Francis Liu, Yahoo!
  • Lightning Talk | Living Data: Applying Adaptable Schemas to HBase – Aaron Kimball, WibiData

Non-developers can check out the rest of the Agenda. 😉

Conference: May 22, 2012 InterContinental San Francisco Hotel.

April 15, 2012

TIBCO ActiveSpaces – Community Edition Soon (2.0.1)

Filed under: ActiveSpaces,NoSQL — Patrick Durusau @ 7:13 pm

TIBCO ActiveSpaces

From the webpage:

There is increasing pressure on IT to reduce reliance on costly transactional systems and to process increasing streams of data and events in real time.

TIBCO ActiveSpaces® Enterprise Edition provides an infrastructure for building highly scalable, fault-tolerant distributed applications. It combines the features and performance of databases, caching systems, and messaging software to support very large, highly volatile data sets and event-driven applications. It enables organizations to off-load transaction-heavy systems and allows developers to concentrate on business logic rather than the complexities of distributing, scaling, and making applications autonomously fault-tolerant.

TIBCO ActiveSpaces Enterprise Edition is a distributed peer-to-peer in-memory data grid, a form of virtual shared memory that leverages a distributed hash table with configurable replication. This approach means the capacity of the space scales automatically as nodes join and leave. Replication assures fault-tolerance from node failure as the space autonomously re-replicates and re-distributes lost data.

I saw this at KDnuggets and had to investigate.

While poking about the site I found: Coming soon: ActiveSpaces Community edition! by Jean-Noel Moyne, which says:

I am proud to be able to announce that along with the upcoming ActiveSpaces Enterprise Edition version 2.0.1 we will also be releasing a new ‘Community Edition’ of ActiveSpaces 2.0.1.

The community edition will be available for download free of charge, giving everyone a chance to evaluate ActiveSpaces for themselves.

The community edition is the full-featured version of ActiveSpaces. It is limited only in that you cannot use it in production, as it is supported by the community of users rather than by TIBCO Software, and in that you can have a maximum of four members in each metaspace that your process connects to.

Stay tuned for more details about this coming soon!

Looking forward to learning more about ActiveSpaces!
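
While waiting, the distributed hash table with configurable replication that the description rests on is easy to sketch. A toy consistent-hash ring in Python; the node names, key and replica count are made up:

```python
# Toy consistent-hash ring with configurable replication; node names, key
# and replica count are made up. Each key maps to the next `replicas`
# distinct nodes clockwise on the ring, so capacity grows as nodes join and
# lost copies can be re-replicated from survivors when a node leaves.
import bisect
import hashlib

def point(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, replicas=2):
        self.replicas = replicas
        self.ring = sorted((point(n), n) for n in nodes)
        self.points = [p for p, _ in self.ring]

    def owners(self, key):
        i = bisect.bisect(self.points, point(key))
        out = []
        while len(out) < min(self.replicas, len(self.ring)):
            node = self.ring[i % len(self.ring)][1]
            if node not in out:
                out.append(node)
            i += 1
        return out

ring = Ring(["node-a", "node-b", "node-c", "node-d"], replicas=2)
print(ring.owners("customer:42"))  # e.g. ['node-c', 'node-a']
```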

April 14, 2012

VoltDB Version 2.5

Filed under: NoSQL,VoltDB — Patrick Durusau @ 6:28 pm

VoltDB Version 2.5

VoltDB 2.5 has arrived with:

Database Replication. As I’d previously described here, Database Replication is the headline feature of 2.5 (until recently, we referred to the feature as WAN replication). It allows VoltDB databases to be automatically replicated within and across data centers. Available in the VoltDB Enterprise Edition, Database Replication ensures that every database transaction applied to a VoltDB database is asynchronously applied to a defined replica database. Following a catastrophic crash, you can immediately promote the database replica to be the master and redirect all traffic to that cluster. Once the original master has been recovered, you can quickly and easily reverse the process.

In addition to serving disaster recovery needs, you can also use Database Replication to maintain a hot standby database (i.e., to eliminate service windows when you’re doing systems maintenance) and for workload optimization where, for example, write traffic is directed to the master VoltDB database, and read traffic is directed to the replica.

Performance improvements. Version 2.5 includes performance improvements to the VoltDB SQL planner, which benefit all VoltDB products. In addition, we eliminated some unnecessary cluster messaging for single-node deployments, which reduces average transaction latencies to around 1ms for our VoltOne product.

Functional enhancements. In 2.5 we expanded VoltDB’s SQL support and extended support for distributed joins. We also added new administrative options for managing database snapshots and controlling the behavior of command logging activities.

Updated Node.js support. As Andy Wilson describes here, VoltDB 2.5 includes an updated client library for the Node.js programming framework. This driver, which was originally created by community member Jacob Wright, includes performance optimizations, bug fixes and modifications that align the driver with Node.js coding standards.

It may already exist (pointer please!), but with new versions of databases, if not entirely new databases, appearing on a regular basis, a common test suite of data would be a good thing to have. Nothing heavy, say 50 GB uncompressed of CSV files with varying structures.
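
To make the suggestion concrete, here is roughly what a generator for such a suite might look like. The schemas and row count are placeholders; an actual suite would want agreed-upon schemas and sizes:

```python
# Sketch of a generator for a shared CSV test suite. The schemas and the row
# count are placeholders; scale `rows` until the files reach the agreed-upon
# size (e.g. 50 GB uncompressed across all files).
import csv
import random
import string

def word(n=8):
    return "".join(random.choice(string.ascii_lowercase) for _ in range(n))

SCHEMAS = {
    "users.csv":  ["id", "name", "email", "signup_day"],
    "events.csv": ["id", "user_id", "kind", "payload"],
}

rows = 1000  # bump this up for multi-gigabyte files

for filename, columns in SCHEMAS.items():
    with open(filename, "w", newline="") as f:
        out = csv.writer(f)
        out.writerow(columns)
        for _ in range(rows):
            out.writerow(random.randrange(10**6) if c.endswith("id") else word()
                         for c in columns)
```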

Thoughts?

April 6, 2012

MongoDB Architecture

Filed under: MongoDB,NoSQL — Patrick Durusau @ 6:51 pm

MongoDB Architecture by Ricky Ho.

From the post:

NOSQL has become a very heated topic for large web-scale deployments, where scalability and semi-structured data drive the DB requirements towards NOSQL. There have been many NOSQL products evolving over the last couple of years. In my past blogs, I have been covering the underlying distributed system theory of NOSQL, as well as some specific products such as CouchDB and Cassandra/HBase.

Last Friday I was very lucky to meet with Jared Rosoff from 10gen at a technical conference and have a discussion about the technical architecture of MongoDB. I found the information very useful and want to share it with more people.

One thing that impresses me about MongoDB is that it is extremely easy to use and the underlying architecture is also very easy to understand.

Very nice walk through the architecture of MongoDB! Certainly a model for posts exploring other NoSQL solutions.

Cassandra Europe 2012 (Slides)

Filed under: Cassandra,Conferences,NoSQL — Patrick Durusau @ 6:45 pm

Cassandra Europe 2012 (Slides)

Slides are up from Cassandra Europe, 28 March 2012.

From the program:

  • Andrew Byde – Acunu Analytics: Simple, Powerful, Real-time
  • Gary Dusbabek – Cassandra at Rackspace: Cloud Monitoring
  • Eric Evans – CQL: Then, Now, and When
  • Nicolas Favre-Felix – Cassandra Storage Internals
  • Dave Gardner – Introduction to NoSQL and Cassandra
  • Jeremy Hanna – Powering Social Business Intelligence: Cassandra and Hadoop at the Dachis Group
  • Sylvain Lebresne – On Cassandra Development: Past, Present and Future
  • Richard Low – Data Modelling Workshop
  • Richard Lowe – Cassandra at Arkivum
  • Sam Overton – Highly Available: The Cassandra Distribution Model
  • Noa Resare – Cassandra at Spotify
  • Denis Sheahan – Netflix’s Cassandra Architecture and Open Source Efforts
  • Tom Wilkie – Next Generation Cassandra

March 29, 2012

AvocadoDB

Filed under: AvocadoDB,Graphs,NoSQL — Patrick Durusau @ 6:41 pm

AvocadoDB

From the webpage:

We recently started a new open source project – a nosql database called AvocadoDB.

Key features include:

  • Schema-free schemata let you combine the space efficiency of MySQL with the performance power of NoSQL
  • Use AvocadoDB as an application server and fuse your application and database together for maximal throughput
  • JavaScript for all: no language zoo, use just one language from your browser to your back-end
  • AvocadoDB is multi-threaded – exploit the power of all your cores
  • Flexible data modeling: model your data as combination of key-value pairs, documents or graphs – perfect for social relations
  • Free index choice: use the correct index for your problem, be it a skip list or a n-gram search
  • Configurable durability: let the application decide if it needs more durability or more performance
  • No-nonsense storage: AvocadoDB uses all the power of modern storage hardware, like SSD and large caches
  • It is open source (Apache Licence 2.0)

The presentation you will find at the homepage says you can view your data as a graph. Apparently edges can have multiple properties. Looks worth further investigation.

March 22, 2012

Sehrch.com: … Powered by Hypertable (Semantic Web/RDF Performance by Design)

Filed under: Hypertable,NoSQL,Sehrch.com — Patrick Durusau @ 7:41 pm

Sehrch.com: A Structured Search Engine Powered By Hypertable

From the introduction:

Sehrch.com is a structured search engine. It provides powerful querying capabilities that enable users to quickly complete complex information retrieval tasks. It gathers conceptual awareness from the Linked Open Data cloud, and can be used as (1) a regular search engine or (2) a structured search engine. In both cases conceptual awareness is used to build entity centric result sets.

To facilitate structured search we have introduced a new simple search query syntax that allows for bound properties and relations (contains, less than, more than, between, etc). The initial implementation of Sehrch.com was built over an eight month period. The primary technologies used are Hypertable and Lucene. Hypertable is used to store all Sehrch.com data which is in RDF (Resource Description Framework). Lucene provides the underlying searcher capability. This post provides an introduction to structured web search and an overview of how we tackled our big data problem.

A bit later you read:

We achieved a stable loading throughput of 22,000 triples per second (tps), peaking at 30,000 tps. Within 24 hours we had loaded the full 1.3 billion triples on a single node, on hardware that was at least two years out of date. We were shocked, mostly because on the same hardware the SPARQL compliant triplestores had managed 80 million triples (Virtuoso) at the most and Hypertable had just loaded all 1.3 billion. The 500GB of input RDF had become a very efficient 50GB of Hypertable data files. But then, loading was only half of the problem, could we query? We wrote a multi-threaded data exporter that would query Hypertable for entities by subject (Hypertable row key) randomly. We ran the exporter, and achieved speeds that peaked at 1,800 queries per second. Again we were shocked. Now that the data the challenge had set forth was loaded, we wondered how far Hypertable could go on the same hardware.

So we reloaded the data, this time appending the row keys with 1. Hypertable completed the load again, in approximately the same time. So we ran it again, now appending the keys with 2. Hypertable completed again, again in the same time frame. We now had a machine which was only 5% of our eventual production specification that stored 3.6 billion triples, three copies each of DBpedia and Freebase. We reran our data exporter and achieved query speeds that ranged between 1,000-1,300 queries per second. From that day on we have never looked back, Hypertable solved our data storage problem, it smashed the challenge that we set forth that would determine if Sehrch.com was at all possible. Hypertable made it possible.

That’s performance by design, not brute force.

On the other hand, the results of “Pop singers less than 20 years old” could be improved. Page after page of Miley Cyrus results gets old in a hurry. 😉

I am sure the team at Sehrch.com would appreciate your suggestions and comments.

March 7, 2012

NoSQL Matters 2012 – Speakers

Filed under: Conferences,NoSQL — Patrick Durusau @ 5:43 pm

NoSQL Matters 2012 – Speakers

NoSQL Matters – Cologne, Germany – May 29-30, 2012.

Rather than run the risk of playing favorites, I listed all the speakers for the conference. Even one or two of them would make the conference worth attending. With all of them together, this is a must-attend conference!

From the webpage:

Key-Note

  • Luca Garulli – From Values to Documents, from Relations to Graphs – A Survey and Guide through the unexhausted areas of NoSQL
  • Doug Judd – Scaling in a Non-Relational World

Overview

  • Dirk Bartels – NoSQL. A Technology for Real Time Enterprise Applications?
  • Pavlo Baron – DistributedDB (Playfully Illustrated)
  • Peter Idestam-Almquist – NewSQL Database for New Real-Time Applications
  • Tim Lossen – From MySQL to NoSQL to „Nothing“
  • Daniel McGrath – Rocket U2 Databases & The MultiValue Model
  • Martin Scholl – NoSQL: Back to the Future or It Is Simply Yet Another Database Feature?

Specific Databases

  • Jonathan Ellis – Apache Cassandra: Real-World Scalability, Today
  • Muharem Hrnjadovic – MongoDB Sharding
  • Doug Judd – Hypertable
  • Jan Lehnardt – The No-Marketing Bullshit Introduction to Couchbase Server 2.0
  • Mathias Meyer – RIAK
  • Salvatore Sanfilippo – Redis
  • Martin Schönert – AvocadoDB

Graph

  • Luca Garulli – Design your Application Using Persistent Graphs and OrientDB
  • Peter Neubauer – Neo4J, Gremlin, Cypher: Graph Processing for All
  • Pere Urbon-Bayes – From Tables to Graph. Recommendation Systems, a Graph Database Use Case Analysis

Application

  • Timo Derstappen – NoSQL: Not Only a Fairy Tale
  • Chris Harris – Building Hybrid Applications with MongoDB, RDBMS & Hadoop
  • Alex Morgner – structr – A CMS Implementation Based On a Graph Database

Other

  • Olaf Bachman – NoNoSQL@Google
  • Matt Casters – Crazy NoSQL Data Integration with Pentaho
  • Vincent Delfosse – UML As a Schema Candidate for NoSql
  • Oliver Gierke – Data Access 2.0? Please Welcome: Spring Data!
  • Alexandre Morgaut – Wakanda: NoSQL for Model-Driven Web Applications
  • Bernd Ocklin – MySQL Cluster: The Realtime Database You Haven’t Heard About

March 4, 2012

NoSQL Data Modeling Techniques

Filed under: Data Models,NoSQL — Patrick Durusau @ 7:17 pm

NoSQL Data Modeling Techniques by Ilya Katsov.

From the post:

NoSQL databases are often compared by various non-functional criteria, such as scalability, performance, and consistency. This aspect of NoSQL is well-studied both in practice and theory because specific non-functional properties are often the main justification for NoSQL usage, and fundamental results on distributed systems like the CAP theorem are directly applicable to NoSQL systems. At the same time, NoSQL data modeling is not so well studied and lacks the systematic theory found in relational databases. In this article I provide a short comparison of NoSQL system families from the data modeling point of view and digest several common modeling techniques.

To explore data modeling techniques, we have to start with a more or less systematic view of NoSQL data models that preferably reveals trends and interconnections. The following figure depicts an imaginary “evolution” of the major NoSQL system families, namely, Key-Value stores, BigTable-style databases, Document databases, Full Text Search Engines, and Graph databases:

Very complete and readable coverage of NoSQL data modeling techniques!

A must read if you are interested in making good choices between NoSQL solutions.
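
To give a taste of the techniques Katsov digests, consider the most common one, denormalization: embedding what a relational model would join, so that one read answers the frequent query. A sketch in Python literals, with an invented schema:

```python
# Denormalization: embed what a relational model would join, so the common
# query is a single read. The schema is invented for illustration.

# Relational-style: three "tables"; answering "show order 101" needs joins.
orders     = {101: {"user_id": 7, "total": 59.90}}
users      = {7: {"name": "Ada"}}
line_items = [{"order_id": 101, "sku": "B-12", "qty": 2}]

# Document-style: one aggregate per order, duplicating the user's name.
order_doc = {
    "_id": 101,
    "user": {"id": 7, "name": "Ada"},      # copied from the users "table"
    "items": [{"sku": "B-12", "qty": 2}],
    "total": 59.90,
}

# The common read is now one key lookup, at the cost of touching every
# order document if the user is ever renamed.
print(order_doc["user"]["name"], order_doc["items"])
```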

This post could profitably be turned into a book-length treatment with longer examples and a greater variety of them.

March 2, 2012

Breaking into the NoSQL Conversation

Filed under: NoSQL,RDF,Semantic Web — Patrick Durusau @ 8:05 pm

Breaking into the NoSQL Conversation by Rob Gonzalez.

Semantic Web Community: I’m disappointed in us! Or at least in our group marketing prowess. We have been failing to capitalize on two major trends that everyone has been talking about and that are directly addressable by Semantic Web technologies! For shame.

I’m talking of course about Big Data and NoSQL. Given that I’ve already given my take on how Semantic Web technology can help with the Big Data problem on SemanticWeb.com, this time around I’ll tackle NoSQL and the Semantic Web.

After all, we gave up SQL more than a decade ago. We should be part of the discussion. Heck, even the XQuery guys got in on the action early!

(much content omitted, read at your leisure)

AllegroGraph, Virtuoso, and Systap can all scale, and can all shard like Mongo. We have more mature, feature rich, and robust APIs via Sesame and others to interact with the data in these stores. So why aren’t we in the conversation? Is there something really obvious that I’m missing?

Let’s make it happen. For more than a decade our community has had a vision for how to build a better web. In the past, traditional tools and inertia have kept developers from trying new databases. Today, there are no rules. It’s high time we stepped it up. On the web we can compete with MongoDB directly on those use cases. In the enterprise we can combine the best of SQL and NoSQL for a new class of flexible, robust data management tools. The conversation should not continue to move so quickly without our voice.

I hate to disappoint but the reason the conversation is moving so quickly is the absence of the Semantic Web voice.

Consider my post earlier today about the new hardware/software release by Cray, A Computer More Powerful Than Watson. The release refers to RDF as a “graph format.”

With good reason. The uRIKA system doesn’t use RDF for reasoning at all. It materializes all the implied nodes and searches the materialized graph. Impressive numbers but reasoning it isn’t.

Inertia did not stop developers from trying new databases. New databases that met no viable (commercially that is) use cases went unused. What’s so hard to understand about that?

February 27, 2012

Cassandra Radical NoSQL Scalability

Filed under: Cassandra,CQL - Cassandra Query Language,NoSQL,Scalability — Patrick Durusau @ 8:25 pm

Cassandra Radical NoSQL Scalability by Tim Berglund.

From the description:

Cassandra is a scalable, highly available, column-oriented data store in use at Netflix, Twitter, Urban Airship, Constant Contact, Reddit, Cisco, OpenX, Digg, CloudKick, Ooyala and more companies that have large, active data sets. The largest known Cassandra cluster has over 300 TB of data in over 400 machines.

This open source project managed by the Apache Foundation offers a compelling combination of a rich data model, a robust deployment track record, and a sound architecture. This video presents Cassandra’s data model, works through its API in Java and Groovy, talks about how to deploy it and looks at use cases in which it is an appropriate data storage solution.

It explores the Amazon Dynamo project and Google’s BigTable and explains how Cassandra’s architecture helps achieve the gold standard of scalability: horizontal scalability on commodity hardware. You will be ready to begin experimenting with Cassandra immediately and planning its adoption in your next project.

Take some time to look at CQL – Cassandra Query Language.

BTW, Berglund is a good presenter.

February 21, 2012

InfiniteGraph – “…Create, Define, Repeat, and Visualize Results in Minutes”

Filed under: Graphs,InfiniteGraph,NoSQL — Patrick Durusau @ 7:59 pm

Objectivity Adds New Plugin Framework, Integrated Visualizer And Support For Tinkerpop Blueprints To InfiniteGraph

From the post:

“Of the numerous varieties of NoSQL databases, graph databases have the potential to significantly alter the analytics sector by enabling companies to unlock value based on understanding and analyzing the relationships between data,” said Matt Aslett, research management, data management and analytics, 451 Research. ”The new additions to Objectivity’s InfiniteGraph enable developers to achieve results in real time and also realize additional value by making the queries repeatable.”

Plugin Framework:
InfiniteGraph’s Plugin Framework provides developers with the ultimate in flexibility and supports the creation, import, and repeated use of plugins that modularize useful functionality. Developers can leverage successful queries, adjust parameters when appropriate, test queries and gain real-time results. A Navigator plugin bundles components that assist in navigation queries, e.g. result qualifiers, path qualifiers, and guides. The Formatter plugin formats and outputs results of graph queries. These plugins can be loaded and used in the InfiniteGraph Visualizer, and reused in InfiniteGraph applications.

Enhanced IG Visualizer:
The Visualizer is now tightly integrated with InfiniteGraph’s Plugin Framework allowing indexing queries for edges and export of GraphML and JSON (built-in) or other user-defined plugin formats. The Visualizer allows users to easily load plugins with enhanced control and navigation. Developers can parameterize plugins to control runtime behavior. Now every part of the graph is fully customizable and delivers a sophisticated result display for each query.

Support for Tinkerpop Blueprints:
InfiniteGraph provides a clean integration with Tinkerpop Blueprints, a popular property graph model interface with provided implementations, and is well-suited for applications that want to traverse and query graph databases using Gremlin.

That’s a bundle of news at one time for sure! The plugin architecture sounds particularly interesting.

Curious if anyone has developed a JDBC-based adapter that exposes the data in a relational database as a graph?

Riak 1.1 + Webinar

Filed under: NoSQL,Riak — Patrick Durusau @ 7:59 pm

Riak 1.1 Release + Webinar

This post almost didn’t happen. I got an email notice about this release, and when I went to the web page version, every link pointed to the 29 February 2012 webinar on Riak 1.1, whether the link text was “webinar” or “Riak 1.1,” multiple times over.

So I went to the Basho website; this is big news, after all. Nothing on the blog. There is an image on the homepage, if you know which one to choose.

Finally, I went to company -> news -> “Basho Unveils New Graphical Operations Dashboard, Diagnostics with Release of Riak 1.1.”

OK, not the best headline but at least you know you have arrived at the right place.

Tip: Don’t make news about your product or company hard to find. (KISS4S – Keep it simple stupid for stupids)

After getting there I find:

Riak 1.1 boosts data synchronization performance for multi-data center deployments, provides operating system and installation diagnostics and improves operational control for very large clusters. Riak 1.1 delivers a range of new features and improvements including:

  • Riak Control, a completely open source and intuitive administrative console for managing, monitoring and interfacing with Riak clusters
  • Riaknostic, an open source, proactive diagnostic suite for detecting common configuration and runtime problems
  • Enhanced error logging and reporting
  • Improved resiliency for large clusters
  • Automatic data compression using the Snappy compression library

Additionally, Riak EDS (Enterprise Data Store), Basho’s commercial distribution based on Riak, features major enhancements, primarily for multi-data center replication:

  • Introduction of bucket-level replication, adding more granularity and robustness
  • Various distinct data center synchronization options are now available, each optimized for different use cases
  • Significant improvement of data synchronization across multiple data centers

“The 1.1 release is focused on simplifying life for developers and administrators. Basho’s new Riak Control and Riaknostic components move Riak open source forward, providing an easy and intuitive way to diagnose, manage and monitor Riak platforms,” said Don Rippert, CEO, Basho. “While Riak Control was originally part of Basho’s commercial offering, we decided to release the code as part of Riak 1.1 to reinforce our commitment to the open source community.”

The notice was worth hunting for and the release looks very interesting.
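
If it leaves you wanting to try Riak itself, its HTTP interface makes that easy. A sketch in Python using the requests library, assuming a local node on the default port 8098 and the classic /riak/&lt;bucket&gt;/&lt;key&gt; path:

```python
# Round trip against Riak's HTTP interface; assumes a local node on the
# default port 8098 and the classic /riak/<bucket>/<key> path.
import requests

BASE = "http://localhost:8098/riak"

# Store a JSON value under bucket "stickers", key "basho".
requests.put(f"{BASE}/stickers/basho", json={"kind": "sew-on patch", "qty": 3})

# Fetch it back.
print(requests.get(f"{BASE}/stickers/basho").json())
```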

As added incentive, you can get free Riak/Basho stickers. I think sew-on patches would be good as well. Instead of a biker jacket you could have a developer jacket. 😉

February 7, 2012

Hypertable – New Website/Documentation

Filed under: Hypertable,NoSQL — Patrick Durusau @ 4:33 pm

Hypertable – New Website/Documentation

If you have looked at Hypertable before, you really need to give it another look!

The website is easy to navigate, the documentation is superb and you will have to decide for yourselves if Hypertable meets your requirements.

I don’t think my comment in a post last October:

Users even (esp?) developers aren’t going to work very hard to evaluate a new and/or unknown product. Better marketing would help Hypertable.

had anything to do with this remodeling. But, I am glad to see it because Hypertable is an interesting technology.

Hybrid SQL-NoSQL Databases Are Gaining Ground

Filed under: NoSQL,SQL,SQL-NoSQL — Patrick Durusau @ 4:29 pm

Hybrid SQL-NoSQL Databases Are Gaining Ground

From the post:

Hybrid SQL-NoSQL database solutions combine the advantages of being compatible with many SQL applications and providing the scalability of NoSQL ones. Xeround offers such a solution as a service in the cloud, including a free edition. Other solutions: Database.com with ODBC/JDBC drivers, NuoDB, Clustrix, and VoltDB.

Xeround provides a DB-as-a-Service based on a SQL-NoSQL hybrid. The front-end is a MySQL query engine, appealing to the large number of existing MySQL applications, but its storage API works with an in-memory distributed NoSQL object store up to 50 GB in size. Razi Sharir, Xeround CEO, provided the details for InfoQ.

Read the post for those details, including offers of smallish development space for free.

Do you get the sense that terminology is being invented at a rapid pace in this area? That is going to make comparing SQL, NoSQL, SQL-NoSQL, etc., offerings more and more difficult, not to mention the differences due to platforms (including the cloud).

Doesn’t that make it difficult for both private and government CIOs to:

  1. Formulate specifications for RFPs
  2. Evaluate responses to RFPs
  3. Measure performance or meeting of other requirements across responses
  4. Same as #3 but under actual testing conditions?

Semantic impedance, it will be with us always.

NoSQL: The Joy is in the Details

Filed under: MongoDB,NoSQL — Patrick Durusau @ 4:28 pm

NoSQL: The Joy is in the Details by James Downey.

From the post:

Whenever my wife returns excitedly from the mall having bought something new, I respond on reflex: Why do we need that? To which my wife retorts that if it were up to me, humans would still live in caves. Maybe not caves, but we’d still program in C and all applications would run on relational databases. Fortunately, there are geeks out there with greater imagination.

When I first began reading about NoSQL, I ran into the CAP Theorem, according to which a database system can provide only two of three key characteristics: consistency, availability, or partition tolerance. Relational databases offer consistency and availability, but not partition tolerance, namely, the capability of a database system to survive network partitions. This notion of partition tolerance ties into the ability of a system to scale horizontally across many servers, achieving on commodity hardware the massive scalability necessary for Internet giants. In certain scenarios, the gain in scalability makes worthwhile the abandonment of consistency. (For a simplified explanation, see this visual guide. For a heavy computer science treatment, see this proof.)

And so I plan to spend time this year exploring and posting about some of the many NoSQL options out there. I’ve already started a post on MongoDB. Stay tuned for more. And if you have any suggestions for which database I should look into next, please make a comment.

Definitely a series of posts I will be following this year. Suggest that you do the same.

February 5, 2012

Highly Connected Data Models in NOSQL Stores

Filed under: Neo4j,NoSQL — Patrick Durusau @ 7:55 pm

Highly Connected Data Models in NOSQL Stores by Jim Webber.

From the description:

In this talk, we’ll talk about the key ideas of NOSQL databases, including motivating similarities and, more importantly, their different strengths and weaknesses. In more depth, we’ll focus on the characteristics of graph stores for connected data and the kinds of problems for which they are best suited. To reinforce how useful graph stores are, we provide a rapid, code-focussed example using Neo4j covering the basics of graph stores, and the APIs for manipulating and traversing graphs. We’ll then use this knowledge to explore the Doctor Who universe, using graph databases to infer useful knowledge from connected, semi-structured data. We conclude with a discussion of when different kinds of NOSQL stores are most appropriate for the enterprise.

Deeply amusing and informative presentation.

Perhaps the most telling point was deciding between relational and graph database usage based on the sparseness of the relational table. If the table is sparse, you don’t have “square” data and are probably better off with a graph database.

Saw this in a tweet by Savas Parastatidis.

January 27, 2012

NOSQL for bioinformatics: Bio4j, a real world use case using Neo4j (Madrid, Spain)

Filed under: Bioinformatics,Neo4j,NoSQL — Patrick Durusau @ 4:35 pm

NOSQL for bioinformatics: Bio4j, a real world use case using Neo4j

Monday, January 30, 2012, 7:00 PM

From the meeting notice:

The world of data is changing. Big Data and NOSQL are bringing new ways of looking at and understanding your data. Prominent in the trend is Neo4j, a graph database that elevates relationships to first-class citizens, uniquely offering a way to model and query highly connected data.

This opens a whole new world of possibilities for a wide range of fields, and bioinformatics is no exception. Quite the opposite, this paradigm provides bioinformaticians with a powerful and intuitive framework for dealing with biological data which by nature is incredibly interconnected.

We’ll give a quick overview of the NOSQL world today, introducing then Neo4j in particular. Afterwards we’ll move to real use cases focusing in Bio4j project.

I would really love to see this presentation, particularly the Bio4j part.

But, I won’t be in Madrid this coming Monday.

If you are, don’t miss this presentation! Take good notes and blog about it. The rest of us would appreciate it!

ROMA User-Customizable NoSQL Database in Ruby

Filed under: NoSQL,ROMA,Ruby — Patrick Durusau @ 4:34 pm

ROMA User-Customizable NoSQL Database in Ruby

From the presentation:

  • User-customizable NoSQL database in Ruby
  • Features
    • Key-value model
    • High scalability
    • High availability
    • Fault-tolerance
    • Better throughput
    • And…
  • To meet application-specific needs, ROMA provides
    • Plug-in architecture
    • Domain specific language (DSL) for Plug-in
  • ROMA enables meeting the above need in Rakuten Travel

The ROMA source code: http://github.com/roma/roma/

Reportedly has 70 million users and while that may not be “web scale,” it may scale enough to meet your needs. 😉

Of particular interest are the DSL capabilities. See slides 31-33. Declaring your own commands. Something for other projects to consider.

Countandra

Filed under: NoSQL,Semantic Diversity — Patrick Durusau @ 4:34 pm

Countandra

From the webpage:

Since Aryabhata invented zero, mathematicians such as John von Neumann have been in pursuit of efficient counting, and architects have constantly built systems that compute counts quicker. In this age of social media, where hundreds of thousands of events take place every second, we were inspired by Twitter’s Rainbird project to develop a distributed counting engine that can scale linearly.

Countandra is a hierarchical distributed counting engine on top of Cassandra (to increment/decrement hierarchical data) and Netty (HTTP-based interface). It provides a complete HTTP-based interface to both posting events and getting queries. The syntax of an event posting is done in a FORMS-compatible way. The result of the query is emitted in JSON to make it manipulable by browsers directly.

Features

  • Geographically distributed counting.
  • Easy HTTP-based interface to insert counts.
  • Hierarchical counting such as com.mywebsite.music.
  • Retrieves counts, sums and squares in near real time.
  • Simple HTTP queries provide the desired output in JSON format
  • Queries can be sliced by period, such as LASTHOUR, LASTYEAR and so on, for MINUTELY, HOURLY, DAILY, MONTHLY values
  • Queries can be classified for anything in hierarchy such as com, com.mywebsite or com.mywebsite.music
  • Open Source and Ready to Use!
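
From that description, usage is roughly the following shape. A sketch in Python using the requests library; the host, endpoint paths and parameter names are my assumptions, not Countandra’s documented syntax (its README has the real thing):

```python
# Rough shape of talking to a hierarchical counting service like Countandra:
# POST events, GET aggregates. The host, paths and parameter names here are
# assumptions, not the project's documented syntax; see its README.
import requests

BASE = "http://localhost:8080"

# Record one event against a dot-delimited hierarchy.
requests.post(f"{BASE}/insert", data={"c": "com.mywebsite.music", "count": 1})

# Hourly counts over the last day, for anything under com.mywebsite.
print(requests.get(f"{BASE}/query", params={
    "c": "com.mywebsite", "period": "LASTDAY", "slice": "HOURLY",
}).json())
```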

Countandra illustrates that not every application need be a general purpose one. Countandra is designed to be a counting engine and to answer defined query types, nothing more.

There is a lesson there for semantic diversity solutions. It is better to attempt to solve part of the semantic diversity issue than to attempt a solution for everyone. At least partial solutions have a chance of being a benefit before being surpassed by changing technologies and semantics.

BTW, Countandra uses a Java long for time values, so in the words of the Unix Time Wikipedia entry:

In the negative direction, this goes back more than twenty times the age of the universe, and so suffices. In the positive direction, whether the approximately 293 billion representable years is truly sufficient depends on the ultimate fate of the universe, but it is certainly adequate for most practical purposes.

Rather than “suffices” and “most practical purposes” I would have said, “is adequate for present purposes” in both cases.

Getting Started with Apache Cassandra (realistic data import example)

Filed under: Cassandra,NoSQL — Patrick Durusau @ 4:30 pm

Getting Started with Apache Cassandra

From the post:

If you haven’t begun using Apache Cassandra yet and you wanted a little handholding to help get you started, you’re in luck. This article will help you get your feet wet with Cassandra and show you the basics so you’ll be ready to start developing Cassandra applications in no time.

Why Cassandra?

Do you need a more flexible data model than what’s offered in the relational database world? Would you like to start with a database you know can scale to meet any number of concurrent user connections and/or data volume size and run blazingly fast? Have you been needing a database that has no single point of failure and one that can easily distribute data among multiple geographies, data centers, and the cloud? Well, that’s Cassandra.

Not to pick on Cassandra or this post in particular, but have you noticed that introductory articles have you enter a trivial amount of data as a starting point? That makes sense, since you need to learn the basics, but why not conclude by importing a real data set? Particularly for databases that “scale” so well.

For example, detail how to import campaign donation records from the Federal Election Commission in the United States, which are written in COBOL format. That would give the user a better data set for CQL exercises.
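
Such a tutorial would not even be long. Fixed-width records convert to loader-friendly CSV in a few lines of Python; the file name and field offsets below are placeholders, with the real record layouts in the FEC’s file descriptions:

```python
# Convert fixed-width contribution records to CSV for a bulk loader. The
# input file name and the field offsets are placeholders; take the real
# record layouts from the FEC's file descriptions.
import csv

FIELDS = [  # (column name, start, end); placeholders, not the real layout
    ("committee_id", 0, 9),
    ("name", 9, 43),
    ("amount", 43, 50),
]

with open("itcont.txt") as src, open("donations.csv", "w", newline="") as dst:
    out = csv.writer(dst)
    out.writerow(name for name, _, _ in FIELDS)
    for line in src:
        out.writerow(line[start:end].strip() for _, start, end in FIELDS)
```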

A Full Table Scan of Indexing in NoSQL

Filed under: Indexing,NoSQL — Patrick Durusau @ 4:30 pm

A Full Table Scan of Indexing in NoSQL by Will LaForest (MongoDB).

One slide reads:

What Indexes Can Help Us Do

  • Find the “location” of data
    • Based upon a value
    • Based upon a range
    • Geospatial
  • Fast checks for existence
    • Uniqueness enforcement
  • Sorting
  • Aggregation
    • Usually covering indexes
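
Those bullets map directly onto MongoDB’s index API. A pymongo sketch, with an invented collection and invented fields:

```python
# The slide's categories in pymongo terms; the collection and fields are
# invented for illustration.
from pymongo import ASCENDING, GEOSPHERE, MongoClient

entries = MongoClient().library.entries

entries.create_index([("topic", ASCENDING)], unique=True)  # existence / uniqueness
entries.create_index([("loc", GEOSPHERE)])                 # geospatial

entries.find({"topic": "Bears"})                           # locate by value
entries.find({"page": {"$gte": 75, "$lte": 223}})          # locate by range
entries.find().sort("topic", ASCENDING)                    # sorting
```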

The next slide is titled: “Requisite Book Analogy” with an image of a couple of pages from an index.

So, let’s copy out some of those entries and see where they fit into Will’s scheme:

Bears, 75, 223
Beds, good, their moral influence, 184, 186
Bees, stationary civilisation of, 195
Beethoven on Handel, 18
Beginners in art, how to treat them, 195

The entry for Bears, I think, qualifies as “location of data based upon a value.”

And I see sorting, but those two are the only aspects of Will’s indexing that I see.

Do you see more?

What I do see is that the index is expressing relationships between subjects (“Beethoven on Handel”) and commenting on what information awaits a reader (“Beds, good, their moral influence”).

A NoSQL index could replicate the strings of these entries but without the richness of this index.

For example, consider the entry:

Aurora Borealis like pedal notes in Handel’s bass, 83

One expects the entry on Handel to contain that reference as well as the one for “Beethoven on Handel.” (I have only the two pages in this image and as far as I know, I haven’t seen this particular index before.)

Question: How would you use the indexes in MongoDB to represent the richness of these two pages?

Question: Where did MongoDB (or other NoSQL) indexing fail?

It is important to remember that indexes, prior to the auto-generated shallowness of recent decades, were highly skilled acts of authorship that added value for readers.
