Archive for the ‘Hypertable’ Category

Hypertable Has Reached A Major Milestone

Thursday, February 14th, 2013

Hypertable Has Reached A Major Milestone by Doug Judd.

From the post:

RangeServer Failover

With the release of Hypertable version 0.9.7.0 comes support for automatic RangeServer failover. Hypertable will now detect when a RangeServer has failed, logically remove it from the system, and automatically re-assign the ranges that it was managing to other RangeServers. This represents a major milestone for Hypertable and allows for very large scale deployments. We have been actively working on this feature, full-time, for 1 1/2 years. To give you an idea of the magnitude of the change, here are the commit statistics:

  • 441 changed files
  • 17,522 line additions
  • 6,384 line deletions

The reason that this feature has been a long time in the making is because we placed a very high standard of quality for this feature so that under no circumstance, a RangeServer failure would lead to consistency problems or data loss. We’re confident that we’ve achieved 100% correctness under every conceivable circumstance. The two primary goals for the feature, robustness and application transparency, are described below.
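To make the failover sequence in the quoted post concrete (detect a failed RangeServer, logically remove it, re-assign its ranges), here is a minimal Python sketch. It is a toy model only, not Hypertable's implementation: the function name `reassign_ranges`, the server names, the range labels, and the "least-loaded survivor" placement rule are all assumptions made for illustration.

```python
def reassign_ranges(assignments, failed_server):
    """Return a new server -> ranges mapping without failed_server.

    Hypothetical helper: the ranges of the failed server are handed,
    one at a time, to whichever surviving server currently holds the
    fewest ranges (ties broken by server name).
    """
    survivors = sorted(s for s in assignments if s != failed_server)
    if not survivors:
        raise RuntimeError("no surviving servers to take over ranges")

    # Copy the survivors' lists so the input mapping is left untouched.
    new_assignments = {s: list(assignments[s]) for s in survivors}

    for orphan in assignments.get(failed_server, []):
        target = min(survivors, key=lambda s: (len(new_assignments[s]), s))
        new_assignments[target].append(orphan)
    return new_assignments


# Illustrative cluster: three servers, four row-key ranges.
cluster = {
    "rs1": ["a-f", "g-l"],
    "rs2": ["m-r"],
    "rs3": ["s-z"],
}
after = reassign_ranges(cluster, "rs1")
print(after)
```

The point of the sketch is the invariant, not the placement policy: after the failed server is removed, every range it held is still owned by exactly one surviving server.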

That is a major milestone!

High-end data processing is becoming as crowded with viable options as low-end data processing. And the “low-end” of data processing keeps getting bigger.

Sehrch.com: … Powered by Hypertable (Semantic Web/RDF Performance by Design)

Thursday, March 22nd, 2012

Sehrch.com: A Structured Search Engine Powered By Hypertable

From the introduction:

Sehrch.com is a structured search engine. It provides powerful querying capabilities that enable users to quickly complete complex information retrieval tasks. It gathers conceptual awareness from the Linked Open Data cloud, and can be used as (1) a regular search engine or (2) as a structured search engine. In both cases conceptual awareness is used to build entity centric result sets.

To facilitate structured search we have introduced a new simple search query syntax that allows for bound properties and relations (contains, less than, more than, between, etc). The initial implementation of Sehrch.com was built over an eight month period. The primary technologies used are Hypertable and Lucene. Hypertable is used to store all Sehrch.com data which is in RDF (Resource Description Framework). Lucene provides the underlying searcher capability. This post provides an introduction to structured web search and an overview of how we tackled our big data problem.

A bit later you read:

We achieved a stable loading throughput of 22,000 triples per second (tps), peaking at 30,000 tps. Within 24 hours we had loaded the full 1.3 billion triples on a single node, on hardware that was at least two years out of date. We were shocked, mostly because on the same hardware the SPARQL compliant triplestores had managed 80 million triples (Virtuoso) at the most and Hypertable had just loaded all 1.3 billion. The 500GB of input RDF had become a very efficient 50GB of Hypertable data files. But then, loading was only half of the problem, could we query? We wrote a multi-threaded data exporter that would query Hypertable for entities by subject (Hypertable row key) randomly. We ran the exporter, and achieved speeds that peaked at 1,800 queries per second. Again we were shocked. Now that the data the challenge had set forth was loaded, we wondered how far Hypertable could go on the same hardware.

So we reloaded the data, this time appending the row keys with 1. Hypertable completed the load again, in approximately the same time. So we ran it again, now appending the keys with 2. Hypertable completed again, again in the same time frame. We now had a machine which was only 5% of our eventual production specification that stored 3.6 billion triples, three copies each of DBpedia and Freebase. We reran our data exporter and achieved query speeds that ranged between 1,000-1,300 queries per second. From that day on we have never looked back, Hypertable solved our data storage problem, it smashed the challenge that we set forth that would determine if Sehrch.com was at all possible. Hypertable made it possible.
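The load figures quoted above can be sanity-checked with a little arithmetic. The sketch below uses only numbers from the quoted passage (1.3 billion triples, 22,000 and 30,000 triples per second, 500GB of input, 50GB on disk); the variable and function names are my own, and the calculation assumes a constant rate for the whole load, which a real load would not have.

```python
TRIPLES = 1_300_000_000   # full data set, from the quoted post
STABLE_TPS = 22_000       # stable loading throughput
PEAK_TPS = 30_000         # peak loading throughput


def load_hours(triples, tps):
    """Hours needed to load `triples` at a constant `tps` rate."""
    return triples / tps / 3600


stable_hours = load_hours(TRIPLES, STABLE_TPS)
peak_hours = load_hours(TRIPLES, PEAK_TPS)
compression_ratio = 500 / 50  # input RDF size / Hypertable data files size

print(f"{stable_hours:.1f} h at the stable rate")
print(f"{peak_hours:.1f} h at the peak rate")
print(f"{compression_ratio:.0f}x smaller on disk")
```

At the stable rate the load takes a little over 16 hours, which is consistent with the post's "within 24 hours" claim.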

That’s performance by design, not brute force.

On the other hand, the results for the query “Pop singers less than 20 years old” could be improved. Page after page of Miley Cyrus results gets old in a hurry. 😉

I am sure the team at Sehrch.com would appreciate your suggestions and comments.

Secondary Indices Have Arrived! (Hypertable)

Thursday, March 22nd, 2012

Secondary Indices Have Arrived! (Hypertable)

From the post:

Until now, SELECT queries in Hypertable had to include a row key, row prefix or row interval specification in order to be fast. Searching for rows by specifying a cell value or a column qualifier involved a full table scan which resulted in poor performance and scaled badly because queries took longer as the dataset grew. With 0.9.5.6, we’ve implemented secondary indices that will make such SELECT queries lightning fast!

Hypertable supports two kinds of indices: a cell value index and a column qualifier index. This blog post explains what they are, how they work and how to use them.
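As a rough illustration of why such indices help, here is a toy in-memory model in Python. It is not Hypertable's API or storage format: the class `ToyTable`, its method names, and the sample data are all hypothetical, and columns are simplified to a single qualifier string. It only shows the general idea of keeping one lookup structure keyed by cell value and another keyed by column qualifier, so that a query does not have to visit every cell.

```python
from collections import defaultdict


class ToyTable:
    """Hypothetical in-memory table with two secondary indices."""

    def __init__(self):
        self.cells = {}                          # (row, qualifier) -> value
        self.value_index = defaultdict(set)      # value -> {(row, qualifier)}
        self.qualifier_index = defaultdict(set)  # qualifier -> {row}

    def put(self, row, qualifier, value):
        key = (row, qualifier)
        if key in self.cells:
            # Overwrite: drop the stale value-index entry first.
            self.value_index[self.cells[key]].discard(key)
        self.cells[key] = value
        self.value_index[value].add(key)
        self.qualifier_index[qualifier].add(row)

    def rows_with_value(self, value):
        """Indexed lookup: rows having any cell equal to `value`."""
        return sorted({row for row, _ in self.value_index.get(value, ())})

    def rows_with_qualifier(self, qualifier):
        """Indexed lookup: rows having a cell under `qualifier`."""
        return sorted(self.qualifier_index.get(qualifier, ()))

    def scan_for_value(self, value):
        """Full scan over every cell, for comparison with the index."""
        return sorted({row for (row, _), v in self.cells.items() if v == value})


table = ToyTable()
table.put("user1", "city", "Berlin")
table.put("user2", "city", "Paris")
table.put("user3", "city", "Berlin")
table.put("user2", "email", "x@example.com")

print(table.rows_with_value("Berlin"))
print(table.rows_with_qualifier("email"))
```

Both lookups return the same rows a full scan would, but the work is proportional to the number of matches rather than to the size of the table, which is the scaling problem the post describes.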

I am glad to hear about the new indexing features, but how do “cell value indexes” and “column qualifier indexes” differ from secondary indexes as described in the PostgreSQL 9.1 documentation:

All indexes in PostgreSQL are what are known technically as secondary indexes; that is, the index is physically separate from the table file that it describes. Each index is stored as its own physical relation and so is described by an entry in the pg_class catalog. The contents of an index are entirely under the control of its index access method. In practice, all index access methods divide indexes into standard-size pages so that they can use the regular storage manager and buffer manager to access the index contents.

It would be helpful in evaluating new features to know when (if?) they are substantially the same as features known in other contexts.

Hypertable – New Website/Documentation

Tuesday, February 7th, 2012

Hypertable – New Website/Documentation

If you have looked at Hypertable before, you really need to give it another look!

The website is easy to navigate, the documentation is superb and you will have to decide for yourselves if Hypertable meets your requirements.

I don’t think my comment in a post last October:

Users even (esp?) developers aren’t going to work very hard to evaluate a new and/or unknown product. Better marketing would help Hypertable.

had anything to do with this remodeling. But, I am glad to see it because Hypertable is an interesting technology.

Hypertable 0.9.5.1 Binary Packages

Friday, October 21st, 2011

Hypertable 0.9.5.1 Binary Packages

New release (up from 0.9.5.0) of Hypertable.

You can see the Release Notes. It is slow going but a large number of bugs have been fixed and new features added.

The Hypertable Manual.

I have the sense that the software has a lot of potential but the website doesn’t offer enough examples to make that case. In fact, you have to hunt for the manual (it is linked above and/or has a link on the downloads page). Users even (esp?) developers aren’t going to work very hard to evaluate a new and/or unknown product. Better marketing would help Hypertable.

A Genome Sequence Analysis…

Friday, August 26th, 2011

A Genome Sequence Analysis System Built With Hypertable by Doug Judd.

Interesting use of matching to discover new or novel genetic information (matches are deleted, and what is left is new or novel).

Hypertable 0.9.5.0 Binary Packages

Saturday, July 30th, 2011

Hypertable 0.9.5.0 Binary Packages (download)

New release of Hypertable!

Change notes.

Hypertable 0.9.5.0.pre6

Tuesday, June 14th, 2011

Hypertable 0.9.5.0.pre6

From the release notes:

  • Fixed bug in MaintenanceScheduler introduced w/ merging compactions
  • Fixed bug in the FileBlockCache wrt growing to accommodate
  • Added support for DELETE lines in .tsv files
  • Added check for DFSBROKER_BAD_FILENAME on skip_not_found
  • Added --metadata-tsv option to metalog_dump
  • Fixed bug whereby get_table_splits() was returning stale results for previously dropped tables.
  • Added MaintenanceScheduler state dump via existence of run/debug-scheduler file
  • Fixed FMR in TableMutator timeout

Hypertable 0.9.5.0 pre-release

Tuesday, May 10th, 2011

Stability Improvements in the Hypertable 0.9.5.0 pre-release

From the Hypertable blog:

We recently announced the Hypertable 0.9.5.0 pre-release. Even though we’ve labelled it as a “pre” release, it is one of the biggest and most important Hypertable releases to date. Among other things, it includes a complete re-write of the Master, to fix some known stability problems. It represents a significant amount of work as can be seen by the following code change statistics:

  • 512 files changed
  • 30,633 line insertions
  • 14,354 line deletions

The following describes problems that existed in prior releases and how they were solved, and highlights other stability improvements included in the 0.9.5.0 pre-release.

Details on the recent “pre-release” of Hypertable.

Hypertable 0.9.5.0

Saturday, March 26th, 2011

Hypertable 0.9.5.0

My first encounter with this project led me to: http://www.hypertable.com, which is a commercial venture offering support for open source software.

Except that that wasn’t really clear from the .com homepage.

I finally tracked links back to: http://code.google.com/p/hypertable/ to discover its GNU GPL v2 license.

The list of ventures using Hypertable is an impressive one.

Linking to the documentation at the .org site from the .com site would be a real plus.

A bit more attention to the .com site might attract more business, use cases, that sort of thing.