Archive for the ‘RethinkDB’ Category

How to install RethinkDB on Raspberry PI 2

Sunday, March 8th, 2015

How to install RethinkDB on Raspberry PI 2 by Zaher Ghaibeh.

A short video to help you install RethinkDB on your Raspberry PI 2.

Install when your Raspberry PI 2 is not part of a larger cluster of Raspberry PI 2s that is tracking social data on your intelligence services.

Your advantage in that regard is that you aren’t (shouldn’t be) piling up bigger haystacks to investigate for needles using a pitchfork.

Focused intelligence (beginning with HUMINT and incorporating SIGINT and other types of intelligence) can result in much higher quality intelligence at lower operational cost when compared to the data vacuum approach.

For one thing, knowing the signal you are seeking boosts the chances of it being detected. Searching for an unknown signal adrift in a sea of data is a low-percentage proposition.

How do you plan to use your RethinkDB to track intelligence on local or state government?

Elasticsearch, RethinkDB and the Semantic Web

Wednesday, June 11th, 2014

Elasticsearch, RethinkDB and the Semantic Web by Michel Dumontier.

From the post:

Everyone is handling big data nowadays, or at least, so it seems. Hadoop is very popular among the Big Data wranglers and it is often mentioned as the de facto solution. I have dabbled into working with Hadoop over the past years and found that: yes, it is very suitable for certain kinds of data mining/analysis and for those it provides high data crunching throughput, but, no, it cannot answer queries quickly and you cannot port every algorithm into Hadoop’s map/reduce paradigm. I have since turned to Elasticsearch and more recently to RethinkDB. It is a joy to work with the latter and it performs faceting just as well as Elasticsearch for the benchmark data that I used, but still permits me to carry out more complex data mining and analysis too.

The story here describes the data that I am working with a bit, it shows how it can be turned into a data format that both Elasticsearch and RethinkDB understand, how the data is being loaded and indexed, and finally, how to get some facets out of the systems.

Interesting post on biomedical data in RDF N-Quads format, which is converted into JSON and then processed with Elasticsearch and RethinkDB.
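To make the N-Quads to JSON step concrete, here is a minimal sketch in Python. It is not the code from the post: the regular expression is a simplified reading of the N-Quads grammar, the function names are hypothetical, the sample IRIs are invented, and grouping all statements for a subject into one document is only one plausible document shape.

```python
import json
import re
from collections import defaultdict

# One N-Quads term: an IRI, a blank node, or a literal with an optional
# datatype or language tag. Simplified: escape sequences inside literals
# are kept as written rather than decoded.
TERM = r'(<[^>]*>|_:\S+|"(?:[^"\\]|\\.)*"(?:\^\^<[^>]*>|@[A-Za-z0-9-]+)?)'
# subject predicate object [graph] .
QUAD = re.compile(rf'^\s*{TERM}\s+{TERM}\s+{TERM}(?:\s+{TERM})?\s*\.\s*$')


def strip_term(term):
    """Return a plain string for an IRI, literal, or blank node term."""
    if term.startswith("<"):
        return term[1:-1]
    if term.startswith('"'):
        # Keep only the lexical form; datatype and language tag are dropped.
        return term[1:term.rfind('"')]
    return term  # blank node label, left untouched


def nquads_to_documents(lines):
    """Group quads by subject into one JSON-ready dict per subject."""
    docs = defaultdict(lambda: defaultdict(list))
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        match = QUAD.match(line)
        if match is None:
            raise ValueError(f"not a quad this sketch understands: {line!r}")
        subject, predicate, obj, _graph = match.groups()
        docs[strip_term(subject)][strip_term(predicate)].append(strip_term(obj))
    return [{"id": subject, **fields} for subject, fields in docs.items()]


# Invented sample data, two statements about the same subject.
sample = [
    '<http://example.org/drug/1> <http://example.org/name> "Aspirin" <http://example.org/graph> .',
    '<http://example.org/drug/1> <http://example.org/target> <http://example.org/protein/9> <http://example.org/graph> .',
]
documents = nquads_to_documents(sample)
print(json.dumps(documents, indent=2))
```

Each resulting dict can be serialized with `json.dumps` and handed to whichever store you are testing. The graph label is discarded here; keep it as a field if your queries need it.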

I first saw this in a tweet by Joachim Baran.

RethinkDB

Friday, November 9th, 2012

RethinkDB

From the homepage:

An open-source distributed database built with love.

Enjoy an intuitive query language, automatically parallelized queries, and simple administration.

Table joins and batteries included.

and the overview:

RethinkDB is built to store JSON documents, and scale to multiple machines with very little effort. It has a pleasant query language that supports really useful queries like table joins and group by, and is easy to setup and learn.

Simple programming model:

  • JSON data model and immediate consistency.
  • Distributed joins, subqueries, aggregation, atomic updates.
  • Hadoop-style map/reduce.

Easy administration:

  • Friendly web and command-line administration tools.
  • Takes care of machine failures and network interrupts.
  • Multi-datacenter replication and failover.

Horizontal scalability:

  • Sharding and replication to multiple nodes.
  • Queries are automatically parallelized and distributed.
  • Lock-free operation via MVCC concurrency.

Just once I would like to see a software release where the feature list reads:

<humor>Job Security – Never mentioned by “easy to learn” software packages. Our software is a stone cold bitch to learn. The usual “hello world” takes the better part of a day. But, who wants to write “hello world?”

Once you do learn it, it has more power than native C code and is faster. Are you a top gun programmer or a script kiddie? We write software for the former, not the latter.
</humor>

Probably not going to happen.

BTW, at this time RethinkDB does not support secondary indexes. But the way the documentation reads, that doesn’t sound like a permanent condition.

Could be useful for some cases and certainly will be.

NoSQL Exchange – 2 November 2011

Thursday, November 3rd, 2011

NoSQL Exchange – 2 November 2011

It doesn’t get much better or fresher (for non-attendees) than this!

  • Dr Jim Webber of Neo Technology starts the day by welcoming everyone to the first of many annual NOSQL eXchanges. View the podcast here…
  • Emil Eifrém gives a Keynote talk to the NOSQL eXchange on the past, present and future of NOSQL, and the state of NOSQL today. View the podcast here…
  • HANDLING CONFLICTS IN EVENTUALLY CONSISTENT SYSTEMS In this talk, Russell Brown examines how conflicting values are kept to a minimum in Riak and illustrates some techniques for automating semantic reconciliation. There will be practical examples from the Riak Java Client and other places.
  • MONGODB + SCALA: CASE CLASSES, DOCUMENTS AND SHARDS FOR A NEW DATA MODEL Brendan McAdams — creator of Casbah, a Scala toolkit for MongoDB — will give a talk on “MongoDB + Scala: Case Classes, Documents and Shards for a New Data Model”
  • REAL LIFE CASSANDRA Dave Gardner: In this talk for the NOSQL eXchange, Dave Gardner introduces why you would want to use Cassandra, and focuses on a real-life use case, explaining each Cassandra feature within this context.
  • DOCTOR WHO AND NEO4J Ian Robinson: Armed only with a data store packed full of geeky Doctor Who facts, by the end of this session we’ll have you tracking down pieces of memorabilia from a show that, like the graph theory behind Neo4j, is older than Codd’s relational model.
  • BUILDING REAL WORLD SOLUTION WITH DOCUMENT STORAGE, SCALA AND LIFT Aleksa Vukotic will look at how his company assessed and adopted CouchDB in order to rapidly and successfully deliver a next generation insurance platform using Scala and Lift.
  • ROBERT REES ON POLYGLOT PERSISTENCE Robert Rees: Based on his experiences of mixing CouchDB and Neo4J at Wazoku, an idea management startup, Robert talks about the theory of mixing your stores and the practical experience.
  • PARKBENCH DISCUSSION This Park Bench discussion will be chaired by Jim Webber.
  • THE FUTURE OF NOSQL AND BIG DATA STORAGE Tom Wilkie: Tom Wilkie takes a whistle-stop tour of developments in NOSQL and Big Data storage, comparing and contrasting new storage engines from Google (LevelDB), RethinkDB, Tokutek and Acunu (Castle).

And yes, I made a separate blog post on Neo4j and Dr. Who. 😉 What can I say? I am a fan of both.

RethinkDB

Thursday, November 3rd, 2011

RethinkDB

From the features page:

RethinkDB is a persistent, industrial-strength key-value store with full support for the Memcached protocol.

Powerful technology

  • Ten times faster on solid-state
  • Linear scaling across cores
  • Fine-grained durability control
  • Instantaneous recovery on power failure

Supported core features

  • Point queries
  • Atomic increment/decrement
  • Arbitrary atomic operations
  • Append/prepend operations
  • Values up to 10MB in size
  • Pipelining support
  • Row expiration support
  • Multi-GET support
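Since the features page claims full Memcached protocol support, several of the listed operations map onto memcached text-protocol commands. The sketch below only builds the request bytes and sends nothing over the network. The command formats follow the memcached text protocol, the helper names are hypothetical, and whether this RethinkDB release accepted every command unchanged is an assumption I have not tested.

```python
STORAGE_VERBS = {"set", "add", "replace", "append", "prepend"}


def storage_command(verb, key, value, flags=0, exptime=0):
    """Build a storage request: a header line, then the data block.

    The header is "<verb> <key> <flags> <exptime> <bytes>" followed by
    CRLF; exptime is how expiration is requested (0 means never expire).
    """
    if verb not in STORAGE_VERBS:
        raise ValueError(f"unsupported storage verb: {verb}")
    data = value if isinstance(value, bytes) else str(value).encode("utf-8")
    header = f"{verb} {key} {flags} {exptime} {len(data)}\r\n".encode("ascii")
    return header + data + b"\r\n"


def get_command(*keys):
    """Build a get request; passing several keys gives a multi-GET."""
    return ("get " + " ".join(keys) + "\r\n").encode("ascii")


def incr_command(key, delta=1):
    """Build an atomic increment, or a decrement when delta is negative."""
    verb = "incr" if delta >= 0 else "decr"
    return f"{verb} {key} {abs(delta)}\r\n".encode("ascii")


# Pipelining amounts to writing several requests before reading replies.
pipeline = b"".join([
    storage_command("set", "greeting", "hello", exptime=60),
    storage_command("append", "greeting", " world"),
    incr_command("hits", 3),
    get_command("greeting", "hits"),
])
print(pipeline.decode("ascii"))
```

To try this against a server you would write `pipeline` to a TCP socket and read the replies in order.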

I particularly liked this line:

Can I use RethinkDB even if I don’t have solid-state drives in my infrastructure?

While RethinkDB performs best on dedicated commodity hardware that has a multicore processor and is backed by solid-state storage, it will still deliver a performance advantage both on rotational drives and in the cloud. (emphasis added to the answer)

Don’t worry, your “rotational drives” and “cloud” account have not suddenly become obsolete. The skill you need to acquire before the next upgrade cycle is evaluating performance claims with your processes and data.

It doesn’t matter that all the UN documents can be retrieved in sub-millisecond time, translated and served with a hot Danish if you don’t use the same format, have no need for translation and are more a custard tart fan. Vendor performance figures may attract your interest but your decision making should be driven by performance figures that represent your environment.

Build funding into the acquisition budget for your staff to replicate a representative subset of your data and processes for testing with vendor software/hardware. True enough, after the purchase you will probably toss that subset, but remember you will be living with the software purchase for years, and you will be known as the person who managed the project. Suddenly, spending a little more money on making sure your requirements are met doesn’t sound so bad.
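A test harness for that representative subset does not need to be elaborate. The following is a minimal sketch, assuming you can wrap the operation you care about in a Python callable. The `lookup` function and the in-memory `store` are hypothetical stand-ins for a call into the vendor's client library against your own records, and the 95th percentile uses the simple nearest-rank method.

```python
import math
import statistics
import time


def measure_latencies(operation, records, repeats=3):
    """Time operation(record) for every record, repeats times over.

    Returns one latency in seconds per call.
    """
    latencies = []
    for _ in range(repeats):
        for record in records:
            start = time.perf_counter()
            operation(record)
            latencies.append(time.perf_counter() - start)
    return latencies


def summarize(latencies):
    """Report count, median, nearest-rank 95th percentile, and maximum."""
    ordered = sorted(latencies)
    p95_index = max(0, math.ceil(0.95 * len(ordered)) - 1)
    return {
        "count": len(ordered),
        "median_s": statistics.median(ordered),
        "p95_s": ordered[p95_index],
        "max_s": ordered[-1],
    }


# Hypothetical stand-in: replace with a sample of your data and a call
# into the system under evaluation.
store = {f"doc-{i}": {"body": "x" * 100} for i in range(1000)}


def lookup(key):
    return store[key]


report = summarize(measure_latencies(lookup, list(store)))
print(report)
```

Run the same harness against each candidate with the same records, and compare the tail figures as well as the median, since averages tend to hide the slow requests your users will notice.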