Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

November 3, 2011

NoSQL Exchange – 2 November 2011

NoSQL Exchange – 2 November 2011

It doesn’t get much better or fresher (for non-attendees) than this!

  • Dr Jim Webber of Neo Technology starts the day by welcoming everyone to the first of many annual NOSQL eXchanges. View the podcast here…
  • Emil Eifrém gives a Keynote talk to the NOSQL eXchange on the past, present and future of NOSQL, and the state of NOSQL today. View the podcast here…
  • HANDLING CONFLICTS IN EVENTUALLY CONSISTENT SYSTEMS In this talk, Russell Brown examines how conflicting values are kept to a minimum in Riak and illustrates some techniques for automating semantic reconciliation. There will be practical examples from the Riak Java Client and other places.
  • MONGODB + SCALA: CASE CLASSES, DOCUMENTS AND SHARDS FOR A NEW DATA MODEL Brendan McAdams — creator of Casbah, a Scala toolkit for MongoDB — will give a talk on “MongoDB + Scala: Case Classes, Documents and Shards for a New Data Model”
  • REAL LIFE CASSANDRA Dave Gardner: In this talk for the NOSQL eXchange, Dave Gardner introduces why you would want to use Cassandra, and focuses on a real-life use case, explaining each Cassandra feature within this context.
  • DOCTOR WHO AND NEO4J Ian Robinson: Armed only with a data store packed full of geeky Doctor Who facts, by the end of this session we’ll have you tracking down pieces of memorabilia from a show that, like the graph theory behind Neo4j, is older than Codd’s relational model.
  • BUILDING REAL WORLD SOLUTIONS WITH DOCUMENT STORAGE, SCALA AND LIFT Aleksa Vukotic will look at how his company assessed and adopted CouchDB in order to rapidly and successfully deliver a next generation insurance platform using Scala and Lift.
  • ROBERT REES ON POLYGLOT PERSISTENCE Robert Rees: Based on his experiences of mixing CouchDB and Neo4J at Wazoku, an idea management startup, Robert talks about the theory of mixing your stores and the practical experience.
  • PARKBENCH DISCUSSION This Park Bench discussion will be chaired by Jim Webber.
  • THE FUTURE OF NOSQL AND BIG DATA STORAGE Tom Wilkie: Tom Wilkie takes a whistle-stop tour of developments in NOSQL and Big Data storage, comparing and contrasting new storage engines from Google (LevelDB), RethinkDB, Tokutek and Acunu (Castle).

And yes, I made a separate blog post on Neo4j and Dr. Who. 😉 What can I say? I am a fan of both.

RethinkDB

Filed under: Key-Value Stores,Memcached,NoSQL,RethinkDB — Patrick Durusau @ 7:17 pm

RethinkDB

From the features page:

RethinkDB is a persistent, industrial-strength key-value store with full support for the Memcached protocol.

Powerful technology

  • Ten times faster on solid-state
  • Linear scaling across cores
  • Fine-grained durability control
  • Instantaneous recovery on power failure

Supported core features

  • Point queries
  • Atomic increment/decrement
  • Arbitrary atomic operations
  • Append/prepend operations
  • Values up to 10MB in size
  • Pipelining support
  • Row expiration support
  • Multi-GET support
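
Since RethinkDB speaks the Memcached protocol, any stock Memcached client should be able to exercise most of that feature list. A minimal sketch in Python using the python-memcached library (the host, port and key names are illustrative; where RethinkDB actually listens depends on your configuration):

```python
import memcache

# A standard Memcached client; RethinkDB answers the same wire protocol.
mc = memcache.Client(["127.0.0.1:11211"])

# Point query: plain set/get of a key-value pair.
mc.set("user:42", "Patrick")
print(mc.get("user:42"))            # -> "Patrick"

# Atomic increment/decrement on a counter.
mc.set("visits", 0)
mc.incr("visits")                   # -> 1
mc.decr("visits")                   # -> 0

# Append/prepend to an existing value.
mc.append("user:42", " Durusau")
print(mc.get("user:42"))            # -> "Patrick Durusau"

# Multi-GET: several keys in one round trip.
print(mc.get_multi(["user:42", "visits"]))
```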

I particularly liked this line:

Can I use RethinkDB even if I don’t have solid-state drives in my infrastructure?

While RethinkDB performs best on dedicated commodity hardware that has a multicore processor and is backed by solid-state storage, it will still deliver a performance advantage both on rotational drives and in the cloud. (emphasis added to the answer)

Don’t worry: your “rotational drives” and “cloud” accounts have not suddenly become obsolete. The skill you need to acquire before the next upgrade cycle is evaluating performance claims against your own processes and data.

It doesn’t matter that all the UN documents can be retrieved in sub-millisecond time, translated and served with a hot Danish if you don’t use the same format, have no need for translation and are more of a custard tart fan. Vendor performance figures may attract your interest, but your decision making should be driven by performance figures that represent your environment.

Build into the acquisition budget funding for your staff to replicate a representative subset of your data and processes for testing with vendor software/hardware. True enough, after the purchase you will probably toss that subset, but remember you will be living with the software purchase for years. And be known as the person who managed the project. Suddenly spending a little more money on making sure your requirements are met doesn’t sound so bad.
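
To make “evaluating performance claims with your processes and data” concrete, the harness does not have to be elaborate. A minimal sketch (run_op stands in for whatever a representative operation looks like against the candidate store; the percentile math assumes a few thousand sampled operations):

```python
import time
import statistics

def benchmark(ops, run_op, warmup=100):
    """Replay captured operations against a candidate store and
    report latency figures you can compare with vendor claims."""
    for op in ops[:warmup]:                  # warm caches before measuring
        run_op(op)
    timings = []
    for op in ops[warmup:]:
        start = time.perf_counter()
        run_op(op)
        timings.append(time.perf_counter() - start)
    timings.sort()
    return {
        "ops": len(timings),
        "median_ms": statistics.median(timings) * 1000,
        "p99_ms": timings[int(len(timings) * 0.99)] * 1000,
    }
```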

October 26, 2011

Oracle Releases NoSQL Database

Filed under: Java,NoSQL,Oracle — Patrick Durusau @ 6:58 pm

Oracle Releases NoSQL Database by Leila Meyer.

From the post:

Oracle has released Oracle NoSQL Database 11g, the company’s new entry into the NoSQL database market. Oracle NoSQL Database is a distributed, highly scalable, key-value database that uses the Oracle Berkeley Database Java Edition as its underlying storage system. Developed as a key component of the Oracle Big Data Appliance that was unveiled Oct. 3, Oracle NoSQL Database is available now as a standalone product.

(see the post for the list of features and other details)

Oracle NoSQL Database will be available in a Community Edition through an open source license and an Enterprise Edition through an Oracle Technology Network (OTN) license. The Community Edition is still awaiting final licensing approval, but the Enterprise Edition is available now for download from the Oracle Technology Network.

Don’t know that I will have the time but it would be amusing to compare the actual release with pre-release speculation about its features and capabilities.

More to follow as information becomes available.

Overview of the Oracle NoSQL Database

Filed under: NoSQL,Oracle — Patrick Durusau @ 6:58 pm

Overview of the Oracle NoSQL Database, a nice review by Daniel Abadi.

Where Daniel is inferring information he makes that clear, but as one of the leading researchers in the area, I suspect we will find, eventually, that he wasn’t far off the mark.

Interesting reading.

NoSQL notes

Filed under: NoSQL — Patrick Durusau @ 6:57 pm

NoSQL notes

From the post:

Last week I visited with James Phillips of Couchbase, Max Schireson and Eliot Horowitz of 10gen, and Todd Lipcon, Eric Sammer, and Omer Trajman of Cloudera. I guess it’s time for a round-up NoSQL post. 🙂

Views of the NoSQL market horse race are reasonably consistent, with perhaps some elements of “Where you stand depends upon where you sit.”

Quite a bit of “where you sit,” although amusing nonetheless.

October 21, 2011

Hypertable 0.9.5.1 Binary Packages

Filed under: Hypertable,NoSQL — Patrick Durusau @ 7:26 pm

Hypertable 0.9.5.1 Binary Packages

New release (up from 0.9.5.0) of Hypertable.

See the Release Notes. They are slow going, but a large number of bugs have been fixed and new features added.

The Hypertable Manual.

I have the sense that the software has a lot of potential but the website doesn’t offer enough examples to make that case. In fact, you have to hunt for the manual (it is linked above and/or has a link on the downloads page). Users, even (especially?) developers, aren’t going to work very hard to evaluate a new and/or unknown product. Better marketing would help Hypertable.

Using MongoDB in Anger

Filed under: Database,Indexing,MongoDB,NoSQL — Patrick Durusau @ 7:26 pm

Using MongoDB in Anger

Tips on building high performance applications with MongoDB.

Covers four topics:

  • Schema design
  • Indexing
  • Concurrency
  • Durability

Excellent presentation!

One of the first presentations I have seen that recommends a book about another product. Well, two books actually: High Performance MySQL and MongoDB in Action.
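
To make the indexing and durability topics concrete, here is a hedged sketch of the sort of thing the talk covers, using PyMongo (the collection, fields and write concern settings are invented for illustration):

```python
from pymongo import MongoClient, ASCENDING, DESCENDING
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://localhost:27017")
db = client.app

# Indexing: a compound index shaped to match the query you actually run.
db.events.create_index([("user_id", ASCENDING), ("ts", DESCENDING)])
for event in db.events.find({"user_id": 42}).sort("ts", DESCENDING).limit(10):
    print(event)

# Durability: require journaled acknowledgment for writes that must
# survive a crash, trading some latency for safety.
durable = db.events.with_options(write_concern=WriteConcern(j=True))
durable.insert_one({"user_id": 42, "ts": 1320270000, "type": "login"})
```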

Build Hadoop from Source

Filed under: Hadoop,MapReduce,NoSQL — Patrick Durusau @ 7:26 pm

Build Hadoop from Source by Shashank Tiwari.

From the post:

If you are starting out with Hadoop, one of the best ways to get it working on your box is to build it from source. Using stable binary distributions is an option, but a rather risky one. You are likely to not stop at Hadoop common but go on to setting up Pig and Hive for analyzing data and may also give HBase a try. The Hadoop suite of tools suffers from a huge version mismatch and version confusion problem. So much so that many start out with Cloudera’s distribution, also known as CDH, simply because it solves this version confusion disorder.

Michael Noll’s well written blog post titled: Building an Hadoop 0.20.x version for HBase 0.90.2, serves as a great starting point for building the Hadoop stack from source. I would recommend you read it and follow along the steps stated in that article to build and install Hadoop common. Early on in the article you are told about a critical problem that HBase faces when run on top of a stable release version of Hadoop. HBase may lose data unless it is running on top of an HDFS with durable sync. This important feature is only available in the branch-0.20-append of the Hadoop source and not in any of the release versions.

Assuming you have successfully followed along with Michael’s guidelines, you should have the Hadoop jars built and available in a folder named ‘build’ within the folder that contains the Hadoop source. At this stage, it’s advisable to configure Hadoop and take a test drive.

A quick guide to “kicking the tires,” as it were, of part of the Hadoop ecosystem.

I first saw this in the NoSQL Weekly Newsletter from http://www.NoSQLWeekly.com.

October 14, 2011

Couchbase Server 2.0: Most Common Questions (and Answers)

Filed under: Couchbase,NoSQL — Patrick Durusau @ 6:24 pm

Couchbase Server 2.0: Most Common Questions (and Answers) by Perry Krug.

From the post:

I just finished up a nine-week technical webinar series highlighting the features of our upcoming release of Couchbase Server 2.0. It was such a blast interacting with the hundreds of participants, and I was blown away by the level of excitement, engagement and anticipation for this new product.

(By the way, if you missed the series, all nine sessions are available for replay.) There were some great questions generated by users throughout the webinar series, and my original plan was to use this blog entry to highlight them all. I quickly realized there were too many to expect anyone to read through all of them, so I’ve taken a different tack. This blog will feature the most common/important/interesting questions and answer them here for everyone’s benefit. Before diving in, I’ll answer the question that was by far the most commonly asked: “How long until the GA of Couchbase Server 2.0?” We are currently on track to release it before the end of the year. In the meantime, please feel free to experiment with the Developer Preview that is already available. As for the rest of the questions, here goes!

This looks very good but I have a suggestion.

I am going to write to Perry to suggest that he post all the questions that came up, wiki style, and let the user community explore answering them.

That could be a very useful community project and it would get all the questions that came up out in the open.

Jasondb

Filed under: Jasondb,JSON,NoSQL — Patrick Durusau @ 6:24 pm

Jasondb

From the website:

A Cloud NoSQL JSON Database

I don’t know that you will find this a useful entry into the Cloud/NoSQL race but it does come with comics. 😉

I haven’t signed up for the beta but did skim the blog.

In his design principles, the author complains about HTTP being slow. Maybe I should send him a pointer to: Optimizing HTTP: Keep-alive and Pipelining. What do you think?
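
For what it’s worth, keep-alive is largely a client-side discipline: reuse one TCP connection rather than paying connection setup on every request. A quick illustration in Python with the requests library (the URL is made up):

```python
import requests

# Naive: every call opens and tears down a fresh TCP connection.
for i in range(100):
    requests.get("http://api.example.com/docs/%d" % i)

# Keep-alive: a Session pools connections and reuses them across requests,
# which is where much of HTTP's supposed slowness disappears.
with requests.Session() as session:
    for i in range(100):
        session.get("http://api.example.com/docs/%d" % i)
```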

If you join the beta, let me know what you think are strong/weak points from a topic map perspective. Thanks!

OrientDB version 1.0rc6

Filed under: NoSQL,OrientDB — Patrick Durusau @ 6:23 pm

OrientDB version 1.0rc6

From the post:

Hi all,
after some delays the new release is among us: OrientDB 1.0rc6. This is supposed to be the last SNAPSHOT before the official 1.0.

Before going deep into this release, I’d like to point out the chance to hack all together on OrientDB & Graph stuff at the next Berlin GraphDB Dojo event: http://www.graph-database.org/2011/09/28/call-for-participations-berlin-dojo/.

Direct download links

OrientDB embedded and server: http://code.google.com/p/orient/downloads/detail?name=orientdb-1.0rc6.zip
OrientDB Graph(ed): http://code.google.com/p/orient/downloads/detail?name=orientdb-graphed-1.0rc6.zip

List of changes

  • SQL engine: improved link navigation (issue 230)
  • Console: new “list databases” command (issue 389)
  • Index: supported composite indexes (issue 405), indexing of collections (issue 554)
  • JPA: supported @Embedded (issue 436) and @Transient annotations
  • Object Database: Disable/Enable lazy loading (issue 563)
  • Server: new Automatic backup task (issue 556), now installable as Windows Service (issue 61)
  • Client: Load balancing in clustered configuration (issue 557)
  • 34 issues closed

This looks great!

I want to call your attention to the composite indexes issue (issue 405). An index built across multiple fields. Hmmm, composite identifiers anyone?
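
For the curious, a hedged sketch of creating such an index through OrientDB’s HTTP command interface (the database name, credentials and endpoint shape are assumptions; check the REST documentation for your OrientDB version):

```python
import requests

# OrientDB exposes a REST endpoint that executes SQL-like commands.
base = "http://localhost:2480/command/demo/sql"
auth = ("admin", "admin")

# A composite index spans multiple fields, so lookups (and any uniqueness
# constraint) apply to the combination of values, not any single field.
requests.post(
    base,
    data="CREATE INDEX Person.name_surname ON Person (name, surname) UNIQUE",
    auth=auth,
)
```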

October 7, 2011

Graphic Database, NoSQL and Neo4j

Filed under: Neo4j,NoSQL — Patrick Durusau @ 6:19 pm

Graphic Database, NoSQL and Neo4j

Skip it if you already know the basics, but it could be an explanation that resonates with newbies to NoSQL/Neo4j.

October 4, 2011

Using Oracle Berkeley DB as a NoSQL Data Store

Filed under: BerkeleyDB,NoSQL — Patrick Durusau @ 7:53 pm

Using Oracle Berkeley DB as a NoSQL Data Store

I saw this on Twitter but waited until I could confirm with documentation I knew to exist on an Oracle website. 😉

I take this as a sign that storage, query and retrieval technology may be about to undergo a fundamental change. Unlike “big data,” which is just that, data that requires a lot of storage, how we store, query and retrieve data is much more fundamental.

BerkeleyDB as a storage engine may be a clue to future changes. What if there were a common substrate for database engines (SQL, NoSQL, Graph, etc.) onto which was imposed whatever higher-level operations you wished to perform? Done with a copy-on-write mechanism so every view is persisted across the data set.

A common storage substrate would be a great boon to everyone. Think of three-dimensional or even crystalline storage, which isn’t that far away. Now would be a good time for the major vendors to start working towards a common substrate for database engines.

October 3, 2011

Our big data/total data survey is now live [the 451 Group]

Filed under: BigData,Data Warehouse,Hadoop,NoSQL,SQL — Patrick Durusau @ 7:05 pm

Our big data/total data survey is now live [the 451 Group]

The post reads in part:

The 451 Group is conducting a survey into end user attitudes towards the potential benefits of ‘big data’ and new and emerging data management technologies.

In return for your participation, you will receive a copy of a forthcoming long-format report introducing Total Data, The 451 Group’s concept for explaining the changing data management landscape, which will include the results. Respondents will also have the opportunity to become members of TheInfoPro’s peer network.

Just a word about the survey.

Question 10 reads:

What is the primary platform used for storing and querying from each of the following types of data?

Good question, but for every type of data you have to choose one of three answers or mark “other” (and say what “other” means); you are not allowed to skip any type of data.

Data types are:

  • Customer Data
  • Transactional Data
  • Online Transaction Data
  • Domain-specific Application Data (e.g., Trade Data in Financial Services, and Call Data in Telecoms)
  • Application Log Data
  • Web Log Data
  • Network Log Data
  • Other Log Files
  • Social Media/Online Data
  • Search Log
  • Audio/Video/Graphics
  • Other Documents/Content

Same thing happens for Question 11:

What is the primary platform used for each of the following analytics workloads?

Eleven required answers that I won’t bother to repeat here.

As a consultant I really don’t have serious iron/data on the premises, but that doesn’t seem to have occurred to the survey designers. Nor has the possibility that even a major IT installation might not have all forms of data or analytics.

My solution? I just marked Hadoop on Questions 10 and 11 so I could get to the rest of the survey.

Q12. Which are the top three benefits associated with each of the following data management technologies?

Q13. Which are the top three challenges associated with each of the following data management technologies?

Q14. To what extent do you agree with the following statements? (which includes: “The enterprise data warehouse is the single version of the truth for business intelligence”)

Questions 12 – 14 all require answers to all options.

Note the clever first agree/disagree statement for Q.14.

Someone will conduct a useful survey of business opinions about big data and likely responses to it.

Hopefully it will be accompanied by a technical survey of the various options and their advantages/disadvantages.

Please let me know when you see it, I would like to point people to it.

(I completed this form on Sunday, October 2, 2011, around 11 AM Eastern time.)

October 2, 2011

Oracle rigs MySQL for NoSQL-like access

Filed under: MySQL,NoSQL — Patrick Durusau @ 6:36 pm

Oracle rigs MySQL for NoSQL-like access by Joab Jackson at CIO.

Joab writes:

In an interview in May with the IDG News Service, Tomas Ulin, Oracle vice president of MySQL engineering, described a project to bring the NoSQL-like speed of access to SQL-based MySQL.

“We feel very strongly we can combine SQL and NoSQL,” he said. “If you have really high-scalability performance requirements for certain parts of your application, you can share the dataset” across both NoSQL and SQL interfaces.

The key to Oracle’s effort is the use of Memcached, which Internet-based service providers, Facebook being the largest, have long used to quickly serve MySQL data to their users. Memcached creates a hash table of commonly accessed database items that is stored in a server’s working memory for quick access, by way of an API (application programming interface).

Memcached would provide a natural non-SQL interface for MySQL, Ulin said. Memcached “is heavily used in the Web world. It is something [webmasters] already have installed on their systems, and they know how to use [it]. So we felt that would be a good way to provide NoSQL access,” Ulin said.

Oracle’s thinking is that the Memcached interface can serve as an alternative access point for MySQL itself. Much of the putative sluggishness of SQL-based systems actually stems from the overhead of supporting a fully ACID-based query infrastructure needed to execute complex queries, industry observers said. By providing a NoSQL alternative access method, Oracle could offer customers the best of both worlds: a database that is fully ACID-compliant and has the speed of a NoSQL database.

With Memcached you are not accessing the data through SQL, but by a simple key-value lookup. “You can do a simple key-value-type lookup and get very optimal performance,” Ulin said.

The technology would not require any changes to MySQL itself. “We can just plug it in,” Ulin said. He added that Oracle was considering including this technology in the next version of MySQL, version 5.6.
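
In other words, the same row reachable two ways. A hedged sketch from Python of what that dual access might look like (table, key layout and ports are invented; how memcached keys map to columns is configured on the MySQL side):

```python
import memcache
import MySQLdb

# SQL path: full relational access with joins, transactions and ACID guarantees.
db = MySQLdb.connect(host="127.0.0.1", user="app", passwd="secret", db="shop")
cur = db.cursor()
cur.execute("SELECT name, price FROM products WHERE sku = %s", ("ABC-123",))
print(cur.fetchone())

# NoSQL path: the memcached plugin exposes configured rows as key-value
# pairs over the Memcached protocol, bypassing the SQL layer entirely.
mc = memcache.Client(["127.0.0.1:11211"])
print(mc.get("ABC-123"))
```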

While you are thinking about what that will mean for using MySQL engines, remember Stratified B-Tree and Versioned Dictionaries.

Suddenly, being able to map the structures of data stores as subjects (née topics) and to merge them, reliably, with structures of other data stores doesn’t seem all that far-fetched, does it? The thing to remember is that all that “big data” was stored in some “big structure,” a structure that topic maps can view as subjects to be represented by topics.

Not to mention knowing when you are accessing content (addressing) or authoring information about the content (identification).

September 28, 2011

Dimensions to use to compare NoSQL data stores

Filed under: Cloud Computing,NoSQL — Patrick Durusau @ 7:35 pm

Dimensions to use to compare NoSQL data stores by Huan Liu.

From the post:

You have decided to use a NoSQL data store in favor of a DBMS store, possibly due to scaling reasons. But, there are so many NoSQL stores out there, which one should you choose? Part of the NoSQL movement is the acknowledgment that there are tradeoffs, and the various NoSQL projects have pursued different tradeoff points in the design space. Understanding the tradeoffs they have made, and figuring out which one fits your application better is a major undertaking.

Obviously, choosing the right data store is a much bigger topic, which is not something that can be covered in a single blog. There are also many resources comparing the various NoSQL data stores, e.g., here, so that there is no point repeating them. Instead, in this post, I will highlight the dimensions you should use when you compare the various data stores.

Useful information to have on hand when discussing NoSQL data stores.

September 26, 2011

Twitter Storm: Open Source Real-time Hadoop

Filed under: Hadoop,NoSQL,Storm — Patrick Durusau @ 6:55 pm

Twitter Storm: Open Source Real-time Hadoop by Bienvenido David III.

From the post:

Twitter has open-sourced Storm, its distributed, fault-tolerant, real-time computation system, at GitHub under the Eclipse Public License 1.0. Storm is the real-time processing system developed by BackType, which is now under the Twitter umbrella. The latest package available from GitHub is Storm 0.5.2, and is mostly written in Clojure.

Storm provides a set of general primitives for doing distributed real-time computation. It can be used for “stream processing”, processing messages and updating databases in real-time. This is an alternative to managing your own cluster of queues and workers. Storm can be used for “continuous computation”, doing a continuous query on data streams and streaming out the results to users as they are computed. It can also be used for “distributed RPC”, running an expensive computation in parallel on the fly.

See the post for links, details, quotes, etc.

My bet is that topologies are going to be data set specific. You?

BTW, I don’t think the local coffee shop offers free access to its cluster. Will have to check with them next week.

September 25, 2011

Scaling with RavenDB

Filed under: NoSQL,RavenDB — Patrick Durusau @ 7:47 pm

Scaling with RavenDB

From the description:

Scaling the data tier is a topic that many find scary. In this webcast, Oren Eini and Nick VanMatre, Solutions Architect at Archstone, sit down to discuss the scaling options for Archstone’s newest project, a re-architecture of their internal and external apartment-management applications.

Discussed are the options for scaling RavenDB, including sharding, replication and multi-master setups.

Something to start your week!

September 22, 2011

Skills Matter – Autumn Update

Filed under: Conferences,Government Data,NoSQL,Scala — Patrick Durusau @ 6:26 pm

Skills Matter – Autumn Update

Given the state of UK airport security, about the only reason I would go to the UK would be for a Skills Matter (un)conference, eXchange, or tutorial! And that is from having only enjoyed them as recorded presentations, slides and code. Actual attendance must bring a lot of repeat customers.

On the schedule for this Fall:

Skills Matter Partner Conferences

Skills Matter has partnered with Silicon Valley Comes to the UK, WIP, Novoda, FuseSource and David Pollak, to provide you with the following fantastic (un)Conferences & Hackathons:

Skills Matter eXchanges

We’ll also be running some pretty cool one- and two-day long Skills Matter eXchanges, which are conferences featuring 45 minute long expert talks and lots of breaks to discuss what you have learned. Expect in-depth, hands-on talks led by real experts who are there to be quizzed, questioned and interrogated until you know as much as they do, or thereabouts! In the paragraphs below, you’ll be able to find out about the following eXchanges we have planned for the coming months:

Skills Matter Progressive Technology Tutorials

Skills Matter Progressive Technology Tutorials offer a collection of 4-hour tutorials, featuring a mix of in-depth and hands-on workshops on technology, agile and software craftsmanship. In the paragraphs below, you’ll be able to find out about the following tutorials we have planned for the coming months:

Introduction to RavenDB

Filed under: NoSQL,RavenDB — Patrick Durusau @ 6:22 pm

Introduction to RavenDB by Rob Ashton.

From the description:

In this session we will give a brief introduction to the concept of a document database and how it relates to what we already know before launching into a series of code demos using the RavenDB .NET Client API.

We will cover basic document structure, persistence, unit of work, querying / searching, and demonstrate real world use for map/reduce in our applications.

The usual (should I say expected?) difficulties with being unable to read the slides and/or examples in a video. Slides really should be provided with video presentations.

If you are interested in RavenDB, there are more presentations and blogs entries at Rob Ashton’s blog.

September 21, 2011

Cassandra Write Performance – A quick look inside

Filed under: Cassandra,NoSQL,Software — Patrick Durusau @ 7:09 pm

Cassandra Write Performance – A quick look inside

From the post:

I was looking at Cassandra, one of the major NoSQL solutions, and I was immediately impressed with its write speed even on my notebook. But I also noticed that it was very volatile in its response time, so I took a deeper look at it.

Michael Kopp uses dynaTrace to look inside Cassandra. Lots of information in between, and hopefully his conclusion will make you read this post and those he promises will follow.

Conclusion

NoSQL or BigData Solutions are very very different from your usual RDBMS, but they are still bound by the usual constraints: CPU, I/O and most importantly how it is used! Although Cassandra is lightning fast and mostly I/O bound it’s still Java and you have the usual problems – e.g. GC needs to be watched. Cassandra provides a lot of monitoring metrics that I didn’t explain here, but seeing the flow end-to-end really helps to understand whether the time is spent on the client, network or server and makes the runtime dynamics of Cassandra much clearer.

Understanding is really the key for effective usage of NoSQL solutions as we shall see in my next blogs. New problem patterns emerge and they cannot be solved by simply adding an index here or there. It really requires you to understand the usage pattern from the application point of view. The good news is that these new solutions allow us a really deep look into their inner workings, at least if you have the right tools at hand.

What tools are you using to “look inside” your topic map engine?

What’s new in Cassandra 1.0: Compression

Filed under: Cassandra,NoSQL — Patrick Durusau @ 7:08 pm

What’s new in Cassandra 1.0: Compression

From the post:

Cassandra 1.0 introduces support for data compression on a per-ColumnFamily basis, one of the most-requested features since the project started. Compression maximizes the storage capacity of your Cassandra nodes by reducing the volume of data on disk. In addition to the space-saving benefits, compression also reduces disk I/O, particularly for read-dominated workloads.

OK, maybe someone can help me here.

Cassandra, an Apache project, just released version 0.8.6. Here are the release notes for 0.8.6.

As a standards editor I understand being optimistic about what is “…going to appear…” in a future release, but isn’t version 0.8.6 a little early to be touting features for 1.0? (I don’t find “compression” mentioned in the cumulative release notes as of 0.8.6.)

May just be me.

September 20, 2011

ElasticSearch 0.17.7 Released!

Filed under: ElasticSearch,NoSQL — Patrick Durusau @ 7:52 pm

ElasticSearch 0.17.7 Released!

From the post:

This release includes the usual list of bug fixes, and also includes an upgrade to Lucene 3.4.0 (fixes critical bugs, so make sure you upgrade), as well as improvements to the couchdb river (memory usage-wise).

Release Notes

September 18, 2011

Terrastore

Filed under: Javascript,MapReduce,NoSQL,Terrastore — Patrick Durusau @ 7:28 pm

Terrastore

From the webpage:

Terrastore, being based on a rock solid technology such as Terracotta, will focus more and more on advanced features and extensibility. Right now, Terrastore provides support for:

  • Custom data partitioning.
  • Event processing.
  • Push-down predicates.
  • Range queries.
  • Map/Reduce querying and processing.
  • Server-side update functions.

terrastore-0.8.2-dist.zip was just released.

This new version comes with several bug fixes and rock solid stability (at least, we hope so 😉), along with a few important enhancements and new features such as:

  • Update to Terracotta 3.5.2 with performance improvements and reduced memory consumption.
  • Bulk operations.
  • Improved Javascript integration, with the possibility to dynamically load Javascript functions from files to use in server-side updates and map-reduce processing.

The Map/Reduce querying and processing is of obvious interest for topic map applications.
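
Terrastore is driven over plain HTTP, so a document round trip is short enough to sketch. A hedged example in Python (the port, bucket name and range-query parameters are assumptions based on its REST-style API; consult the Terrastore docs for your version):

```python
import requests

base = "http://localhost:8205"   # assumed default host/port; adjust to your server

# Documents are JSON values stored under a bucket/key pair.
doc = {"title": "Terrastore", "tags": ["nosql", "terracotta"]}
requests.put("%s/notes/terrastore" % base, json=doc)

# Fetch the document back.
print(requests.get("%s/notes/terrastore" % base).json())

# Range query over keys, one of the supported features listed above.
print(requests.get("%s/notes/range" % base,
                   params={"startKey": "a", "endKey": "z"}).json())
```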

September 11, 2011

RavenDB Webinar #1

Filed under: NoSQL,RavenDB — Patrick Durusau @ 7:06 pm

RavenDB Webinar #1 was announced at: Hibernating Rhinos: Zero friction databases.

From the webpage:

The first ever RavenDB webinar aired last week, Thursday the 8th, and it was a great success. We announced it only about 12 hours in advance, yet 260+ people registered. Unfortunately the software we were using only allowed 100 people in – our apologies to all of you who wanted to participate but couldn’t get in, or heard of it too late.

More videos said to be coming!

CouchDB jQuery Plugin Reference

Filed under: CouchDB,JQuery,NoSQL — Patrick Durusau @ 7:04 pm

CouchDB jQuery Plugin Reference by Bradley Holt.

From the post:

I’ve had a difficult time finding documentation on the CouchDB jQuery plugin that ships with CouchDB. So, I’ve decided to create my own reference and share it with you. This should cover almost the entire CouchDB API that is available through the version of the plugin that ships with CouchDB 1.1.0.

What’s your “favorite” lack of documentation?

September 8, 2011

Solr’s Realtime Get: Increasing its NoSQL Capabilities

Filed under: NoSQL,Searching,Solr — Patrick Durusau @ 5:55 pm

Solr’s Realtime Get: Increasing its NoSQL Capabilities

From the post:

As readers probably know, Lucene/Solr search works off of point-in-time snapshots of the index. After changes have been made to the index, a commit (or a new Near Real Time softCommit) needs to be done before those changes are visible. Even with Solr’s new NRT (Near Real Time) capabilities, it’s probably not advisable to reopen the searcher more than once a second. However there are some use cases that require the absolute latest version of a document, as opposed to just a very recent version. This is where Solr’s new realtime get comes to the rescue, where the latest version of a document can be retrieved without reopening the searcher and risk disrupting other normal search traffic.
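
From a client’s point of view the difference is easy to sketch. A hedged example in Python (the core layout, handler paths and document are illustrative, and the /get handler presumes the update log is enabled):

```python
import requests

solr = "http://localhost:8983/solr"

# Index a document but do not commit, so a normal search cannot see it yet.
requests.post("%s/update" % solr,
              json=[{"id": "doc1", "title": "uncommitted"}])

# A standard query misses the document until a (soft) commit reopens the searcher.
print(requests.get("%s/select" % solr,
                   params={"q": "id:doc1", "wt": "json"}).json())

# Realtime get returns the latest version anyway, straight from the update log.
print(requests.get("%s/get" % solr,
                   params={"id": "doc1", "wt": "json"}).json())
```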

September 7, 2011

Accumulo Proposal

Filed under: Accumulo,HBase,NoSQL — Patrick Durusau @ 6:49 pm

Accumulo Proposal

From the Apache incubator:

Abstract

Accumulo is a distributed key/value store that provides expressive, cell-level access labels.

Proposal

Accumulo is a sorted, distributed key/value store based on Google’s BigTable design. It is built on top of Apache Hadoop, Zookeeper, and Thrift. It features a few novel improvements on the BigTable design in the form of cell-level access labels and a server-side programming mechanism that can modify key/value pairs at various points in the data management process.

Background

Google published the design of BigTable in 2006. Several other open source projects have implemented aspects of this design including HBase, CloudStore, and Cassandra. Accumulo began its development in 2008.

Rationale

There is a need for a flexible, high performance distributed key/value store that provides expressive, fine-grained access labels. The communities we expect to be most interested in such a project are government, health care, and other industries where privacy is a concern. We have made much progress in developing this project over the past 3 years and believe both the project and the interested communities would benefit from this work being openly available and having open development.

Further explanation of access labels and iterators:

Access Labels

Accumulo has an additional portion of its key that sorts after the column qualifier and before the timestamp. It is called column visibility and enables expressive cell-level access control. Authorizations are passed with each query to control what data is returned to the user. The column visibilities are boolean AND and OR combinations of arbitrary strings (such as “(A&B)|C”) and authorizations are sets of strings (such as {C,D}).

Iterators

Accumulo has a novel server-side programming mechanism that can modify the data written to disk or returned to the user. This mechanism can be configured for any of the scopes where data is read from or written to disk. It can be used to perform joins on data within a single tablet.

The use case for modifying data written to disk is unclear to me but I suppose the data “returned to the user” involves modification of data for security reasons.

Sponsored in part by the NSA, National Security Agency of the United States.

The access label line of thinking has implications for topic map merging. What if a similar mechanism were fashioned to permit or prevent “merging” based on the access of the user? (Where merging isn’t a file based activity.)
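
The evaluation rule behind those labels is simple enough to sketch, and it is the same rule a merge-permission check would need. A toy evaluator in Python (not Accumulo code, just the boolean logic of column visibilities described above):

```python
import re

def tokenize(expr):
    # Atoms are label strings; the operators are &, | and parentheses.
    return re.findall(r"[A-Za-z0-9_]+|[&|()]", expr)

def visible(expr, auths):
    """True if the column visibility expression is satisfied by the
    user's set of authorization strings."""
    tokens = tokenize(expr)
    pos = 0

    def parse_or():
        nonlocal pos
        value = parse_and()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            rhs = parse_and()        # always evaluate, to consume tokens
            value = value or rhs
        return value

    def parse_and():
        nonlocal pos
        value = parse_term()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            rhs = parse_term()
            value = value and rhs
        return value

    def parse_term():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1                 # consume '('
            value = parse_or()
            pos += 1                 # consume ')'
            return value
        token = tokens[pos]
        pos += 1
        return token in auths        # an atom holds iff the user has it

    return parse_or()

print(visible("(A&B)|C", {"C", "D"}))   # True: the user holds C
print(visible("(A&B)|C", {"A", "D"}))   # False: needs both A and B, or C
```

Accumulo applies this kind of check server side on every key/value pair as it is scanned, which is what makes cell-level control cheap enough to be practical.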

September 6, 2011

Berlin Buzzwords 2011 – Slides/Videos

Filed under: Indexing,NoSQL — Patrick Durusau @ 6:59 pm

Berlin Buzzwords 2011 – Slides/Videos

I listed the slides and presentations together and sorted the listing by author. A number of very good presentations.

BTW, congratulations to the organizers of Berlin Buzzwords! Truly awesome gathering of talent.

I created this listing to assist myself in mining the presentations. Please forward any corrections. Enjoy!
