NoSQL is Great, But You Still Need Indexes by Martin Farach-Colton.
From the post:
I’ve said it before, and, as is the nature of these things, I’ll almost certainly say it again: your database performance is only as good as your indexes.
That’s the grand thesis, so what does that mean? In any DB system — SQL, NoSQL, NewSQL, PostSQL, … — data gets ingested and organized. And the system answers queries. The pain point for most users is around the speed to answer queries. And the query speed (both latency and throughput, to be exact) depend on how the data is organized. In short: Good Indexes, Fast Queries; Poor Indexes, Slow Queries.
But building indexes is hard work, or at least it has been for the last several decades, because almost all indexing is done with B-trees. That’s true of commercial databases, of MySQL, and of most NoSQL solutions that do indexing. (The ones that don’t do indexing solve a very different problem and probably shouldn’t be confused with databases.)
It’s not true of TokuDB. We build Fractal Tree Indexes, which are much easier to maintain but can still answer queries quickly. So with TokuDB, it’s Fast Indexes, More Indexes, Fast Queries. TokuDB is usually thought of as a storage engine for MySQL and MariaDB. But it’s really a B-tree substitute, so we’re always on the lookout for systems where we can improving the indexing.
Enter MongoDB. MongoDB is beloved because it makes deployment fast. But when you peel away the layers, you get down to a B-tree, with all the performance headaches and workarounds that they necessitate.
That’s the theory, anyway. So we did some testing. We ripped out the part of MongoDB that takes care of secondary indices and plugged in TokuDB. We’ve posted the blogs before, but here they are again, the greatest hits of TokuDB+MongoDB: we show a 10x insertion performance, a 268x query performance, and a 532x (or 53,200% if you prefer) multikey index insertion performance. We also discussed covered indexes vs. clustered Fractal Tree Indexes.
Did somebody declare February 20th to be performance release day?
Did I miss that memo? 😉
Like every geek, I like faster. But, here’s my question:
Have there been any studies on the impact of faster systems on searching and decision making by users?
My assumption is the faster I get a non-responsive result from a search, the sooner I can improve it.
But that’s an assumption on my part.
Is that really true?