Hash Table Performance in R: Part I + Part II by Jeffrey Horner.
From part 1:
A hash table, or associative array, is a well-known key-value data structure. In R there is no equivalent, but you do have some options. You can use a vector of any type, a list, or an environment.
But as you’ll see, with all of these options performance is compromised in some way. In the average case, a lookup for a key should perform in constant time, or O(1), while in the worst case it will perform in O(n) time, n being the number of elements in the hash table.
For the tests below, we’ll implement a hash table with a few R data structures and make some comparisons. We’ll create hash tables with only unique keys and then perform a search for every key in the table.
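To make the setup concrete, here is a minimal sketch of the kind of test described: build a table of unique keys in a named vector, a list, and an environment, then search for every key. This is not Horner's benchmark code; the key format, the table size, and every variable name are assumptions chosen for illustration.

```r
# Illustrative only: key format and table size are assumptions, not from the post.
keys <- paste0("key", seq_len(1000))   # unique keys

# Option 1: a named vector; values are looked up by name.
vec <- setNames(seq_along(keys), keys)

# Option 2: a named list holding the same key-value pairs.
lst <- as.list(vec)

# Option 3: an environment created with hashing enabled.
env <- new.env(hash = TRUE)
for (i in seq_along(keys)) assign(keys[i], i, envir = env)

# Search for every key in each structure.
vec_hits <- vapply(keys, function(k) vec[[k]], integer(1), USE.NAMES = FALSE)
lst_hits <- vapply(keys, function(k) lst[[k]], integer(1), USE.NAMES = FALSE)
env_hits <- vapply(keys, function(k) get(k, envir = env, inherits = FALSE),
                   integer(1), USE.NAMES = FALSE)
```

Wrapping each of the three search lines in `system.time()` and growing the number of keys is one way to compare how lookup cost scales for each structure.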
…
This rocks! Talk about performance increases!
My current Twitter client doesn’t dedupe my home feed and certainly doesn’t dedupe it against search-based feeds. I’m not so concerned with retweets as with authors that repeat the same tweet several times in a row. What I don’t know is what period of uniqueness would be best. I will have to experiment with that.
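As a rough sketch of that idea, an environment can serve as the hash table that remembers when each author/text pair was last seen. Everything here is hypothetical: the function name `dedupe_tweets`, its arguments, the one-hour default window, and the assumption that timestamps arrive as numeric seconds in ascending order.

```r
# Hypothetical sketch: all names and the default window are assumptions.
# A tweet is kept only if the same author has not posted the same text
# within `window_secs` of the previous copy we saw.
dedupe_tweets <- function(authors, texts, times, window_secs = 3600) {
  last_seen <- new.env(hash = TRUE)        # key -> time of the most recent copy
  keep <- logical(length(texts))
  for (i in seq_along(texts)) {
    key <- paste(authors[i], texts[i], sep = "|")
    prev <- last_seen[[key]]               # NULL when the key is new
    keep[i] <- is.null(prev) || (times[i] - prev) > window_secs
    last_seen[[key]] <- times[i]           # record every occurrence, kept or not
  }
  keep
}

# Example feed: author "a" repeats "hi" after 10 seconds and again much later.
keep <- dedupe_tweets(
  authors = c("a", "a", "b", "a"),
  texts   = c("hi", "hi", "hi", "hi"),
  times   = c(0, 10, 20, 5000)
)
```

Because the last-seen time is updated on every occurrence, a run of rapid repeats keeps extending the window; recording only kept tweets instead would be an equally plausible choice, and `window_secs` is the knob to experiment with.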
I originally saw this series at "Hash Table Performance in R: Part II" ("In Part I of this series, I explained how R hashed…") on R-Bloggers, the source of so much excellent R-related content.