Creating a Simple Bloom Filter by Max Burstein.
From the post:
Bloom filters are super efficient data structures that allow us to tell if an object is most likely in a data set or not by checking a few bits. Bloom filters return some false positives but no false negatives. Luckily we can control the amount of false positives we receive with a trade off of time and memory.
You may have never heard of a bloom filter before but you’ve probably interacted with one at some point. For instance if you use Chrome, Chrome has a bloom filter of malicious URLs. When you visit a website it checks if that domain is in the filter. This prevents you from having to ping Google’s servers every time you visit a website to check if it’s malicious or not. Large databases such as Cassandra and Hadoop use bloom filters to see if it should do a large query or not.
I think you will appreciate the lookup performance difference. 😉
I first saw this at: Alex Popescu: Creating a Simple Bloom Filter in Python.