I suspect this question:
I’ve heard people mention the “hashing trick” in machine learning, particularly with regards to machine learning on large data.
What is this trick, and what is it used for? Is it similar to the use of random projections?
(Yes, I know that there’s a brief page about it here. I guess I’m looking for an overview that might be more helpful than reading a bunch of papers.)
comes up fairly often. The answer given is unusually helpful, so I wanted to point it out here.