If you don’t mind alpha code, Ålenkå was pointed out in the bitmap posting I cited earlier today.
From its homepage:
Alenka is a modern analytical database engine written to take advantage of vector based processing and high bandwidth of modern GPUs.
Features include:
Vector-based processing: CUDA programming model allows a single operation to be applied to an entire set of data at once (a brief sketch follows this list).
Self optimizing compression: Ultra fast compression and decompression performed directly inside GPU.
Column-based storage: Minimize disk I/O by only accessing the relevant data.
Fast database loads: Data load times measured in minutes, not in hours.
Open source and free.
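To make the first item concrete, here is a minimal CUDA sketch of what "one operation applied to an entire set of data" looks like in practice. This is my own toy code, not Alenka's: the column size, the scaling operation, and the launch configuration are all made up for illustration.

// Hedged sketch (not Alenka's code): a single kernel launch applies the same
// operation to every value of a column, one thread per row.
#include <cuda_runtime.h>

__global__ void scale_column(float* col, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) col[i] *= factor;          // same operation on every row
}

int main() {
    const int n = 1 << 20;                // one million rows (arbitrary size)
    float* d_col;
    cudaMalloc(&d_col, n * sizeof(float));
    cudaMemset(d_col, 0, n * sizeof(float));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale_column<<<blocks, threads>>>(d_col, 1.1f, n);   // one launch, whole column
    cudaDeviceSynchronize();

    cudaFree(d_col);
    return 0;
}

Nothing clever is happening there; the point is simply that the GPU programming model makes whole-column operations the default unit of work, which is why a column store maps onto it so naturally.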
Apologies for the difference in spelling, Ålenkå versus Alenka. I suspect it has something to do with character support in whatever produced the readme file, but I can’t say for sure.
The benchmarks (there is that term again) are impressive.
Would semantic benchmarks be different from the ones currently used in IR? Different from precision and recall? What about range (the same subject identified in different ways) or accuracy (different identifications of the same subject, i.e., how many false positives)?
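For comparison, the IR measures I have in mind boil down to counting hits: precision is the fraction of retrieved documents that are relevant, recall the fraction of relevant documents that are retrieved. A minimal sketch with made-up relevance judgments, using Thrust only because we are already talking about GPUs (it has nothing to do with Alenka):

// Hedged sketch with hypothetical data: precision and recall as used in IR.
// precision = true positives / retrieved, recall = true positives / relevant.
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/inner_product.h>
#include <thrust/reduce.h>

int main() {
    // 1 = retrieved / relevant, 0 = not; ten hypothetical documents.
    int retrieved_h[10] = {1,1,1,0,0,1,0,1,0,0};
    int relevant_h[10]  = {1,0,1,1,0,1,0,0,0,1};
    thrust::device_vector<int> retrieved(retrieved_h, retrieved_h + 10);
    thrust::device_vector<int> relevant(relevant_h, relevant_h + 10);

    // True positives: documents both retrieved and relevant
    // (elementwise product, then sum).
    int tp = thrust::inner_product(retrieved.begin(), retrieved.end(),
                                   relevant.begin(), 0);
    int n_retrieved = thrust::reduce(retrieved.begin(), retrieved.end());
    int n_relevant  = thrust::reduce(relevant.begin(), relevant.end());

    printf("precision = %.2f, recall = %.2f\n",
           (double)tp / n_retrieved, (double)tp / n_relevant);
    return 0;
}

Whether a semantic benchmark would stop at counts like these, or would also have to score how subjects were identified, is exactly the open question.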