Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

September 7, 2014

Lgram

Filed under: Linguistics,N-Grams — Patrick Durusau @ 6:51 pm

Lgram: A memory-efficient ngram builder

From the webpage:

Lgram is a cross-platform tool for calculating ngrams in a memory-efficient manner. The current crop of n-gram tools have non-constant memory usage such that ngrams cannot be computed for large input texts. Given the prevalence of large texts in computational and corpus linguistics, this deficit is problematic. Lgram has constant memory usage so it can compute ngrams on arbitrarily sized input texts. Lgram achieves constant memory usages by periodically syncing the computed ngrams to an sqlite database stored on disk.

Lgram was written by Edward J. L. Bell at Lancaster University and funded by UCREL. The project was initiated by Dr Paul Rayson.
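
The periodic-sync idea in the quoted description can be illustrated with a short Python sketch. This is not Lgram's code or its command-line interface. The function name count_ngrams, the table name ngrams, its columns, and the flush_every parameter are all hypothetical, chosen only for illustration. The assumption is that counts are buffered in memory and written to SQLite once the buffer holds a fixed number of distinct n-grams, so memory use is bounded by the buffer size rather than by the size of the input.

```python
import sqlite3
from collections import Counter, deque


def count_ngrams(tokens, n, conn, flush_every=100_000):
    """Count n-grams from an iterable of tokens, spilling counts to SQLite.

    Illustrative only: the schema and parameter names are assumptions,
    not taken from Lgram itself.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ngrams "
        "(gram TEXT PRIMARY KEY, freq INTEGER NOT NULL)"
    )
    buffer = Counter()        # in-memory counts since the last sync
    window = deque(maxlen=n)  # sliding window over the last n tokens

    def flush():
        # Make sure every buffered n-gram has a row, then add its count.
        conn.executemany(
            "INSERT OR IGNORE INTO ngrams (gram, freq) VALUES (?, 0)",
            [(gram,) for gram in buffer],
        )
        conn.executemany(
            "UPDATE ngrams SET freq = freq + ? WHERE gram = ?",
            [(freq, gram) for gram, freq in buffer.items()],
        )
        conn.commit()
        buffer.clear()

    for token in tokens:
        window.append(token)
        if len(window) == n:
            buffer[" ".join(window)] += 1
            # Sync once the buffer holds flush_every distinct n-grams.
            if len(buffer) >= flush_every:
                flush()

    if buffer:
        flush()  # write whatever is left after the input ends


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # use a file path for large inputs
    count_ngrams("to be or not to be".split(), 2, conn, flush_every=2)
    for gram, freq in conn.execute(
        "SELECT gram, freq FROM ngrams ORDER BY freq DESC, gram"
    ):
        print(gram, freq)
```

Because tokens is consumed as an iterable, a large corpus can be streamed from disk one line at a time instead of being loaded whole. A smaller flush_every lowers peak memory at the cost of more frequent database writes.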

Not recent (2011) but new to me. Enjoy!

I first saw this in a tweet by Christopher Phipps.
