How Generalized Language Models outperform Modified Kneser Ney Smoothing by a Perplexity drop of up to 25% by René Pickhardt.
René reports on the core of his dissertation work.
From the post:
When you want to assign a probability to a sequence of words, you will run into the problem that longer sequences are very rare. People fight this problem by using smoothing techniques and by interpolating higher-order models (models over longer word sequences) with lower-order language models. While this idea is strong and helpful, it is usually applied in the same way: in order to use a shorter model, the first word of the sequence is omitted, and this step is iterated. The problem occurs when one of the last words of the sequence is the really rare one; in that case, omitting words at the front will not help.
So the simple trick of Generalized Language Models is to smooth a sequence of n words with n-1 shorter models, each of which skips one word at a position from 1 to n-1.
Then we combine everything with Modified Kneser-Ney Smoothing, just as was done with the previous smoothing methods.
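To make the difference concrete, here is a minimal Python sketch. It is not René's released implementation; the function names and the "*" wildcard marking the skipped position are my own illustration. It contrasts the single chain of lower-order histories used in standard back-off/interpolation with the n-1 generalized sequences that each skip one of the positions 1 to n-1:

    # Illustrative sketch only, not René's released code.
    # "*" marks the word position that a generalized lower-order model skips.

    def standard_lower_order(words):
        """Standard approach: repeatedly drop the first word of the sequence."""
        return [words[i:] for i in range(1, len(words))]

    def generalized_lower_order(words):
        """Generalized Language Models: keep the last (predicted) word and
        skip exactly one of the positions 1 .. n-1 instead."""
        n = len(words)
        return [words[:pos] + ["*"] + words[pos + 1:] for pos in range(n - 1)]

    if __name__ == "__main__":
        seq = ["the", "quick", "brown", "fox"]
        print(standard_lower_order(seq))
        # [['quick', 'brown', 'fox'], ['brown', 'fox'], ['fox']]
        print(generalized_lower_order(seq))
        # [['*', 'quick', 'brown', 'fox'], ['the', '*', 'brown', 'fox'],
        #  ['the', 'quick', '*', 'fox']]

In René's approach these skipped sequences are then weighted and combined via Modified Kneser-Ney Smoothing; see the released code and data linked from the post for the exact discounting and weighting.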
Unlike some white papers, webinars and demos, you don’t have to register, list your email and phone number, etc. to see both the test data and code that implements René’s ideas.
Please send René useful feedback as a way to say thank you for sharing both data and code.