Improved Language Modeling by Decoding the Past

08/14/2018
by Siddhartha Brahma, et al.

Highly regularized LSTMs that model the auto-regressive conditional factorization of the joint probability distribution of words achieve state-of-the-art results in language modeling. These models have an implicit bias towards predicting the next word from a given context. We propose a new regularization term based on decoding words in the context from the predicted distribution of the next word. With relatively few additional parameters, our model achieves absolute improvements of 1.7% and 2.3% over the current state-of-the-art results on the Penn Treebank and WikiText-2 datasets.
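To make the idea concrete, below is a minimal, hypothetical PyTorch sketch of a "past decode" regularizer for an LSTM language model: the predicted distribution over the next word is passed through a small extra decoder that tries to recover the word just seen in the context, and the resulting cross-entropy is added to the usual language-modeling loss. The layer sizes, the single linear decoder, and the weighting coefficient `pdr_weight` are illustrative assumptions, not the authors' exact architecture or hyperparameters.

```python
# Hypothetical sketch of a past-decode regularizer for an LSTM language model.
# Sizes, the linear past decoder, and pdr_weight are assumptions for illustration.

import torch
import torch.nn as nn
import torch.nn.functional as F


class PastDecodeLM(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=400, hidden_dim=1150):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)      # next-word logits
        # Extra decoder: maps the predicted next-word distribution back to
        # logits over the word observed in the context (few extra parameters).
        self.past_decoder = nn.Linear(vocab_size, vocab_size)

    def forward(self, tokens, targets, pdr_weight=0.1):
        # tokens, targets: (batch, seq_len) integer tensors; targets are the
        # usual shifted next-word labels.
        h, _ = self.lstm(self.embed(tokens))
        logits = self.out(h)                              # (B, T, V)

        # Standard auto-regressive language-modeling loss.
        lm_loss = F.cross_entropy(
            logits.reshape(-1, logits.size(-1)), targets.reshape(-1)
        )

        # Past-decode regularization: from the predicted distribution of the
        # next word, try to decode the word seen in the context.
        next_word_dist = F.softmax(logits, dim=-1)
        past_logits = self.past_decoder(next_word_dist)   # (B, T, V)
        past_loss = F.cross_entropy(
            past_logits.reshape(-1, past_logits.size(-1)), tokens.reshape(-1)
        )

        return lm_loss + pdr_weight * past_loss


# Usage with a random batch, just to show the shapes and the combined objective.
model = PastDecodeLM(vocab_size=1000)
x = torch.randint(0, 1000, (4, 35))   # context words
y = torch.randint(0, 1000, (4, 35))   # next-word targets
loss = model(x, y)
loss.backward()
```

The extra term penalizes next-word distributions from which the context cannot be recovered, nudging the model to retain contextual information while adding only a small number of parameters on top of the base LSTM.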
