Near-lossless Binarization of Word Embeddings
Is it possible to learn binary word embeddings of arbitrary size from their real-valued counterparts with (almost) no loss in task performance? If so, inference in downstream NLP applications would benefit from the massive speed-up brought by binary representations. In this paper, we derive an autoencoder architecture to learn semantics-preserving binary embeddings from existing real-valued ones. A binary encoder requires an element-wise function that outputs a bit given a real value, which is unfortunately non-differentiable. We propose to use the same parameter matrix for the encoder and the decoder, so that the weights can be learned through the decoder. We also show that it is possible and desirable to minimize the correlation between the different binary features at training time through ad hoc regularization. The learned binary codes can be of arbitrary size, and the binary representations yield (almost) the same performance as their real-valued counterparts. Finally, we show that we are able to reconstruct semantics-preserving real-valued embeddings from the binary embeddings: the cherry on top.
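The sketch below illustrates the general idea described in the abstract, not the authors' exact architecture: a single weight matrix W is shared by a binarizing encoder and a real-valued decoder, so that gradients computed through the decoder update W even though the element-wise binarization step itself is non-differentiable. The class names, the tanh reconstruction, the correlation penalty on W, and all hyperparameters are illustrative assumptions.

```python
import torch
import torch.nn as nn


class BinaryAutoencoder(nn.Module):
    """Sketch: binarize pretrained embeddings with a shared encoder/decoder matrix W."""

    def __init__(self, emb_dim: int, code_dim: int):
        super().__init__()
        self.W = nn.Parameter(torch.empty(code_dim, emb_dim))
        nn.init.xavier_uniform_(self.W)

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # Element-wise step function: bit is 1 if the pre-activation is positive.
        # The comparison carries no gradient, which is fine: W is trained via the decoder.
        return (x @ self.W.t() > 0).float()

    def decode(self, b: torch.Tensor) -> torch.Tensor:
        # Reconstruct a real-valued embedding from the binary code using the same W.
        return torch.tanh(b @ self.W)

    def forward(self, x: torch.Tensor):
        b = self.encode(x)
        return self.decode(b), b


def correlation_penalty(W: torch.Tensor) -> torch.Tensor:
    # Ad hoc regularizer (assumed form): push W W^T toward a diagonal matrix,
    # discouraging correlated binary features.
    gram = W @ W.t()
    off_diag = gram - torch.diag(torch.diag(gram))
    return off_diag.pow(2).sum()


if __name__ == "__main__":
    # Hypothetical training step on a batch of pretrained 300-d embeddings.
    X = torch.randn(64, 300)
    model = BinaryAutoencoder(emb_dim=300, code_dim=256)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    x_hat, codes = model(X)
    loss = nn.functional.mse_loss(x_hat, X) + 1e-4 * correlation_penalty(model.W)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the binary code is produced by a hard threshold, all learning signal reaches W through the reconstruction path, which is the reason for sharing the matrix between encoder and decoder in this sketch.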