What the Vec? Towards Probabilistically Grounded Embeddings

05/30/2018
by Carl Allen et al.

Vector representations, or embeddings, of words are commonly produced by neural network methods, in particular word2vec (W2V). It has been shown that certain statistics of word co-occurrences are implicitly captured by properties of W2V vectors, but much remains unknown about them, e.g. whether vector length carries any meaning, or, more generally, how it is that statistics can reliably be framed as vectors at all. By deriving a mathematical link between probabilities and vectors, we justify why W2V works and are able to create embeddings with probabilistically interpretable properties.
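As background for the claim that co-occurrence statistics are implicitly captured by W2V vectors, the sketch below illustrates the well-known result of Levy and Goldberg (2014) that skip-gram with negative sampling implicitly factorizes a shifted pointwise mutual information (PMI) matrix. This is not the paper's own derivation; the toy counts, the shift k, and the embedding dimension d are illustrative assumptions.

```python
import numpy as np

# Toy co-occurrence counts (assumed for illustration): rows index target
# words, columns index context words. In practice these counts come from
# sliding a context window over a large corpus.
counts = np.array([
    [10.,  2.,  0.,  1.],
    [ 2.,  8.,  3.,  0.],
    [ 0.,  3., 12.,  4.],
    [ 1.,  0.,  4.,  9.],
])

total = counts.sum()
p_wc = counts / total                             # joint P(w, c)
p_w = counts.sum(axis=1, keepdims=True) / total   # marginal P(w)
p_c = counts.sum(axis=0, keepdims=True) / total   # marginal P(c)

# Pointwise mutual information: PMI(w, c) = log[ P(w, c) / (P(w) P(c)) ].
with np.errstate(divide="ignore"):
    pmi = np.log(p_wc / (p_w * p_c))

# Skip-gram with k negative samples factorizes PMI - log k (Levy &
# Goldberg 2014); clipping at zero gives the common positive-PMI variant
# and removes the -inf entries of unseen word pairs.
k = 1
shifted_pmi = np.maximum(pmi - np.log(k), 0.0)

# Explicit low-rank factorization via SVD: rows of U * sqrt(S) serve as
# d-dimensional word vectors (d is an assumed hyperparameter).
d = 2
U, S, _ = np.linalg.svd(shifted_pmi)
embeddings = U[:, :d] * np.sqrt(S[:d])

print(embeddings)  # one d-dimensional vector per word
```

Dot products between these rows approximate entries of the shifted PMI matrix, which is the sense in which the embedding geometry encodes co-occurrence statistics.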
