Beyond Word Embeddings: Learning Entity and Concept Representations from Large Scale Knowledge Bases
Text representation using neural word embeddings has proven effective in many NLP applications. Recent research goes beyond word embeddings by adapting traditional word embedding models to learn vectors of multiword expressions (concepts/entities). However, current methods are limited to textual knowledge bases (e.g., Wikipedia). In this paper, we propose a novel approach for learning concept vectors from two large-scale knowledge bases (Wikipedia and Probase). We adapt the skip-gram model to learn seamlessly from both the Wikipedia text and the Probase concept graph. We evaluate our concept embedding models intrinsically on two tasks: 1) analogical reasoning, where we achieve state-of-the-art performance of 91% on semantic analogies; and 2) concept categorization, where we achieve state-of-the-art performance on two benchmark datasets, with categorization accuracy of up to 100%. We also present a study that extrinsically evaluates our model on unsupervised argument type identification for neural semantic parsing. We demonstrate the competitive accuracy of our unsupervised method and its ability to generalize better to out-of-vocabulary entity mentions than the tedious and error-prone methods that depend on gazetteers and regular expressions.