A Hidden Markov Model Based System for Entity Extraction from Social Media English Text at FIRE 2015
This paper presents the experiments carried out by us at Jadavpur University as part of the participation in FIRE 2015 task: Entity Extraction from Social Media Text - Indian Languages (ESM-IL). The tool that we have developed for the task is based on Trigram Hidden Markov Model that utilizes information like gazetteer list, POS tag and some other word level features to enhance the observation probabilities of the known tokens as well as unknown tokens. We submitted runs for English only. A statistical HMM (Hidden Markov Models) based model has been used to implement our system. The system has been trained and tested on the datasets released for FIRE 2015 task: Entity Extraction from Social Media Text - Indian Languages (ESM-IL). Our system is the best performer for English language and it obtains precision, recall and F-measures of 61.96, 39.46 and 48.21 respectively.
READ FULL TEXT