A POS Tagger for Code Mixed Indian Social Media Text - ICON-2016 NLP Tools Contest Entry from Surukam

12/31/2016
by Sree Harsha Ramesh, et al.

Building Part-of-Speech (POS) taggers for code-mixed Indian languages is a particularly challenging problem in computational linguistics because accurately annotated training corpora are scarce. As part of its NLP tools contest, ICON has organized this challenge as a shared task for the second consecutive year to advance the state of the art. This paper describes the POS tagger built at Surukam to predict coarse-grained and fine-grained POS tags for three language pairs (Bengali-English, Telugu-English and Hindi-English), with text drawn from three popular social media platforms: Facebook, WhatsApp and Twitter. We employed Conditional Random Fields as the sequence tagging algorithm and trained our model with sklearn-crfsuite, a thin wrapper around CRFsuite. The features we used include character n-grams, language information, and patterns for emoji, numbers, punctuation and web addresses. Our submissions in the constrained environment, i.e., without making any use of monolingual POS taggers or the like, obtained an overall average F1-score of 76.45, which is comparable to the 2015 winning score of 76.79.
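
The feature set named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the regular expressions, the feature names, the language tags ("hi", "en", "univ"), the use of prefixes and suffixes as the character n-grams, and the hyperparameters in train_crf are all assumptions made for illustration. Only the choice of sklearn-crfsuite and the feature categories come from the abstract.

```python
import re

# Hypothetical patterns; the paper's actual patterns are not given in the abstract.
URL_RE = re.compile(r"^(https?://|www\.)\S+$", re.IGNORECASE)
NUMBER_RE = re.compile(r"^[+-]?\d+([.,]\d+)*$")
PUNCT_RE = re.compile(r"^[^\w\s]+$")
# Rough emoji coverage: a few common Unicode blocks plus simple ASCII emoticons.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
EMOTICON_RE = re.compile(r"^[:;=8][\-o*']?[)(\]\[dDpP/\\|]+$")


def char_ngrams(token, n_max=3):
    """Character n-grams, approximated here by prefixes and suffixes up to n_max."""
    feats = {}
    lowered = token.lower()
    for n in range(1, n_max + 1):
        if len(lowered) >= n:
            feats[f"prefix{n}"] = lowered[:n]
            feats[f"suffix{n}"] = lowered[-n:]
    return feats


def token2features(sent, i):
    """Build the feature dict for token i; sent is a list of (token, lang) pairs."""
    token, lang = sent[i]
    is_emoji = bool(EMOJI_RE.search(token) or EMOTICON_RE.match(token))
    feats = {
        "bias": 1.0,
        "lower": token.lower(),
        "lang": lang,  # language information for the token
        "is_url": bool(URL_RE.match(token)),
        "is_number": bool(NUMBER_RE.match(token)),
        "is_emoji": is_emoji,
        # Emoji and emoticons are made of non-word characters, so exclude them here.
        "is_punct": (not is_emoji) and bool(PUNCT_RE.match(token)),
    }
    feats.update(char_ngrams(token))
    if i > 0:
        feats["prev_lang"] = sent[i - 1][1]
    else:
        feats["BOS"] = True  # beginning of sentence
    if i < len(sent) - 1:
        feats["next_lang"] = sent[i + 1][1]
    else:
        feats["EOS"] = True  # end of sentence
    return feats


def sent2features(sent):
    """Feature dicts for every token of one sentence."""
    return [token2features(sent, i) for i in range(len(sent))]


def train_crf(X_train, y_train):
    """Fit a CRF on lists of feature-dict sequences and tag sequences.

    Requires the sklearn-crfsuite package; the hyperparameters are
    illustrative assumptions, not values reported by the authors.
    """
    import sklearn_crfsuite

    crf = sklearn_crfsuite.CRF(
        algorithm="lbfgs",
        c1=0.1,
        c2=0.1,
        max_iterations=100,
        all_possible_transitions=True,
    )
    crf.fit(X_train, y_train)
    return crf
```

Each sentence becomes a list of feature dictionaries, which is the input format sklearn-crfsuite expects; a tagger trained this way would call sent2features on every training sentence and pass the results, together with the gold tag sequences, to train_crf.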


research
12/23/2016

A CRF Based POS Tagger for Code-mixed Indian Social Media Text

In this work, we describe a conditional random fields (CRF) based system...
research
02/01/2017

SMPOST: Parts of Speech Tagger for Code-Mixed Indic Social Media Text

Use of social media has grown dramatically during the last few years. Us...
research
10/31/2016

Experiments with POS Tagging Code-mixed Indian Social Media Text

This paper presents Centre for Development of Advanced Computing Mumbai'...
research
02/06/2022

How Effective is Incongruity? Implications for Code-mix Sarcasm Detection

The presence of sarcasm in conversational systems and social media like ...
research
08/21/2019

Predict Emoji Combination with Retrieval Strategy

As emojis are widely used in social media, people not only use an emoji ...
research
01/06/2016

Part-of-Speech Tagging for Code-mixed Indian Social Media Text at ICON 2015

This paper discusses the experiments carried out by us at Jadavpur Unive...
research
08/31/2016

Demographic Dialectal Variation in Social Media: A Case Study of African-American English

Though dialectal language is increasingly abundant on social media, few ...
