Including Dialects and Language Varieties in Author Profiling
This paper presents a computational approach to author profiling that takes gender and language variety into account. We apply an ensemble system that combines the output of multiple linear SVM classifiers trained on character and word n-grams. We evaluate the system using the dataset provided by the organizers of the 2017 PAN lab on author profiling. Our approach achieved 75% accuracy on gender identification on tweets written in four languages and 97% accuracy on language variety identification for Portuguese.
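As a rough illustration of the feature types and the ensemble idea described above, the following minimal sketch extracts character and word n-gram counts and fuses per-classifier labels by plurality vote. It does not train any SVM. The function names (`char_ngrams`, `word_ngrams`, `majority_vote`), the whitespace tokenization, the lowercasing, and the voting rule are all assumptions made for illustration; the abstract does not state how the classifier outputs are combined.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams in a text (hypothetical helper)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text, n):
    """Count word n-grams, assuming lowercasing and whitespace tokenization."""
    tokens = text.lower().split()
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


def majority_vote(predictions):
    """Fuse labels from several classifiers by plurality vote.

    Assumed fusion rule: the most frequent label wins, and ties go to
    the label that appears first in the input list.
    """
    return Counter(predictions).most_common(1)[0][0]


# Hypothetical labels from three classifiers, each imagined as trained
# on a different n-gram feature set.
votes = ["pt-BR", "pt-PT", "pt-BR"]
fused_label = majority_vote(votes)
```

In a full system, each count dictionary would be vectorized and passed to its own linear SVM, and the fused label would be computed from the classifiers' predictions for the same author.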