N-gram and Neural Language Models for Discriminating Similar Languages
This paper describes our submission (named clac) to the 2016 Discriminating Similar Languages (DSL) shared task. We participated in the closed Sub-task 1 (Set A) with two separate machine learning techniques. The first approach is a character-based convolutional neural network with a bidirectional long short-term memory (BiLSTM) layer (CLSTM), which achieved an accuracy of 78.45% with minimal tuning. The second approach is a character-based n-gram model. This last approach achieved an accuracy of 88.45%, which is close to the accuracy of 89.38% achieved by the best submission.
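To make the second approach concrete, here is a minimal sketch of one way a character-based n-gram model can be used to pick a language label. It is not the system submitted to the shared task: the class name CharNgramClassifier, the trigram order, the add-one smoothing, and the toy training strings are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Return the overlapping character n-grams of a lowercased, space-padded string."""
    padded = " " * (n - 1) + text.lower() + " "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramClassifier:
    """Hypothetical per-label n-gram scorer with add-one smoothing."""

    def __init__(self, n=3):
        self.n = n
        self.counts = {}    # label -> Counter of n-gram frequencies
        self.totals = {}    # label -> total number of n-grams seen
        self.vocab = set()  # all n-grams seen in training, across labels

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.counts.setdefault(label, Counter()).update(grams)
            self.vocab.update(grams)
        self.totals = {lab: sum(c.values()) for lab, c in self.counts.items()}
        return self

    def score(self, text, label):
        """Smoothed log-likelihood of the text's n-grams under one label."""
        size = len(self.vocab) + 1  # one extra slot for unseen n-grams
        counts = self.counts[label]
        total = self.totals[label]
        return sum(
            math.log((counts[g] + 1) / (total + size))
            for g in char_ngrams(text, self.n)
        )

    def predict(self, text):
        """Return the label whose n-gram statistics best explain the text."""
        return max(self.counts, key=lambda lab: self.score(text, lab))


# Toy usage with made-up "languages" A and B (not shared-task data).
clf = CharNgramClassifier(n=3).fit(["aaaa aaaa aaaa", "bbbb bbbb"], ["A", "B"])
print(clf.predict("aaa"))
```

On real data, each label would be trained on many sentences per language variety, and the n-gram order and smoothing would need tuning on held-out data.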