Khmer Text Classification Using Word Embedding and Neural Networks

12/13/2021
by Rina Buoy, et al.

Text classification, the task of assigning labels to open-ended text, is one of the fundamental tasks in natural language processing and supports applications such as sentiment analysis. In this paper, we discuss classification approaches for Khmer text, ranging from a classical TF-IDF representation with a support vector machine classifier to modern word embedding-based neural network classifiers, including a linear layer model, a recurrent neural network, and a convolutional neural network. A Khmer word embedding model is trained on a 30-million-word Khmer corpus to construct word vector representations, which are then used to train the three neural network classifiers. We evaluate the approaches on a news article dataset for both multi-class and multi-label text classification tasks. The results suggest that neural network classifiers using a word embedding model consistently outperform the traditional classifier using TF-IDF. The recurrent neural network classifier performs slightly better than the convolutional network and the linear layer network.
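The difference between the multi-class and multi-label tasks mentioned above can be illustrated with a minimal sketch of a linear layer over averaged word vectors. Everything in it is an assumption made for illustration: the toy two-dimensional vectors in `EMBEDDINGS`, the label names, the hand-picked `WEIGHTS`, and the English placeholder tokens are hypothetical and are not taken from the paper, whose abstract does not specify the architecture at this level of detail.

```python
import math

# Toy 2-dimensional word vectors (hypothetical values, not the paper's model).
EMBEDDINGS = {
    "goal": [1.0, 0.0],
    "match": [0.8, 0.2],
    "bank": [0.0, 1.0],
    "loan": [0.1, 0.9],
}

# Hypothetical linear layer: one weight row and one bias per label.
LABELS = ["sport", "economy"]
WEIGHTS = [[2.0, -2.0], [-2.0, 2.0]]
BIAS = [0.0, 0.0]


def average_embedding(tokens):
    """Mean of the vectors of known tokens; zeros if no token is known."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return [0.0, 0.0]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def logits(tokens):
    """Apply the linear layer to the averaged document vector."""
    x = average_embedding(tokens)
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(WEIGHTS, BIAS)]


def predict_multiclass(tokens):
    """Softmax over labels: exactly one label is returned."""
    z = logits(tokens)
    peak = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    probs = [e / total for e in exps]
    return LABELS[probs.index(max(probs))], probs


def predict_multilabel(tokens, threshold=0.5):
    """Independent sigmoid per label: zero or more labels are returned."""
    probs = [1.0 / (1.0 + math.exp(-v)) for v in logits(tokens)]
    chosen = [label for label, p in zip(LABELS, probs) if p >= threshold]
    return chosen, probs
```

In the multi-class setting the softmax forces the label probabilities to sum to one, so a document receives a single category. In the multi-label setting each label is scored independently, so lowering `threshold` lets a document that mixes topics receive several categories at once.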


