Lexical Bias In Essay Level Prediction
Automatically predicting the level of non-native English speakers given their written essays is an interesting machine learning problem. In this work I present the system "balikasg" that achieved the state-of-the-art performance in the CAp 2018 data science challenge among 14 systems. I detail the feature extraction, feature engineering and model selection steps and I evaluate how these decisions impact the system's performance. The paper concludes with remarks for future work.
READ FULL TEXT