Lexical Bias In Essay Level Prediction


Abstract in English

Automatically predicting the proficiency level of non-native English speakers from their written essays is an interesting machine learning problem. In this work I present the system balikasg, which achieved state-of-the-art performance in the CAp 2018 data science challenge among 14 systems. I detail the feature extraction, feature engineering and model selection steps, and I evaluate how these decisions impact the system's performance. The paper concludes with remarks on future work.
