A Machine Learning Approach for Detecting Students at Risk of Low Academic Achievement
We aim to predict whether a primary school student will perform in the 'below standard' band of a standardized test based on a set of individual, school-level, and family-level observables. We exploit a data set containing test performance on the National Assessment Program - Literacy and Numeracy (NAPLAN), a test given annually to all Australian primary school students in grades 3, 5, 7, and 9. Students who perform in the 'below standard' band constitute approximately 3% of students, with the remainder performing at or above standard, so a proposed classifier must be robust to imbalanced classes. Observations for students in grades 5, 7, and 9 contain data on previous achievement in NAPLAN. We separate the analysis into students in grade 5 and above, for whom previous achievement may be used as a predictor, and students in grade 3, for whom prediction must rely on family and school-level predictors only. On each subset of the data, we train and compare a set of classifiers to predict below standard performance in the reading and numeracy learning areas respectively. The best classifiers for grades 5 and above achieve an area under the ROC curve of approximately 95%, and those for grade 3 approximately 80%. Such classifiers could be used to screen a large number of students for their risk of obtaining below standard achievement a full two years before they are identified as achieving below standard on their next NAPLAN test.
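The choice of area under the ROC curve as the headline metric follows from the class imbalance described above. The sketch below is illustrative only and is not the authors' code or data: it computes AUC with the standard rank-sum identity on a synthetic cohort in which roughly 3% of students are labelled below standard. The function names, the synthetic risk scores, and the cohort size are all assumptions made for the example. AUC is returned as a fraction, so 0.95 corresponds to the 95% quoted in the abstract.

```python
import random


def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for 'below standard' (positive class), 0 otherwise.
    scores: higher means the model considers the student more at risk.
    Tied scores receive the average of the ranks they span.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one example of each class")

    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the current group of tied scores.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


def make_synthetic_cohort(n=5000, positive_rate=0.03, seed=0):
    """Hypothetical data: about 3% positives, with risk scores shifted upward."""
    rng = random.Random(seed)
    labels, scores = [], []
    for _ in range(n):
        y = 1 if rng.random() < positive_rate else 0
        labels.append(y)
        scores.append(rng.gauss(1.5 if y else 0.0, 1.0))
    return labels, scores


labels, scores = make_synthetic_cohort()
auc = roc_auc(labels, scores)

# Baseline that flags nobody: high accuracy, but no ranking ability.
majority_accuracy = 1 - sum(labels) / len(labels)
constant_auc = roc_auc(labels, [0.0] * len(labels))

print(f"AUC of synthetic risk score: {auc:.3f}")
print(f"Accuracy of 'flag nobody' baseline: {majority_accuracy:.3f}")
print(f"AUC of 'flag nobody' baseline: {constant_auc:.3f}")
```

With so few positives, a baseline that never flags a student is highly accurate yet has an AUC of 0.5, which is why a ranking-based metric is the more informative comparison for screening.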