Use case-focused metrics to evaluate machine learning for diseases involving parasite loads
Communal hill-climbing, via comparison of algorithm performance, can greatly accelerate ML research. However, it requires task-relevant metrics. For diseases involving parasite loads, e.g., malaria and neglected tropical diseases (NTDs) such as schistosomiasis, the metrics currently reported in ML papers (e.g., AUC, F1 score) are ill-suited to the clinical task. As a result, the hill-climbing system is not driving progress towards solutions that address these dire illnesses. Drawing on examples from malaria and NTDs, this paper highlights two gaps in current ML practice and proposes methods for improvement: (i) We describe aspects of ML development, and performance metrics in particular, that need to be firmly grounded in the clinical use case, and we offer methods for acquiring this domain knowledge. (ii) We describe in detail performance metrics to guide development of ML models for diseases involving parasite loads. We highlight the importance of a patient-level perspective, interpatient variability, false positive rates, limit of detection, and different types of error. We also discuss problems with ROC curves and AUC as commonly used in this context.
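To make the patient-level perspective concrete, the following is a minimal sketch and not the paper's method. It assumes a hypothetical setup in which a model counts parasite-like objects per microliter of blood, and it uses a deliberately simplified limit-of-detection estimate (diagnosis threshold divided by object-level sensitivity) that ignores sampling noise. The function name, parameters, and numbers are all illustrative assumptions.

```python
import math


def patient_level_summary(neg_fp_per_ul, object_sensitivity, target_specificity):
    """Sketch of patient-level metrics under simplifying assumptions.

    neg_fp_per_ul      -- false-positive objects per microliter, one value
                          per truly negative patient (hypothetical data)
    object_sensitivity -- fraction of true parasites the model detects (0-1]
    target_specificity -- desired fraction of negative patients called negative
    """
    if not neg_fp_per_ul:
        raise ValueError("need at least one negative patient")
    if not 0 < object_sensitivity <= 1:
        raise ValueError("object_sensitivity must be in (0, 1]")

    ranked = sorted(neg_fp_per_ul)
    n = len(ranked)

    # Smallest per-patient threshold such that at least target_specificity
    # of the negative patients fall at or below it.
    k = max(math.ceil(target_specificity * n) - 1, 0)
    threshold = ranked[k]

    # A patient is called negative when their detected rate is <= threshold.
    specificity = sum(rate <= threshold for rate in ranked) / n

    # Assumption: a positive patient with true density D yields about
    # D * object_sensitivity detections, so D must exceed this value
    # to be called positive. Sampling noise is ignored.
    limit_of_detection = threshold / object_sensitivity

    return threshold, specificity, limit_of_detection


# Four negative patients; one has an unusually high false-positive rate.
threshold, specificity, lod = patient_level_summary([0, 2, 4, 50], 0.5, 0.75)
print(threshold, specificity, lod)
```

In this toy input, the pooled mean false-positive rate is 14 objects per microliter, driven almost entirely by one patient, while three of the four patients sit at 4 or below. Summarizing per patient keeps that interpatient variability visible, and it ties the false positive rate directly to the lowest parasite density the system could plausibly detect.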