Soft-prompt tuning to predict lung cancer using primary care free-text Dutch medical notes

03/28/2023
by Auke Elfrink et al.

We investigate different natural language processing (NLP) approaches based on contextualised word representations for the problem of early prediction of lung cancer using free-text patient medical notes of Dutch primary care physicians. Because lung cancer has a low prevalence in primary care, we also address the problem of classification under highly imbalanced classes. Specifically, we use large Transformer-based pretrained language models (PLMs) and investigate: 1) how soft-prompt tuning – an NLP technique used to adapt PLMs using small amounts of training data – compares to standard model fine-tuning; 2) whether simpler static word embedding models (WEMs) can be more robust than PLMs in highly imbalanced settings; and 3) how models fare when trained on notes from a small number of patients. We find that 1) soft-prompt tuning is an efficient alternative to standard model fine-tuning; 2) PLMs show better discrimination but worse calibration than simpler static WEMs as the classification problem becomes more imbalanced; and 3) results when training models on a small number of patients are mixed and show no clear differences between PLMs and WEMs. All our code is available open source at <https://bitbucket.org/aumc-kik/prompt_tuning_cancer_prediction/>.
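The authors' actual implementation is in the Bitbucket repository above. As a minimal illustrative sketch of the general technique (not the paper's code), soft-prompt tuning typically prepends a small matrix of trainable "prompt" embeddings to the frozen PLM's input embeddings, so that only the prompt and a lightweight task head are updated during training. The backbone name GroNLP/bert-base-dutch-cased, the prompt length, and the linear classification head below are illustrative assumptions, not details taken from the paper.

# A minimal soft-prompt tuning sketch for binary classification, assuming a
# HuggingFace BERT-style backbone. Only self.prompt and self.head train;
# all pretrained weights stay frozen.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class SoftPromptClassifier(nn.Module):
    def __init__(self, model_name: str, n_prompt_tokens: int = 20, n_classes: int = 2):
        super().__init__()
        self.backbone = AutoModel.from_pretrained(model_name)
        for p in self.backbone.parameters():
            p.requires_grad = False  # freeze the PLM
        hidden = self.backbone.config.hidden_size
        # Trainable continuous prompt embeddings (randomly initialised).
        self.prompt = nn.Parameter(torch.randn(n_prompt_tokens, hidden) * 0.02)
        self.head = nn.Linear(hidden, n_classes)  # small trainable task head

    def forward(self, input_ids, attention_mask):
        tok_emb = self.backbone.get_input_embeddings()(input_ids)
        batch = tok_emb.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        # Prepend the soft prompt to the token embeddings and extend the mask.
        inputs_embeds = torch.cat([prompt, tok_emb], dim=1)
        prompt_mask = torch.ones(batch, self.prompt.size(0),
                                 dtype=attention_mask.dtype,
                                 device=attention_mask.device)
        attention_mask = torch.cat([prompt_mask, attention_mask], dim=1)
        out = self.backbone(inputs_embeds=inputs_embeds, attention_mask=attention_mask)
        # Classify from the first position (the first prompt token).
        return self.head(out.last_hidden_state[:, 0])

# Usage with a Dutch BERT backbone (illustrative choice):
tokenizer = AutoTokenizer.from_pretrained("GroNLP/bert-base-dutch-cased")
model = SoftPromptClassifier("GroNLP/bert-base-dutch-cased")
batch = tokenizer(["patient meldt aanhoudende hoest"], return_tensors="pt")
logits = model(batch["input_ids"], batch["attention_mask"])

Because only the prompt matrix and the linear head carry gradients, the number of trained parameters is a tiny fraction of the full model, which is what makes this approach attractive when labelled training data is scarce.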
