Constructing Artificial Data for Fine-tuning for Low-Resource Biomedical Text Tagging with Applications in PICO Annotation

by   Gaurav Singh, et al.

Biomedical text tagging systems are plagued by the dearth of labeled training data. There have been recent attempts at using pre-trained encoders to deal with this issue. Pre-trained encoder provides representation of the input text which is then fed to task-specific layers for classification. The entire network is fine-tuned on the labeled data from the target task. Unfortunately, a low-resource biomedical task often has too few labeled instances for satisfactory fine-tuning. Also, if the label space is large, it contains few or no labeled instances for majority of the labels. Most biomedical tagging systems treat labels as indexes, ignoring the fact that these labels are often concepts expressed in natural language e.g. `Appearance of lesion on brain imaging'. To address these issues, we propose constructing extra labeled instances using label-text (i.e. label's name) as input for the corresponding label-index (i.e. label's index). In fact, we propose a number of strategies for manufacturing multiple artificial labeled instances from a single label. The network is then fine-tuned on a combination of real and these newly constructed artificial labeled instances. We evaluate the proposed approach on an important low-resource biomedical task called PICO annotation, which requires tagging raw text describing clinical trials with labels corresponding to different aspects of the trial i.e. PICO (Population, Intervention/Control, Outcome) characteristics of the trial. Our empirical results show that the proposed method achieves a new state-of-the-art performance for PICO annotation with very significant improvements over competitive baselines.


page 1

page 2

page 3

page 4


Prompt-based Text Entailment for Low-Resource Named Entity Recognition

Pre-trained Language Models (PLMs) have been applied in NLP tasks and ac...

Morphosyntactic Tagging with Pre-trained Language Models for Arabic and its Dialects

We present state-of-the-art results on morphosyntactic tagging across di...

Fine-tuning Pretrained Language Models with Label Attention for Explainable Biomedical Text Classification

The massive growth of digital biomedical data is making biomedical text ...

Low-Resource Multi-Granularity Academic Function Recognition Based on Multiple Prompt Knowledge

Fine-tuning pre-trained language models (PLMs), e.g., SciBERT, generally...

Graph-Based Multilingual Label Propagation for Low-Resource Part-of-Speech Tagging

Part-of-Speech (POS) tagging is an important component of the NLP pipeli...

Joint Learning of Pre-Trained and Random Units for Domain Adaptation in Part-of-Speech Tagging

Fine-tuning neural networks is widely used to transfer valuable knowledg...

Can NLI Provide Proper Indirect Supervision for Low-resource Biomedical Relation Extraction?

Two key obstacles in biomedical relation extraction (RE) are the scarcit...

Please sign up or login with your details

Forgot password? Click here to reset