Adaptive Ranking-based Sample Selection for Weakly Supervised Class-imbalanced Text Classification

by   Linxin Song, et al.

To obtain a large amount of training labels inexpensively, researchers have recently adopted the weak supervision (WS) paradigm, which leverages labeling rules to synthesize training labels rather than using individual annotations to achieve competitive results for natural language processing (NLP) tasks. However, data imbalance is often overlooked in applying the WS paradigm, despite being a common issue in a variety of NLP tasks. To address this challenge, we propose Adaptive Ranking-based Sample Selection (ARS2), a model-agnostic framework to alleviate the data imbalance issue in the WS paradigm. Specifically, it calculates a probabilistic margin score based on the output of the current model to measure and rank the cleanliness of each data point. Then, the ranked data are sampled based on both class-wise and rule-aware ranking. In particular, the two sample strategies corresponds to our motivations: (1) to train the model with balanced data batches to reduce the data imbalance issue and (2) to exploit the expertise of each labeling rule for collecting clean samples. Experiments on four text classification datasets with four different imbalance ratios show that ARS2 outperformed the state-of-the-art imbalanced learning and WS methods, leading to a 2 improvement on their F1-score.


A Survey of Methods for Addressing Class Imbalance in Deep-Learning Based Natural Language Processing

Many natural language processing (NLP) tasks are naturally imbalanced, a...

Rank over Class: The Untapped Potential of Ranking in Natural Language Processing

Text classification has long been a staple in natural language processin...

Fairness-aware Class Imbalanced Learning

Class imbalance is a common challenge in many NLP tasks, and has clear c...

Dice Loss for Data-imbalanced NLP Tasks

Many NLP tasks such as tagging and machine reading comprehension are fac...

One Class One Click: Quasi Scene-level Weakly Supervised Point Cloud Semantic Segmentation with Active Learning

Reliance on vast annotations to achieve leading performance severely res...

Par4Sim -- Adaptive Paraphrasing for Text Simplification

Learning from a real-world data stream and continuously updating the mod...

A Few-shot Approach to Resume Information Extraction via Prompts

Prompt learning has been shown to achieve near-Fine-tune performance in ...

Please sign up or login with your details

Forgot password? Click here to reset