How Do Your Biomedical Named Entity Models Generalize to Novel Entities?
The amount of biomedical literature on new biomedical concepts is rapidly increasing, which necessitates a reliable biomedical named entity recognition (BioNER) model that can identify new and unseen entity mentions. However, it is questionable whether existing BioNER models can handle such mentions effectively. In this work, we systematically analyze three types of recognition abilities of BioNER models: memorization, synonym generalization, and concept generalization. We find that (1) the generalization ability of BioNER models is overestimated, and (2) the models tend to exploit dataset biases, which hinders their ability to generalize. To improve generalizability, we present a simple debiasing method based on data statistics. Our method consistently improves the generalizability of state-of-the-art (SOTA) models on five benchmark datasets, allowing them to perform better on unseen entity mentions.
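The three recognition abilities can be pictured as a partition of the test mentions. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes that memorization covers test mentions whose surface form appears in the training data, synonym generalization covers unseen surface forms whose concept was seen in training, and concept generalization covers mentions where both are unseen. The `Mention` type, the `split_by_ability` function, the case-insensitive matching, and the toy concept IDs are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterable, List, NamedTuple


class Mention(NamedTuple):
    """One annotated entity mention (hypothetical structure)."""
    surface: str     # mention text as it appears in the corpus
    concept_id: str  # normalized concept identifier (toy IDs below)


def split_by_ability(
    train: Iterable[Mention], test: Iterable[Mention]
) -> Dict[str, List[Mention]]:
    """Partition test mentions by overlap with the training data.

    Assumed definitions (not taken from the abstract):
      - memorization: surface form seen in training
      - synonym_generalization: surface unseen, concept seen
      - concept_generalization: surface and concept both unseen
    """
    train = list(train)
    # Case-insensitive surface matching is an assumption of this sketch.
    seen_surfaces = {m.surface.lower() for m in train}
    seen_concepts = {m.concept_id for m in train}

    splits: Dict[str, List[Mention]] = {
        "memorization": [],
        "synonym_generalization": [],
        "concept_generalization": [],
    }
    for m in test:
        if m.surface.lower() in seen_surfaces:
            splits["memorization"].append(m)
        elif m.concept_id in seen_concepts:
            splits["synonym_generalization"].append(m)
        else:
            splits["concept_generalization"].append(m)
    return splits


# Toy data, invented for illustration only.
train_mentions = [Mention("aspirin", "C1"), Mention("heart attack", "C2")]
test_mentions = [
    Mention("Aspirin", "C1"),                # surface seen in training
    Mention("myocardial infarction", "C2"),  # new surface, known concept
    Mention("ibuprofen", "C3"),              # new surface, new concept
]
splits = split_by_ability(train_mentions, test_mentions)
counts = {name: len(items) for name, items in splits.items()}
```

Reporting a model's recall separately on each of the three subsets, rather than on the pooled test set, is one way to check whether an overall score is carried mainly by memorized mentions.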