Traditional multitask learning methods can only exploit common...
Diffusion models developed on top of powerful text-to-image generation m...
People perceive the world with multiple senses (e.g., through hearing so...
Chinese BERT models have achieved remarkable progress in dealing with grammati...
We present a Chinese BERT model dubbed MarkBERT that uses word informati...
Whole word masking (WWM), which masks all subwords corresponding to a wo...
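Although the entry above is truncated, the whole word masking idea it names is standard: when a word is split into several subwords, either all of its subwords are masked or none are. A minimal sketch, assuming WordPiece-style `##` continuation markers; the function name `whole_word_mask` is illustrative, not from any library:

```python
import random

def whole_word_mask(tokens, mask_prob=0.15, seed=0):
    """Mask whole words: if a word is chosen, mask all of its subwords.

    Subwords are marked WordPiece-style with a leading '##'.
    """
    rng = random.Random(seed)
    # Group token indices into words: a '##' token continues the previous word.
    words = []
    for i, tok in enumerate(tokens):
        if tok.startswith("##") and words:
            words[-1].append(i)
        else:
            words.append([i])
    masked = list(tokens)
    for word in words:
        if rng.random() < mask_prob:
            for i in word:
                masked[i] = "[MASK]"  # all subwords of this word masked together
    return masked

tokens = ["the", "phil", "##ammon", "sang"]
print(whole_word_mask(tokens, mask_prob=1.0))
# → ['[MASK]', '[MASK]', '[MASK]', '[MASK]']
```

With `mask_prob=1.0` every word is masked, so `phil` and `##ammon` are replaced together rather than independently, which is the behavioral difference from plain token-level masking.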
While GPT has become the de-facto method for text generation tasks, its ...
The standard BERT adopts subword-based tokenization, which may break a w...
Pre-trained models for programming languages have achieved dramatic empir...
Verifying the correctness of a textual statement requires not only seman...
We present CodeBERT, a bimodal pre-trained model for programming languag...