While large-scale pre-trained text-to-image models can synthesize divers...
Diffusion models are able to generate photorealistic images in arbitrary...
Scene text spotting is of great importance to the computer vision commun...
The attention-based encoder-decoder framework is becoming popular in sce...
In this paper, we abandon the dominant complex language model and rethin...
Existing scene text removal methods mainly train an elaborate network wi...
Linguistic knowledge is of great benefit to scene text recognition. Howe...