Discriminative Class Tokens for Text-to-Image Diffusion Models

by Idan Schwartz, et al.

Recent advances in text-to-image diffusion models have enabled the generation of diverse and high-quality images. However, generated images often fall short of depicting subtle details and are susceptible to errors due to ambiguity in the input text. One way of alleviating these issues is to train diffusion models on class-labeled datasets. This comes with a downside: it limits their expressive power. (i) Supervised datasets are generally small compared to the large-scale scraped text-image datasets on which text-to-image models are trained, so the quality and diversity of generated images are severely affected; (ii) the input is a hard-coded label, as opposed to free-form text, which limits control over the generated images. In this work, we propose a non-invasive fine-tuning technique that capitalizes on the expressive potential of free-form text while achieving high accuracy through discriminative signals from a pretrained classifier, which guides the generation. This is done by iteratively modifying the embedding of a single input token of a text-to-image diffusion model, using the classifier to steer generated images toward a given target class. Our method is fast compared to prior fine-tuning methods and does not require a collection of in-class images or retraining of a noise-tolerant classifier. We evaluate our method extensively, showing that the generated images (i) are more accurate and of higher quality than those of standard diffusion models, (ii) can be used to augment training data in a low-resource setting, and (iii) reveal information about the data used to train the guiding classifier. The code is available at <https://github.com/idansc/discriminative_class_tokens>
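The core idea of the abstract, optimizing a single token embedding so that a frozen classifier assigns the generated output to a target class, can be illustrated with a toy sketch. This is not the paper's actual pipeline: the linear "generator" and "classifier" below are hypothetical stand-ins for the frozen diffusion model and pretrained classifier, used only to show the iterative gradient update on the token embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

emb_dim, feat_dim, n_classes = 8, 16, 3
W_gen = rng.normal(size=(feat_dim, emb_dim))    # stand-in for the frozen generator
W_clf = rng.normal(size=(n_classes, feat_dim))  # stand-in for the frozen classifier
target = 2                                      # the class we steer generation toward

# The single token embedding being optimized; all other weights stay frozen.
token_emb = 0.1 * rng.normal(size=emb_dim)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def target_prob(emb):
    """Probability the toy classifier assigns the target class to the 'generated' output."""
    return softmax(W_clf @ (W_gen @ emb))[target]

lr = 0.05
before = target_prob(token_emb)
for _ in range(200):
    probs = softmax(W_clf @ (W_gen @ token_emb))
    # Gradient of cross-entropy w.r.t. logits is (probs - one_hot(target));
    # backpropagate it through the frozen classifier and generator to the embedding.
    grad_logits = probs.copy()
    grad_logits[target] -= 1.0
    grad_emb = W_gen.T @ (W_clf.T @ grad_logits)
    token_emb -= lr * grad_emb  # only the token embedding is updated

after = target_prob(token_emb)
print(f"target-class probability: {before:.3f} -> {after:.3f}")
```

In the actual method, the gradient would flow through the diffusion sampling process and a real image classifier rather than two linear maps, but the update rule on the token embedding has the same shape.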

