Teaching StyleGAN to Read: Improving Text-to-image Synthesis with U2C Transfer Learning


Vinicius G Pereira (PUC-Rio), Jonatas Wehrmann (PUCRS)*
The 33rd British Machine Vision Conference

Abstract

Generative Adversarial Networks (GANs) are unsupervised models that can learn from an indefinitely large number of images. In contrast, models that generate images from language queries depend on high-quality labeled data, which is scarce. Transfer learning is a known technique that alleviates the need for labeled data, though it is not trivial to turn an unconditional generative model into a text-conditioned one. This work proposes a simple yet effective finetuning approach, called Unconditional-to-Conditional Transfer Learning (U2C transfer). It can leverage well-established pre-trained models while learning to respect the given textual conditions. We evaluate the efficiency of U2C transfer by finetuning StyleGAN2 on two of the most widely used text-to-image datasets, producing the Text-Conditioned StyleGAN2 (TC-StyleGAN2). Our models quickly achieved state-of-the-art results on the CUB-200 and Oxford-102 datasets, with FID values of 7.49 and 9.47, respectively. These values represent relative gains of 7% and 68%, respectively, over prior work. We show that our method is capable of learning fine-grained details from text queries while producing photorealistic and detailed images. Finally, we show that the models structure the intermediate space in a semantically meaningful fashion.

Citation

@inproceedings{Pereira_2022_BMVC,
author    = {Vinicius G Pereira and JONATAS WEHRMANN},
title     = {Teaching StyleGAN to Read: Improving Text-to-image Synthesis with U2C Transfer Learning},
booktitle = {33rd British Machine Vision Conference 2022, {BMVC} 2022, London, UK, November 21-24, 2022},
publisher = {{BMVA} Press},
year      = {2022},
url       = {https://bmvc2022.mpi-inf.mpg.de/0512.pdf}
}


Copyright © 2022 The British Machine Vision Association and Society for Pattern Recognition
