Image-to-Image Translation with Text Guidance


Bowen Li (University of Oxford),* Philip Torr (University of Oxford), Thomas Lukasiewicz (University of Oxford)
The 33rd British Machine Vision Conference

Abstract

In this paper, we focus on image-to-image translation with text guidance, where a text description is used to control visual attributes of the synthetic image produced from a given semantic mask. To accomplish this task, we propose a new multi-stage generative adversarial network with three novel components: (1) a discriminator with dual-directional feedback, which provides the generator at the same stage with fine-grained supervisory feedback on image regions, encouraging it to produce realistic images with finer regional details, and which also helps generators at later stages to complete missing content and correct inappropriate visual attributes; (2) a compatibility loss that guides generators to produce both realistic objects and a realistic background, and to achieve good compatibility between them; and (3) a part-of-speech tagging-based spatial attention that better connects image regions with their corresponding semantic words. Experimental results demonstrate that our model can effectively control the image translation using text descriptions. More importantly, the text input allows our model to produce much more diverse results, and even new synthetic images that fall outside the distribution of the dataset.
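
To make the third component more concrete, the following is a minimal NumPy sketch of attention from image regions to words, restricted by part-of-speech tags. It is not the authors' implementation: the tag set `VISUAL_TAGS`, the function names `pos_mask` and `spatial_attention`, the dot-product scoring, and the random features are all assumptions made for illustration, and the tags are supplied by hand rather than produced by a real tagger.

```python
import numpy as np

# Assumption: only nouns and adjectives are treated as visually meaningful.
VISUAL_TAGS = {"NOUN", "ADJ"}


def pos_mask(tags):
    """Return a boolean array that is True for words with a kept tag."""
    return np.array([t in VISUAL_TAGS for t in tags], dtype=bool)


def spatial_attention(regions, words, mask):
    """Attend from each image region to the kept words.

    regions: (N, D) array of region features
    words:   (T, D) array of word embeddings
    mask:    (T,) boolean array, True for words that may receive attention
    Returns (context, weights) with shapes (N, D) and (N, T).
    """
    if not mask.any():
        raise ValueError("at least one word must be kept by the mask")
    scores = regions @ words.T                          # (N, T) similarities
    scores = np.where(mask[None, :], scores, -np.inf)   # drop filtered words
    scores = scores - scores.max(axis=1, keepdims=True) # numerical stability
    weights = np.exp(scores)                            # filtered words become 0
    weights = weights / weights.sum(axis=1, keepdims=True)
    context = weights @ words                           # (N, D) word context
    return context, weights


# Hypothetical example: "a red bird sits on branch" with hand-written tags.
tags = ["DET", "ADJ", "NOUN", "VERB", "ADP", "NOUN"]
rng = np.random.default_rng(0)
regions = rng.normal(size=(4, 8))          # 4 regions, 8-dim features
words = rng.normal(size=(len(tags), 8))    # one embedding per word
mask = pos_mask(tags)
context, weights = spatial_attention(regions, words, mask)
print(weights.round(2))
```

In this sketch, each row of `weights` sums to one and assigns zero weight to the filtered words, so every region's context vector is built only from nouns and adjectives.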

Citation

@inproceedings{Li_2022_BMVC,
author    = {Bowen Li and Philip Torr and Thomas Lukasiewicz},
title     = {Image-to-Image Translation with Text Guidance},
booktitle = {33rd British Machine Vision Conference 2022, {BMVC} 2022, London, UK, November 21-24, 2022},
publisher = {{BMVA} Press},
year      = {2022},
url       = {https://bmvc2022.mpi-inf.mpg.de/0581.pdf}
}


Copyright © 2022 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
