Disentangling Content and Motion for Text-Based Neural Video Manipulation

Levent Karacan (Iskenderun Technical University),* Tolga Kerimoğlu (Boğaziçi University), İsmail Ata İnan (Boğaziçi University), Tolga Birdal (Imperial College London), Erkut Erdem (Hacettepe University), Aykut Erdem (Koc University)

The 33^rd British Machine Vision Conference

Abstract

Giving machines the ability to imagine possible new objects or scenes from linguistic descriptions and produce their realistic renderings is arguably one of the most challenging problems in computer vision. Recent advances in deep generative models have led to new approaches that give promising results towards this goal. In this paper, we introduce a new method called DiCoMoGAN for manipulating videos with natural language, aiming to perform local and semantic edits on a video clip to alter the appearances of an object of interest. Our GAN architecture allows for better utilization of multiple observations by disentangling content and motion to enable controllable semantic edits. To this end, we introduce two tightly coupled networks: (i) a representation network for constructing a concise understanding of motion dynamics and temporally invariant content, and (ii) a translation network that exploits the extracted latent content representation to actuate the manipulation according to the target description. Our qualitative and quantitative evaluations demonstrate that DiCoMoGAN significantly outperforms existing frame-based methods, producing temporally coherent and semantically more meaningful results.

Citation

@inproceedings{Karacan_2022_BMVC,
author    = {Levent Karacan and Tolga  Kerimoğlu and İsmail Ata İnan and Tolga Birdal and Erkut Erdem and Aykut Erdem},
title     = {Disentangling Content and Motion for Text-Based Neural Video Manipulation},
booktitle = {33rd British Machine Vision Conference 2022, {BMVC} 2022, London, UK, November 21-24, 2022},
publisher = {{BMVA} Press},
year      = {2022},
url       = {https://bmvc2022.mpi-inf.mpg.de/0443.pdf}
}

Copyright © 2022 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).

Imprint | Data Protection