Personalised CLIP or: how to find your vacation videos


Bruno Korbar (University of Oxford)*, Andrew Zisserman (University of Oxford)
The 33rd British Machine Vision Conference

Abstract

In this paper, our goal is a person-centric model capable of retrieving the image or video corresponding to a personalized compound query from a large set of images or videos. Specifically, given a query consisting of an image of a person's face and a text scene description or action description, we retrieve images or video clips corresponding to this compound query. We make three contributions: (1) we propose a model that is able to retrieve images or videos given a personalized compound query. We achieve this by building on a pre-trained CLIP vision-text model that has compound, but general, query capabilities, and providing a mechanism to personalize it to the target person specified by their face; (2) we share a new Celebrities in Action dataset of movies with automatically generated annotations for identities, locations, and actions that can be used for evaluation of the compound-retrieval task; (3) we evaluate our model's performance on two datasets: Celebrities in Places for compound queries of a celebrity and a scene description; and our new Celebrities in Action dataset for compound queries of a celebrity and an action description. We demonstrate the flexibility of the model with free-form queries and compare to previous methods.
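
To make the compound-retrieval task concrete, the sketch below ranks a small gallery against a query made of a face embedding and a text embedding. It is a naive late-fusion baseline written for illustration only, not the personalization mechanism proposed in the paper. The function names `cosine` and `rank_compound`, the `face_weight` parameter, and the two-dimensional toy embeddings are all assumptions; in practice the embeddings would come from a face recogniser and a vision-text model such as CLIP.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_compound(face_query, text_query, gallery, face_weight=0.5):
    """Rank gallery items by a weighted sum of face and text similarity.

    Hypothetical late-fusion baseline: each gallery item is a dict with an
    'id', a 'face' embedding and a 'scene' embedding. This is NOT the
    method from the paper, only an illustration of the task.
    """
    scored = []
    for item in gallery:
        face_score = cosine(face_query, item["face"])
        text_score = cosine(text_query, item["scene"])
        score = face_weight * face_score + (1.0 - face_weight) * text_score
        scored.append((score, item["id"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item_id for _, item_id in scored]


# Toy gallery with made-up 2-D embeddings (assumed values, not real data).
gallery = [
    {"id": "clip_a", "face": [1.0, 0.0], "scene": [1.0, 0.0]},  # right person, right scene
    {"id": "clip_b", "face": [1.0, 0.0], "scene": [0.0, 1.0]},  # right person, wrong scene
    {"id": "clip_c", "face": [0.0, 1.0], "scene": [1.0, 0.0]},  # wrong person, right scene
    {"id": "clip_d", "face": [0.0, 1.0], "scene": [0.0, 1.0]},  # wrong person, wrong scene
]

ranking = rank_compound([1.0, 0.0], [1.0, 0.0], gallery)
print(ranking)
```

In this toy setting the clip that matches both the person and the scene is ranked first, and the clip that matches neither is ranked last. The two partially matching clips tie, which shows why simply adding independent scores is a weak baseline for compound queries.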

Citation

@inproceedings{Korbar_2022_BMVC,
author    = {Bruno Korbar and Andrew Zisserman},
title     = {Personalised CLIP or: how to find your vacation videos},
booktitle = {33rd British Machine Vision Conference 2022, {BMVC} 2022, London, UK, November 21-24, 2022},
publisher = {{BMVA} Press},
year      = {2022},
url       = {https://bmvc2022.mpi-inf.mpg.de/0639.pdf}
}


Copyright © 2022 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
