On Temporal Granularity in Self-Supervised Video Representation Learning

Rui Qian (Cornell University), Yeqing Li (Google), Liangzhe Yuan (Google Research), Boqing Gong (Google), Ting Liu (Google Research), Matthew Brown (Google), Serge Belongie (University of Copenhagen), Ming-Hsuan Yang (Google Research), Hartwig Adam (Google), Yin Cui (Google)*
The 33rd British Machine Vision Conference


This work presents an empirical exploration of temporal granularity in self-supervised video representation learning. While state-of-the-art methods commonly enforce temporal persistence of the learned features across the whole video, we argue that this objective may not be suitable for all video tasks. To reveal the impact of temporal granularity, we propose a simple unified framework that learns features from the same unlabeled videos at varying granularities, from temporally fine-grained to temporally persistent, by adjusting only one coefficient. We conduct a comprehensive empirical study covering a variety of classic and emerging video benchmarks and find that video-level understanding tasks prefer temporally persistent features, while temporal understanding within a single video favors fine-grained features. The flexibility of our framework gives rise to competitive or state-of-the-art performance, even outperforming supervised pre-training in a few cases. Code will be available at https://github.com/tensorflow/models/tree/master/official/.
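
The abstract does not spell out how the single coefficient enters the training objective. Purely as an illustration, the sketch below assumes that it linearly blends a temporally fine-grained loss with a temporally persistent loss. The function name `blend_granularity_losses`, the parameter `coeff`, and the linear form itself are assumptions made for this example and are not taken from the paper or its released code.

```python
def blend_granularity_losses(fine_loss, persistent_loss, coeff):
    """Hypothetical blend of two self-supervised losses (illustration only).

    coeff = 0.0 keeps only the temporally fine-grained loss,
    coeff = 1.0 keeps only the temporally persistent loss,
    values in between interpolate linearly between the two.
    """
    if not 0.0 <= coeff <= 1.0:
        raise ValueError("coeff must lie in [0, 1]")
    return (1.0 - coeff) * fine_loss + coeff * persistent_loss


# Sweep the assumed coefficient from fine-grained to persistent.
# The loss values 2.0 and 4.0 are placeholders, not measured results.
for coeff in (0.0, 0.5, 1.0):
    print(coeff, blend_granularity_losses(2.0, 4.0, coeff))
```

Under this assumption, a downstream task that favors fine-grained features would be pre-trained with a small coefficient, and a video-level task with a coefficient close to one.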



author    = {Rui Qian and Yeqing Li and Liangzhe Yuan and Boqing Gong and Ting Liu and Matthew Brown and Serge Belongie and Ming-Hsuan Yang and Hartwig Adam and Yin Cui},
title     = {On Temporal Granularity in Self-Supervised Video Representation Learning},
booktitle = {33rd British Machine Vision Conference 2022, {BMVC} 2022, London, UK, November 21-24, 2022},
publisher = {{BMVA} Press},
year      = {2022},
url       = {https://bmvc2022.mpi-inf.mpg.de/0541.pdf}

Copyright © 2022 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
