Turbo Training with Token Dropout

Tengda Han (University of Oxford),* Weidi Xie (Shanghai Jiao Tong University), Andrew Zisserman (University of Oxford)
The 33rd British Machine Vision Conference


The objective of this paper is an efficient training method for video tasks. We make three contributions: (1) We propose Turbo training, a simple and versatile training paradigm for Transformers on multiple video tasks. (2) We illustrate the advantages of Turbo training on action classification, video-language representation learning, and long-video activity classification, showing that Turbo training can largely maintain competitive performance while achieving almost a 4× speed-up and significantly lower memory consumption. (3) Turbo training enables long-schedule video-language training and end-to-end long-video training, settings that were previously infeasible to train under limited resources, and delivers performance competitive with or superior to previous works.
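The abstract does not spell out how token dropout works. As an illustration only, here is a minimal NumPy sketch of generic random token dropout. It assumes the video has already been tokenized into a `(batch, num_tokens, dim)` array and uses a hypothetical `keep_ratio` parameter. This is not the authors' implementation, and the helper name `random_token_dropout` is invented for this example.

```python
import numpy as np


def random_token_dropout(tokens: np.ndarray, keep_ratio: float,
                         rng: np.random.Generator) -> np.ndarray:
    """Keep a random subset of tokens independently for each sample.

    Hypothetical helper for illustration; not the paper's code.
    tokens: array of shape (batch, num_tokens, dim).
    keep_ratio: fraction of tokens to keep (assumed parameter).
    """
    batch, num_tokens, _ = tokens.shape
    num_keep = max(1, int(round(num_tokens * keep_ratio)))
    # Random scores per token; the argsort gives an independent permutation per sample.
    scores = rng.random((batch, num_tokens))
    keep_idx = np.argsort(scores, axis=1)[:, :num_keep]
    # Sort the kept indices so the surviving tokens keep their original order.
    keep_idx = np.sort(keep_idx, axis=1)
    # Gather the kept tokens; the trailing axis of size 1 broadcasts over `dim`.
    return np.take_along_axis(tokens, keep_idx[:, :, None], axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Assumed setup: 8 frames x 196 patch tokens per frame, embedding width 768.
    video_tokens = rng.standard_normal((2, 8 * 196, 768))
    kept = random_token_dropout(video_tokens, keep_ratio=0.25, rng=rng)
    print(kept.shape)  # (2, 392, 768)
```

With `keep_ratio=0.25`, the Transformer would process 392 of the 1,568 tokens per clip during training. The choice of which tokens to drop, and the ratio to use, are assumptions here rather than settings taken from the paper.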



