Unleashing the Potential of Vision-Language Models for Long-Tailed Visual Recognition

Teli Ma (Shanghai Artificial Intelligence Laboratory),* Shijie Geng (Rutgers University), Mengmeng Wang (Zhejiang University), Sheng Xu (Beihang University), Hongsheng Li (The Chinese University of Hong Kong), Baochang Zhang (Beihang University), Peng Gao (The Chinese University of Hong Kong), Yu Qiao (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences)
The 33rd British Machine Vision Conference


The visual world naturally exhibits a long-tailed distribution of open classes, which poses great challenges to modern visual systems. Existing approaches either apply class re-balancing strategies or ensemble models based on the image modality. In this paper, we explore strategies for leveraging large-scale pretrained vision-language models for long-tailed visual recognition, inspired by the success of powerful multimodal representations that show promise in handling data deficiency and unseen concepts. We first introduce BALLAD, a method that finetunes vision-language models and transfers open-vocabulary knowledge to long-tailed domain datasets in a contrastive manner. Moreover, we propose TACKLE, a non-contrastive and non-parametric learning strategy that transfers conceptual knowledge from visual-linguistic model parameters into generated images to balance the training of visual representations. Extensive experiments on three popular long-tailed recognition benchmarks demonstrate the effectiveness of the proposed methods.



author    = {Teli Ma and Shijie Geng and Mengmeng Wang and Sheng Xu and Hongsheng Li and Baochang Zhang and Peng Gao and Yu Qiao},
title     = {Unleashing the Potential of Vision-Language Models for Long-Tailed Visual Recognition},
booktitle = {33rd British Machine Vision Conference 2022, {BMVC} 2022, London, UK, November 21-24, 2022},
publisher = {{BMVA} Press},
year      = {2022},
url       = {https://bmvc2022.mpi-inf.mpg.de/0481.pdf}

Copyright © 2022 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
