SeA: Selective Attention for Fine-grained Visual Categorization


Yajie Chen (Nanjing University of Science and Technology),* Huan Wang (Nanjing University of Science and Technology), PAN PEIWEN (Nanjing University of Science and Technology)
The 33rd British Machine Vision Conference

Abstract

Fine-grained recognition aims to distinguish objects with similar visual appearance and has long been a challenging problem in computer vision. Visual transformers (ViTs) have recently led the trend in visual representation learning through their global self-attention mechanism, and have shown potential in fine-grained tasks. Yet, we find that common ViTs attend to all patches and aggregate spatial features using shift operations or downsampling, and thus tend to overlook locally discriminative features. In this paper, we propose a novel scheme, named selective attention (SeA), as a more efficient and concise alternative to regular self-attention. Specifically, we progressively learn fine-grained features in images by focusing the network on regions with high attention scores via a multi-step training and inference strategy. SeA can also be viewed as a plug-and-play module for various hierarchical architectures (e.g., ResNet, Swin), and it significantly improves the performance of existing backbones. Extensive experimental results on five fine-grained benchmarks substantiate the effectiveness of our approach, e.g., new SOTA on the CUB-200 and NABirds datasets with accuracies of 93.0% and 93.9%, respectively.
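The idea of progressively focusing on regions with high attention scores can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`select_top_patches`, `progressive_selection`), the `keep_ratio` parameter, and the use of a user-supplied `score_fn` in place of learned attention scores are all assumptions made for illustration only.

```python
import numpy as np

def select_top_patches(patch_features, attention_scores, keep_ratio=0.5):
    """Keep the patches with the highest attention scores.

    patch_features:   array of shape (num_patches, dim)
    attention_scores: array of shape (num_patches,)
    keep_ratio:       hypothetical fraction of patches to retain
    """
    num_patches = patch_features.shape[0]
    num_keep = max(1, int(round(num_patches * keep_ratio)))
    # Indices of the highest-scoring patches, restored to their original order.
    top_idx = np.sort(np.argsort(attention_scores)[-num_keep:])
    return patch_features[top_idx], top_idx

def progressive_selection(patch_features, score_fn, steps=2, keep_ratio=0.5):
    """Repeat the selection for several steps, narrowing the kept region.

    score_fn is a stand-in for an attention module: it maps the current
    patch features to one score per patch.
    """
    kept = np.arange(patch_features.shape[0])
    feats = patch_features
    for _ in range(steps):
        scores = score_fn(feats)
        feats, idx = select_top_patches(feats, scores, keep_ratio)
        kept = kept[idx]  # map back to indices in the original patch grid
    return feats, kept

# Toy usage: 8 patches with 2-D features, scored by feature norm.
features = np.arange(16, dtype=float).reshape(8, 2)
norm_score = lambda f: np.linalg.norm(f, axis=1)
selected, kept_indices = progressive_selection(features, norm_score, steps=2)
```

In the toy usage, the feature norm merely stands in for an attention score; in a real model the scores would come from the network itself, and the selection would be applied to the feature maps of a hierarchical backbone.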


Citation

@inproceedings{Chen_2022_BMVC,
author    = {yajie Chen and Huan Wang and PAN PEIWEN},
title     = {SeA: Selective Attention for Fine-grained Visual Categorization},
booktitle = {33rd British Machine Vision Conference 2022, {BMVC} 2022, London, UK, November 21-24, 2022},
publisher = {{BMVA} Press},
year      = {2022},
url       = {https://bmvc2022.mpi-inf.mpg.de/0191.pdf}
}


Copyright © 2022 The British Machine Vision Association and Society for Pattern Recognition