AssocFormer: Association Transformer for Multi-label Classification

Xin Xing (University of Kentucky)*, Chong Peng (Qingdao University), Yu Zhang (University of Kentucky), Ai-Ling Lin (MU-Radiology), Nathan Jacobs (Washington University in St. Louis)
The 33rd British Machine Vision Conference


The goal of multi-label image classification is to predict a set of labels for a single image. Recent work has shown that explicitly modeling the co-occurrence relationship between classes is critical for achieving good performance on this task. State-of-the-art approaches model this using graph convolutional networks, which are complex and computationally expensive. We propose a novel, efficient association module as an alternative. This is coupled with a transformer-based feature-extraction backbone. The proposed model was evaluated using two standard datasets: MS-COCO and PASCAL VOC. The results show that the proposed model outperforms several strong baseline models.
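
To make the notion of a class co-occurrence relationship concrete, the following is a minimal sketch of how co-occurrence statistics can be computed from multi-hot training labels. It is not the association module proposed in the paper, whose details are not given in this abstract; the function name `cooccurrence_statistics` and the toy label matrix are assumptions introduced purely for illustration.

```python
import numpy as np


def cooccurrence_statistics(labels):
    """Compute label co-occurrence statistics (illustrative, hypothetical helper).

    labels: array of shape (num_images, num_classes) with 0/1 entries,
            where labels[n, c] == 1 means image n carries class c.
    Returns (counts, cond):
      counts[i, j] = number of images labelled with both class i and class j
      cond[i, j]   = estimate of P(class j present | class i present)
    """
    labels = np.asarray(labels, dtype=np.float64)

    # Each image contributes 1 to counts[i, j] when both labels are set.
    counts = labels.T @ labels

    # The diagonal holds the number of images per class.
    class_totals = np.diag(counts).copy()

    # Normalise each row by its class total; rows of unseen classes stay zero.
    denom = class_totals[:, None]
    cond = np.divide(counts, denom, out=np.zeros_like(counts), where=denom > 0)
    return counts, cond


if __name__ == "__main__":
    # Toy data (assumed): 4 images, 3 classes.
    toy_labels = np.array([
        [1, 1, 0],
        [1, 0, 0],
        [0, 1, 1],
        [1, 1, 0],
    ])
    counts, cond = cooccurrence_statistics(toy_labels)
    print(counts)
    print(cond)
```

The conditional matrix is generally asymmetric: in the toy data, class 2 never appears without class 1, while class 1 often appears without class 2. Statistics of this kind are what graph-based approaches typically use to define relationships between classes.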



author    = {Xin Xing and Chong Peng and Yu Zhang and Ai-Ling Lin and Nathan Jacobs},
title     = {AssocFormer: Association Transformer for Multi-label Classification},
booktitle = {33rd British Machine Vision Conference 2022, {BMVC} 2022, London, UK, November 21-24, 2022},
publisher = {{BMVA} Press},
year      = {2022},
url       = {}

Copyright © 2022 The British Machine Vision Association and Society for Pattern Recognition
