Segmentation Assisted U-shaped Multi-scale Transformer for Crowd Counting

Yifei Qian (University of St Andrews)*, Liangfei Zhang (University of St Andrews), Xiaopeng Hong (Harbin Institute of Technology), Carl Donovan (University of St Andrews), Ognjen Arandjelovic (University of St Andrews)
The 33rd British Machine Vision Conference


Vision-based crowd counting has made remarkable progress in recent years thanks to the development of CNNs. However, the field has run into a bottleneck, since CNNs are by their nature limited to locally attentive receptive fields and incapable of modelling long-range dependencies. To address this problem, we introduce a multi-scale transformer-based crowd counting network, termed Crowd U-Transformer (CUT), which extracts and aggregates semantic and spatial features from multiple levels. In this design, we use crowd segmentation as an attention module to obtain fine-grained features. We also propose a loss function that better focuses on counting performance in the foreground area. Experimental results on four widely used benchmarks show that our method achieves state-of-the-art performance.
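The abstract does not give the exact form of the attention module or of the loss, so the following is only an illustrative sketch of the two ideas, not the CUT implementation. It assumes a segmentation map is squashed with a sigmoid and used to gate a feature map, and that the loss is a squared error on the density map with extra weight on foreground pixels. The function names and the `fg_weight` parameter are hypothetical.

```python
import numpy as np


def apply_segmentation_attention(features, seg_logits):
    """Gate a (C, H, W) feature map with an (H, W) segmentation map.

    Assumption: attention is a sigmoid of the segmentation logits.
    """
    attention = 1.0 / (1.0 + np.exp(-seg_logits))  # values in (0, 1)
    return features * attention[None, :, :]  # broadcast over channels


def foreground_weighted_loss(pred_density, gt_density, fg_mask, fg_weight=2.0):
    """Squared error on the density map, up-weighted on foreground pixels.

    fg_weight is a hypothetical hyperparameter, not taken from the paper.
    """
    weights = np.where(fg_mask > 0, fg_weight, 1.0)
    return float(np.sum(weights * (pred_density - gt_density) ** 2))


# Toy example: a 2-channel 2x2 feature map and a 2x2 segmentation map.
features = np.ones((2, 2, 2))
seg_logits = np.array([[10.0, -10.0], [0.0, 0.0]])
gated = apply_segmentation_attention(features, seg_logits)

pred = np.array([[1.0, 0.0], [0.0, 0.0]])
gt = np.array([[0.0, 0.0], [0.0, 1.0]])
mask = np.array([[1, 0], [0, 0]])
loss = foreground_weighted_loss(pred, gt, mask)
```

In this toy case, features under a confident foreground prediction pass through almost unchanged, features under a confident background prediction are suppressed towards zero, and an error on a foreground pixel costs twice as much as the same error on a background pixel.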



author    = {Yifei Qian and Liangfei Zhang and Xiaopeng Hong and Carl Donovan and Ognjen Arandjelovic},
title     = {Segmentation Assisted U-shaped Multi-scale Transformer for Crowd Counting},
booktitle = {33rd British Machine Vision Conference 2022, {BMVC} 2022, London, UK, November 21-24, 2022},
publisher = {{BMVA} Press},
year      = {2022},
url       = {}

Copyright © 2022 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
