Adversarial Vision Transformer for Medical Image Semantic Segmentation with Limited Annotations

Ziyang Wang* (University of Oxford), Will Zhao (Bucknell University), Zixuan Ni (CU Boulder), Yuchen Zheng (University of North Carolina at Chapel Hill)
The 33rd British Machine Vision Conference


Deep learning has benefited medical image analysis not only through network architecture engineering but also through large numbers of high-quality annotations, which are time-consuming and labour-intensive to produce. Motivated by the recent success of the Vision Transformer (ViT), we explore the power of ViT for medical image semantic segmentation in a consistency-aware adversarial semi-supervised learning (SSL) setting. To train a segmentation ViT model (sViT) with labelled and unlabelled data simultaneously, we propose an adversarial SSL framework that consists of an sViT and an evaluation model (EM). During adversarial training, the EM is trained to classify whether an inference by the sViT comes from a labelled or an unlabelled sample, and the sViT is initialized and trained against the EM (i.e. so that every inference by the sViT is of high enough quality to be classified as coming from labelled data). To further boost the performance of the sViT with respect to consistency, mixup-based interpolation consistency regularization is introduced for the sViT. The adversarial training is carried out separately for the sViT and the EM in an iterative manner, and the consistency training applies solely to the sViT. Experimental results demonstrate that the proposed consistency-aware adversarial sViT achieves competitive performance against other SSL methods on a public benchmark data set across a variety of metrics. The code is publicly available on GitHub.
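The three training signals named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the binary cross-entropy form of the adversarial losses, the squared-error consistency term, the reuse of the same model for the consistency target, and all names (`mixup`, `consistency_loss`, `em_loss`, `svit_adversarial_loss`, `toy_model`) are assumptions made for illustration, and a per-pixel sigmoid stands in for the sViT.

```python
import math


def mixup(a, b, lam):
    """Convex combination lam * a + (1 - lam) * b of two equal-length vectors."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]


def consistency_loss(model, u1, u2, lam):
    """Mean squared gap between model(mix(u1, u2)) and mix(model(u1), model(u2))."""
    pred_of_mix = model(mixup(u1, u2, lam))
    mix_of_preds = mixup(model(u1), model(u2), lam)
    gaps = [(p - q) ** 2 for p, q in zip(pred_of_mix, mix_of_preds)]
    return sum(gaps) / len(gaps)


def bce(prob, target, eps=1e-7):
    """Binary cross-entropy for one probability and a 0/1 target."""
    prob = min(max(prob, eps), 1.0 - eps)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))


def em_loss(score_labelled, score_unlabelled):
    """EM step (assumed form): labelled predictions -> 1, unlabelled -> 0."""
    return bce(score_labelled, 1) + bce(score_unlabelled, 0)


def svit_adversarial_loss(score_unlabelled):
    """sViT step (assumed form): make the EM score unlabelled predictions as 1."""
    return bce(score_unlabelled, 1)


def toy_model(pixels):
    """Hypothetical stand-in for the sViT: a per-pixel sigmoid."""
    return [1.0 / (1.0 + math.exp(-4.0 * (p - 0.5))) for p in pixels]


# Two flattened "unlabelled images" and illustrative EM scores.
u1 = [0.1, 0.9, 0.4, 0.7]
u2 = [0.8, 0.2, 0.6, 0.3]
loss_c = consistency_loss(toy_model, u1, u2, lam=0.3)
loss_em = em_loss(score_labelled=0.9, score_unlabelled=0.2)
loss_adv = svit_adversarial_loss(score_unlabelled=0.2)
print(loss_c, loss_em, loss_adv)
```

In an iterative schedule like the one described, `em_loss` would update only the EM and `svit_adversarial_loss` plus `consistency_loss` would update only the sViT, alongside a supervised loss on the labelled data. How these terms are weighted is not stated in the abstract.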



author    = {Ziyang Wang and Will Zhao and Zixuan Ni and Yuchen Zheng},
title     = {Adversarial Vision Transformer for Medical Image Semantic Segmentation with Limited Annotations},
booktitle = {33rd British Machine Vision Conference 2022, {BMVC} 2022, London, UK, November 21-24, 2022},
publisher = {{BMVA} Press},
year      = {2022},
url       = {}

Copyright © 2022 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
