Rethinking Graph Neural Networks for Unsupervised Video Object Segmentation

Daizong Liu (Peking University),* Wei Hu (Peking University)

The 33rd British Machine Vision Conference

Abstract

This paper addresses the task of unsupervised video object segmentation. Prevailing solutions fall into two categories: 1) two-stream approaches, which combine local motion and appearance information but rely heavily on the quality of optical flow and are not robust to occluded or static objects; and 2) appearance matching approaches, which use Siamese networks to learn the relation between two frames (generally the first frame and the current frame) but lack robustness to appearance variation in long videos. Although recent attentive graph neural networks tackle both limitations in an appearance matching manner by matching multiple frames at the same time, their performance has so far been inferior to that of their counterparts. In this paper, we argue that the performance of such attentive graph models is severely underestimated due to limitations in current designs, in both the node design and the global graph matching. To this end, we develop a novel attentive graph-based model: Region-wise Global-graph with Boundary-aware Local-learning (RGBL). Regarding the node design of the global graph network, instead of taking the whole image as a frame-wise node, RGBL predicts the foreground region in each frame and takes the corresponding regional features as the nodal input to filter out background noise, which incidentally mitigates noisy visual similarity among frames. Regarding the global graph matching, RGBL learns more local saliency in individual frames, incorporating boundary information to emphasize the features along the foreground boundary for mask refinement in each frame. Extensive experiments on three challenging benchmarks show that RGBL surpasses state-of-the-art methods by a large margin.
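The region-wise node idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `region_node_features` and `attentive_message_passing`, the use of masked average pooling to summarize each frame's foreground region as a single vector, and the plain dot-product attention are all assumptions made for illustration. The abstract does not specify how regional features are formed or how the graph is matched.

```python
import numpy as np


def region_node_features(features, fg_mask, eps=1e-8):
    """Summarize each frame by its predicted foreground region (assumed design).

    features: (T, H, W, C) per-frame feature maps.
    fg_mask:  (T, H, W) foreground probabilities in [0, 1].
    Returns:  (T, C) one node vector per frame, with background down-weighted.
    """
    weights = fg_mask[..., None]                     # (T, H, W, 1)
    summed = (features * weights).sum(axis=(1, 2))   # (T, C)
    area = weights.sum(axis=(1, 2)) + eps            # (T, 1)
    return summed / area


def attentive_message_passing(nodes):
    """One round of dot-product attention among frame nodes (assumed design).

    nodes:   (T, C) node vectors.
    Returns: (T, C) nodes updated with a residual attention message.
    """
    scores = nodes @ nodes.T / np.sqrt(nodes.shape[1])    # (T, T)
    scores = scores - scores.max(axis=1, keepdims=True)   # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=1, keepdims=True)         # rows sum to 1
    return nodes + attn @ nodes


# Hypothetical toy input: 3 frames, 4x4 feature maps, 8 channels.
rng = np.random.default_rng(0)
feats = rng.normal(size=(3, 4, 4, 8))
mask = rng.uniform(size=(3, 4, 4))
nodes = region_node_features(feats, mask)      # shape (3, 8)
updated = attentive_message_passing(nodes)     # shape (3, 8)
```

Pooling only over the predicted foreground keeps background pixels from contributing to the node, which is the noise-filtering effect the abstract attributes to region-wise nodes. The boundary-aware local learning step is not sketched here.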

Citation

@inproceedings{Liu_2022_BMVC,
author    = {Daizong Liu and Wei Hu},
title     = {Rethinking Graph Neural Networks for Unsupervised Video Object Segmentation},
booktitle = {33rd British Machine Vision Conference 2022, {BMVC} 2022, London, UK, November 21-24, 2022},
publisher = {{BMVA} Press},
year      = {2022},
url       = {https://bmvc2022.mpi-inf.mpg.de/0076.pdf}
}

Copyright © 2022 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
