Parallel and Robust Text Rectifier for Scene Text Recognition


Bingcong Li (Ping An Property & Casualty Insurance Company of China, Ltd.), Xin Tang (Ping An Property & Casualty Insurance Company of China, Ltd.), Jun Wang (Ping An Technology (Shenzhen) Co. Ltd.), Liang Diao (Ping An Property & Casualty Insurance Company of China, Ltd.), Rui Fang (Ping An Property & Casualty Insurance Company of China, Ltd.), Guotong Xie (Ping An Technology (Shenzhen) Co. Ltd.), Weifu Chen (Guangzhou Maritime University)*
The 33rd British Machine Vision Conference

Abstract

Scene text recognition (STR) aims to recognize text appearing in images. Current state-of-the-art STR methods usually adopt a multi-stage framework that uses a rectifier to iteratively correct errors from the previous stage. However, the rectifiers of those models are not proficient in addressing the misalignment problem. To alleviate this problem, we propose a novel network named Parallel and Robust Text Rectifier (PRTR), which consists of a bi-directional position attention initial decoder and a sequence of stacked Robust Visual Semantic Rectifiers (RVSRs). In essence, PRTR is designed as a coarse-to-fine architecture that exploits a sequence of rectifiers to repeatedly refine the prediction in a stage-wise manner. RVSR is the core component of the proposed model and comprises two key modules: a Dual-Path Semantic Alignment (DPSA) module and a Visual-Linguistic Alignment (VLA) module. DPSA rectifies linguistic misalignment via global semantic features derived from the recognized characters as a whole, while VLA re-aligns the linguistic features with the visual features through an attention model to avoid overfitting to linguistic features. All parts of PRTR are non-autoregressive (parallel), and each RVSR re-aligns its output according to both the linguistic and the visual features, which makes the model robust to misalignment errors. Extensive experiments on mainstream benchmarks demonstrate that the proposed model alleviates the misalignment problem to a large extent and outperforms state-of-the-art models.
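
The coarse-to-fine flow described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: all function names, the random weights, and the dimensions are hypothetical, the bi-directional decoding and the dual-path structure of DPSA are omitted, and a single set of weights is shared across stages for brevity. It only shows the general pattern: an initial position-attention decoder predicts all characters in parallel, and each refinement stage builds linguistic features from the current prediction, adds a global semantic feature, and re-aligns the result with the visual features through attention.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]


def attend(queries, keys, values):
    """Scaled dot-product attention: each query receives a weighted mix of values."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out


def classify(features, w_out):
    """Map each position's feature vector to a distribution over characters."""
    return [softmax(matvec(w_out, f)) for f in features]


def initial_decode(visual, pos_queries, w_out):
    """Coarse stage: every position query attends to the visual features at once."""
    return classify(attend(pos_queries, visual, visual), w_out)


def rectify(probs, visual, char_emb, w_out):
    """One refinement stage (hypothetical simplification of a rectifier)."""
    dim = len(char_emb[0])
    # Linguistic feature per position: expected character embedding.
    ling = [[sum(p * char_emb[c][i] for c, p in enumerate(pr)) for i in range(dim)]
            for pr in probs]
    # Global semantic feature: mean over all positions of the prediction.
    glob = [sum(f[i] for f in ling) / len(ling) for i in range(dim)]
    queries = [[a + b for a, b in zip(f, glob)] for f in ling]
    # Re-align the linguistic queries with the visual features via attention.
    return classify(attend(queries, visual, visual), w_out)


def recognize(visual, pos_queries, char_emb, w_out, num_stages=3):
    probs = initial_decode(visual, pos_queries, w_out)
    for _ in range(num_stages):  # stage-wise refinement, no left-to-right loop
        probs = rectify(probs, visual, char_emb, w_out)
    return probs


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy setup with random, untrained weights (assumed sizes, for illustration only).
rng = random.Random(0)
DIM, NUM_CLASSES, MAX_LEN, NUM_PATCHES = 8, 5, 4, 6
visual = random_matrix(NUM_PATCHES, DIM, rng)
pos_queries = random_matrix(MAX_LEN, DIM, rng)
char_emb = random_matrix(NUM_CLASSES, DIM, rng)
w_out = random_matrix(NUM_CLASSES, DIM, rng)

probs = recognize(visual, pos_queries, char_emb, w_out, num_stages=3)
predicted = [max(range(NUM_CLASSES), key=lambda c: p[c]) for p in probs]
```

Because every stage operates on all positions at once, the number of sequential steps grows with the number of stages rather than with the text length, which is the sense in which such a design is non-autoregressive.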

Citation

@inproceedings{Li_2022_BMVC,
author    = {Bingcong Li and Xin Tang and Jun Wang and Liang Diao and Rui Fang and Guotong Xie and Weifu Chen},
title     = {Parallel and Robust Text Rectifier for Scene Text Recognition},
booktitle = {33rd British Machine Vision Conference 2022, {BMVC} 2022, London, UK, November 21-24, 2022},
publisher = {{BMVA} Press},
year      = {2022},
url       = {https://bmvc2022.mpi-inf.mpg.de/0770.pdf}
}


Copyright © 2022 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
