CLIPFont: Text Guided Vector WordArt Generation

Yiren Song (Shanghai Jiao Tong University),* Yuxuan Zhang (Shanghai Jiao Tong University)
The 33rd British Machine Vision Conference


Font design is a repetitive job that requires specialized skills. Unlike existing few-shot font generation methods, this paper proposes a zero-shot font generation method, based on the CLIP model, that works for any language. The style of the font is controlled by a text description, while the skeleton of the font remains the same as that of the input reference font. CLIPFont optimizes the parameters of vector fonts by gradient descent and achieves artistic font generation by minimizing the directional distance between the text description and the font in the CLIP embedding space. A CLIP recognition loss is proposed to keep the category of each character unchanged. The gradients computed on the rasterized images are propagated back to the vector parameter space through a differentiable vector renderer. Experimental results and Turing tests demonstrate our method's state-of-the-art performance.
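The abstract does not spell out the form of the directional distance. As a rough illustration only, the sketch below assumes the commonly used directional CLIP loss: one minus the cosine similarity between the change in the image embedding and the change in the text embedding. The function names and the toy 3-dimensional vectors are hypothetical stand-ins for real CLIP embeddings, and the paper's actual loss may differ.

```python
import math

def sub(a, b):
    """Element-wise difference a - b of two equal-length vectors."""
    return [x - y for x, y in zip(a, b)]

def cosine(a, b, eps=1e-12):
    """Cosine similarity; eps guards against a zero-length direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)

def directional_loss(img_src, img_out, txt_src, txt_tgt):
    """Assumed form: 1 - cos(image direction, text direction).

    img_src / img_out: embeddings of the reference and stylized glyph images.
    txt_src / txt_tgt: embeddings of a neutral prompt and the style prompt.
    """
    image_direction = sub(img_out, img_src)
    text_direction = sub(txt_tgt, txt_src)
    return 1.0 - cosine(image_direction, text_direction)

# Toy embeddings (hypothetical values, not produced by CLIP).
img_src = [1.0, 0.0, 0.0]
txt_src = [0.0, 1.0, 0.0]
txt_tgt = [0.0, 1.0, 1.0]      # text direction is [0, 0, 1]

img_aligned = [1.0, 0.0, 0.5]  # image moved along the text direction
img_orthogonal = [1.0, 0.5, 0.0]  # image moved in an unrelated direction

loss_aligned = directional_loss(img_src, img_aligned, txt_src, txt_tgt)
loss_orthogonal = directional_loss(img_src, img_orthogonal, txt_src, txt_tgt)
print(loss_aligned < loss_orthogonal)
```

In a full pipeline, the image embeddings would come from rasterizing the vector glyph with a differentiable renderer and encoding the result with CLIP, so that the gradient of this loss can reach the vector parameters; that part is omitted here.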



author    = {Yiren Song and Yuxuan Zhang},
title     = {CLIPFont: Text Guided Vector WordArt Generation},
booktitle = {33rd British Machine Vision Conference 2022, {BMVC} 2022, London, UK, November 21-24, 2022},
publisher = {{BMVA} Press},
year      = {2022},
url       = {}

Copyright © 2022 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
