HiVG
Hierarchical SVG Tokenization: Learning Compact Visual Programs for Scalable Vector Graphics Modeling
The best open-source SVG generation model (3B size) — surpassing GPT-5 and Gemini 2.5 on image-to-SVG.
Small Model,
Frontier Results
High-Fidelity
Image-to-SVG
Efficient SVG
Token Compression

3B parameters. SVG quality that rivals the closed-source giants.

7/7
Proprietary Models Beaten
+17%
Usability over GPT-5
< 1s
Instant Results
HiVG Experimental Results

Convert any image into a clean, editable SVG — structure, layout, and detail faithfully preserved.

2.76× token sequence compression. Sub-second generation vs. minutes for closed-source models.

2.76×
Token Sequence Compression
2.7×
Fewer Training Tokens
Interactive Tokenization Demo

HiVG transforms raw SVG code through two levels of tokenization. Click each stage to see the transformation.

Generic LLM
Tokenizer
~40 tokens
Atomic
Tokens
26 tokens
Segment
Tokens
8 tokens
SVG Preview · rocket
Tokenizing highlighted path
Token count comparison
BPE
~40
Atomic
26
Segment
8
5× fewer tokens: BPE → Segment
Generic LLM Tokenizer (BPE)
path d="M 256 48 c -40 0 -80 60 -100 140 s 10 60 …"
~40
Legend struct attr cmd coord·abs coord·rel BPE token fragmented ⚠ seg·* N→1
Tech Innovations
Raw SVG
String
Atomic
Tokenizer
Segment
Learning
3B HiVG
Detokenizer
SVG
Output

HiVG's pipeline begins with a Hierarchical SVG Tokenizer that decomposes raw SVG into atomic tokens, then learns Structure Segments to compress sequences. A 3B-parameter language model generates SVG token sequences, which are detokenized back into clean SVG output.

HiVG Method
TOKENIZER

Hierarchical SVG Tokenizer

A two-level tokenization scheme that preserves the full geometric semantics of SVG commands. Level 1 (Atomic) decomposes raw SVG into typed tokens — structure, commands, coordinates, and attributes. Level 2 (Segment) merges each drawing command with its parameters into a single compact token.

2.76×
Sequence Compression
Fewer than BPE
COMPRESSION

Structure Segment Learning

Learns segment tokens directly from the SVG corpus. Each segment merges a drawing command (e.g., Bézier curve, arc, line) with its coordinate parameters into one token, capturing renderable geometric primitives while discarding invalid combinations.

2.7×
Fewer Training Tokens
<1s
Generation Time
Segment Learning
Citation
@article{xing2026hivg,
title={Hierarchical SVG Tokenization: Learning Compact Visual Programs for Scalable Vector Graphics Modeling},
author={Xing, Ximing and Xue, Ziteng and Li, Zhenxi and Liang, Weicong and Wang, Linqing and Yang, Zhantao and Hang, Tiankai and Yin, Zijin and Lu, Qinglin and Wang, Chunyu and Yu, Qian},
journal={arXiv preprint},
year={2026}
}
HiVG: Hierarchical SVG Tokenization
© 2026 Tencent HunYuan