HiVG: Hierarchical SVG Tokenization

The best open-source SVG generation model at 3B parameters, surpassing GPT-5 and Gemini 2.5 on image-to-SVG.

Small Model, Frontier Results

High-Fidelity Image-to-SVG

Efficient SVG Token Compression

3B parameters. SVG quality that rivals the closed-source giants.

Proprietary Models Beaten: 7/7

Usability over GPT-5: +17%

Instant Results: < 1s

Flat Icons

"A minimalist clock icon with hour markers and octagonal frame"

"A flat analog clock with blue circular face and thin border"

"A flat laptop icon with screen, keyboard and stand"

"A flat industrial factory icon with chimneys and windows"

"A residential house with double garage doors and chimney"

"A trophy icon with star emblem and layered pedestal"

"A bar chart dashboard with trend line overlay"

"A target crosshair icon with circular rings"

"A bright orange directional compass arrows icon"

"A user profile document icon with avatar circle"

Colorful Illustrations

"A layered hamburger emoji with sesame-seed bun and patty"

"A golden gear icon with center bullseye on orange circle"

"A laptop illustration on vibrant pink circle background"

"A browser window icon on pink circular background"

"A tablet with colorful 3×3 app icon grid layout"

"A colorful gift box with dot pattern and ribbon"

"A cartoon figurine with round ears on display pedestal"

"A decorative picture frame with star accent"

"A desktop monitor displaying completed task checkmarks"

"A stylized clock with blue face and colored sector highlights"

Poster & UI Design

"A colorful spreadsheet layout with yellow-to-pink gradient"

"A UI dashboard layout with warm gradient color scheme"

"A gradient message card with text placeholders"

"A heraldic shield badge with crown emblem"

"A dual-window notebook with chat bubble overlays"

"A ring-bound desk calendar with date highlight"

"A digital planner with ring binding and blue accent"

"A smartphone checklist UI in purple theme"

"A warm-toned video player interface with play button"

"A rounded app icon with blue tint and user avatar"

Sequence-length compression, token-efficient scaling, and human evaluation

Convert any image into a clean, editable SVG — structure, layout, and detail faithfully preserved.

Typography & Fonts

[Image gallery: 12 input images, each paired with its HiVG output]

Logos & Brands

[Image gallery: 9 input images, each paired with its HiVG output]

Icons & Illustrations

[Image gallery: 14 input images, each paired with its HiVG output]

2.76× token sequence compression. Sub-second generation vs. minutes for closed-source models.

Token Sequence Compression: 2.76×

Fewer Training Tokens: 2.7×

Interactive Tokenization Demo

HiVG transforms raw SVG code through two levels of tokenization. Token counts for the highlighted path in the rocket example:

① Generic LLM Tokenizer (BPE): ~40 tokens
② Atomic Tokens: 26 tokens
③ Segment Tokens: 8 tokens

5× fewer tokens: BPE → Segment

Example input to the generic LLM tokenizer (BPE), ~40 tokens:

path d="M 256 48 c -40 0 -80 60 -100 140 s 10 60 …"

Legend: struct, attr, cmd, coord·abs, coord·rel, BPE token (fragmented ⚠), seg·* (N→1)

Tech Innovations

Raw SVG String → Atomic Tokenizer → Segment Learning → 3B HiVG → Detokenizer → SVG Output

HiVG's pipeline begins with a Hierarchical SVG Tokenizer that decomposes raw SVG into atomic tokens, then learns Structure Segments to compress sequences. A 3B-parameter language model generates SVG token sequences, which are detokenized back into clean SVG output.
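To make the final detokenization step concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that each segment token is a string holding one path command and its parameters (for example "l 5 5"). The function name `detokenize` and the `size` and `fill` parameters are hypothetical and are not taken from the HiVG code.

```python
import xml.etree.ElementTree as ET


def detokenize(segments, size=512, fill="#000000"):
    """Join segment tokens into one path and wrap it in a minimal SVG document.

    Assumption: each segment is a string such as "M 10 20" or "l 5 5".
    This is an illustrative format, not HiVG's actual token vocabulary.
    """
    svg = ET.Element(
        "svg",
        {"xmlns": "http://www.w3.org/2000/svg", "viewBox": f"0 0 {size} {size}"},
    )
    # One <path> element whose `d` attribute is the concatenated segments.
    ET.SubElement(svg, "path", {"d": " ".join(segments), "fill": fill})
    return ET.tostring(svg, encoding="unicode")


if __name__ == "__main__":
    print(detokenize(["M 10 20", "l 5 5", "z"]))
```

Building the output with an XML library rather than string concatenation keeps the result well-formed, which matters if the generated SVG is meant to be edited afterwards.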

TOKENIZER

Hierarchical SVG Tokenizer

A two-level tokenization scheme that preserves the full geometric semantics of SVG commands. Level 1 (Atomic) decomposes raw SVG into typed tokens — structure, commands, coordinates, and attributes. Level 2 (Segment) merges each drawing command with its parameters into a single compact token.
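The two levels can be sketched on a single path string as follows. This is an illustrative approximation, not the HiVG tokenizer itself: the function names `atomic_tokens` and `segment_tokens` are hypothetical, the token kinds (`cmd`, `coord_abs`, `coord_rel`) are borrowed from the demo legend, and the parameter counts come from the SVG path grammar. For brevity the sketch covers only path data (not structure or attribute tokens) and does not handle implicitly repeated commands.

```python
import re

# Numeric parameters each SVG path command consumes (SVG path grammar).
ARITY = {"M": 2, "L": 2, "H": 1, "V": 1, "C": 6, "S": 4, "Q": 4, "T": 2, "A": 7, "Z": 0}

# A lexeme is either a single command letter or a number.
TOKEN_RE = re.compile(r"[MLHVCSQTAZmlhvcsqtaz]|-?\d*\.?\d+(?:[eE][-+]?\d+)?")


def atomic_tokens(d):
    """Level 1: split a path `d` string into typed (kind, value) tokens."""
    tokens = []
    relative = False
    for lexeme in TOKEN_RE.findall(d):
        if lexeme.isalpha():
            # Lowercase commands use relative coordinates, uppercase absolute.
            relative = lexeme.islower()
            tokens.append(("cmd", lexeme))
        else:
            tokens.append(("coord_rel" if relative else "coord_abs", lexeme))
    return tokens


def segment_tokens(atomic):
    """Level 2: merge each command with its parameters into one segment token."""
    segments, i = [], 0
    while i < len(atomic):
        kind, cmd = atomic[i]
        if kind != "cmd":
            raise ValueError(f"expected a command at position {i}, got {cmd!r}")
        n = ARITY[cmd.upper()]
        params = atomic[i + 1 : i + 1 + n]
        if len(params) != n or any(k == "cmd" for k, _ in params):
            raise ValueError(f"command {cmd!r} needs {n} parameters")
        segments.append(" ".join([cmd] + [value for _, value in params]))
        i += 1 + n
    return segments


if __name__ == "__main__":
    atomic = atomic_tokens("M 10 20 c 1 2 3 4 5 6 z")
    print(len(atomic), atomic[:3])
    print(segment_tokens(atomic))
```

On this toy path, Level 1 yields one token per command or coordinate, and Level 2 collapses them to one token per drawing command, which is where the sequence-length reduction comes from.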

Sequence Compression: 2.76×

Fewer than BPE: 5×

COMPRESSION

Structure Segment Learning

Learns segment tokens directly from the SVG corpus. Each segment merges a drawing command (e.g., Bézier curve, arc, line) with its coordinate parameters into one token, capturing renderable geometric primitives while discarding invalid combinations.
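One plausible way to build such a vocabulary is sketched below. The page does not say how segments are selected, so two assumptions are made here and should be read as such: segments are ranked by corpus frequency, and an "invalid combination" is taken to mean a command followed by the wrong number of parameters. The names `path_segments` and `learn_segment_vocab` are hypothetical.

```python
import re
from collections import Counter

# Numeric parameters each SVG path command consumes (SVG path grammar).
ARITY = {"M": 2, "L": 2, "H": 1, "V": 1, "C": 6, "S": 4, "Q": 4, "T": 2, "A": 7, "Z": 0}

# A command letter followed by everything up to the next command letter.
SEGMENT_RE = re.compile(r"([MLHVCSQTAZmlhvcsqtaz])([^MLHVCSQTAZmlhvcsqtaz]*)")
NUMBER_RE = re.compile(r"-?\d*\.?\d+(?:[eE][-+]?\d+)?")


def path_segments(d):
    """Yield (command, *params) tuples, skipping segments with a wrong parameter count."""
    for cmd, body in SEGMENT_RE.findall(d):
        params = tuple(NUMBER_RE.findall(body))
        # Assumption: a mismatch with the grammar's arity marks the segment as invalid.
        if len(params) == ARITY[cmd.upper()]:
            yield (cmd, *params)


def learn_segment_vocab(corpus, max_size):
    """Map the `max_size` most frequent valid segments to integer token ids.

    Assumption: frequency ranking stands in for whatever selection
    criterion the actual method uses.
    """
    counts = Counter(seg for d in corpus for seg in path_segments(d))
    return {seg: idx for idx, (seg, _) in enumerate(counts.most_common(max_size))}


if __name__ == "__main__":
    corpus = ["M 0 0 l 5 5 l 5 5 z", "M 0 0 l 5 5 c 1 2 3"]
    print(learn_segment_vocab(corpus, max_size=2))
```

Because only segments that parse as complete commands enter the vocabulary, every learned token corresponds to a primitive that a renderer can draw.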

Fewer Training Tokens: 2.7×

Generation Time: <1s

Citation

@article{xing2026hivg,
  title={Hierarchical SVG Tokenization: Learning Compact Visual Programs for Scalable Vector Graphics Modeling},
  author={Xing, Ximing and Xue, Ziteng and Li, Zhenxi and Liang, Weicong and Wang, Linqing and Yang, Zhantao and Hang, Tiankai and Yin, Zijin and Lu, Qinglin and Wang, Chunyu and Yu, Qian},
  journal={arXiv preprint},
  year={2026}
}