Text Shaping in CVE-Agent-Bench — 19 vulnerabilities tested
19 vulnerability samples from a text shaping library, generating 285 evaluations across 15 agents.
Overview
This text shaping library is a Unicode text shaping engine used by Chrome, Firefox, Android, and LibreOffice to render complex scripts correctly. Font rendering requires precise memory management and bounds checking, making the engine a critical component in millions of software systems. The library handles font files of varying complexity, from simple Latin scripts to intricate Asian and Indic writing systems.
Benchmark coverage
19 vulnerability samples from this text shaping library are included in CVE-Agent-Bench, generating 285 individual evaluations across 15 agent configurations. These samples focus on buffer overflows, heap memory corruption, and out-of-bounds reads that occur during font parsing and glyph shaping operations.
Vulnerability classes
Text shaping samples cover specific vulnerability patterns common in font processing code:
- Heap buffer overflows in font table parsing, where parsing a malformed or truncated font file reads or writes past allocated memory bounds
- Out-of-bounds reads in glyph shaping operations, triggered by invalid glyph indices or corrupted outline data
- Integer overflows in size calculations that lead to undersized buffer allocation
- Use-after-free bugs in font object reference counting during decompression or format conversion
- Null pointer dereferences when expected font table structures are missing or malformed
Why text shaping bugs are interesting for agent evaluation
Text shaping vulnerabilities test an agent's ability to understand complex memory safety issues in font parsing code. The project has intricate data structures for font tables, glyph outlines, and shaping algorithms. Bugs often require domain-specific knowledge of the OpenType font specification and careful handling of variable-length binary data. Agents must balance defensive programming against performance constraints in a widely used library.
The 19 samples in the benchmark represent the types of issues that lead to remote code execution when processing untrusted font files. A single malformed font embedded in a web page, PDF, or email attachment can trigger memory corruption.
Agent performance on text shaping
Per-project performance data is not yet published. The full benchmark results aggregate performance across all codebases. You can review how individual agents performed overall on the full results page, where you can sort by pass rate, cost, and other metrics. The methodology behind agent evaluation is documented in the benchmark methodology guide.
Related codebases
Other memory-safety intensive codebases in the benchmark include:
- Archive Library: archive format parsing with similar bounds-checking challenges
- Image Codec: image decoding with comparable parsing complexity
- Git Library: binary format parsing with variable-length data handling
Explore more
- Full benchmark results
- Agent profiles
- Methodology
- Economics analysis: cost per verified patch
FAQ
How do agents perform on text shaping vulnerabilities?
The text shaping project contributes 19 samples to CVE-Agent-Bench, the largest count of any single project. Font parsing bugs test domain-specific knowledge, and performance on them varies across agent models; per-project results are not yet published.
Benchmark Results
62.7% pass rate. $2.64 per fix. Real data from 1,920 evaluations.
Benchmark Methodology
How XOR benchmarks AI coding agents on real security vulnerabilities. Reproducible, deterministic, and transparent.
See which agents produce fixes that work
128 CVEs. 15 agents. 1,920 evaluations. Agents learn from every run.