New: 13 agents benchmarked on 128 real CVEs. Agents learn from every run.
Detect. Patch. Verify. Learn.
One loop that patches vulnerabilities, proves fixes work, and makes your agents smarter. 13 agents. 128 CVEs. Scaling to 6,138+.
XOR detects the vulnerability, dispatches an agent to patch it, and writes a verifier to confirm the fix resolves the specific CVE. Best pass rate: 62.7%. 1,664 evaluations completed.
Failed fixes are the primary learning signal. XOR upgrades the agent harness (system prompt, tools, memory) after every run. Pass rates go up. Costs go down. $2.64 to $52 per verified fix across 13 configurations.
Every agent action is cryptographically signed and logged. Produces evidence for SOC 2, FedRAMP, EU AI Act, Cyber Resilience Act, and PCI DSS. Built on an open IETF Internet-Draft.
Two interfaces. One verification engine.
GitHub App: automates PR review, fix generation, and CI hardening on your repos.
Agent Plugin: wraps your coding agent in a verification harness with secure skills and memory.
Choose one or both.
Detect. Patch. Verify. Learn. Repeat.
XOR detects the vulnerability, dispatches an agent to write a fix, tests the fix against a verifier it wrote for the specific CVE, records the result, and feeds the outcome back into the agent harness. Failed fixes teach agents what to avoid. Passing fixes expand the training set. Every cycle makes agents more accurate and cheaper.
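The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not XOR's implementation: `Harness`, `RunRecord`, `run_cycle`, and the toy callables are hypothetical names invented here, and the "learning" is reduced to appending to two lists.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RunRecord:
    cve_id: str
    passed: bool


@dataclass
class Harness:
    """Hypothetical stand-in for the agent harness (system prompt, tools, memory)."""
    lessons: List[str] = field(default_factory=list)       # learned from failed fixes
    training_set: List[str] = field(default_factory=list)  # grown by passing fixes

    def learn(self, record: RunRecord) -> None:
        if record.passed:
            self.training_set.append(record.cve_id)
        else:
            self.lessons.append(f"approach failed on {record.cve_id}")


def run_cycle(
    detect: Callable[[], List[str]],
    patch: Callable[[str, Harness], str],
    verify: Callable[[str, str], bool],
    harness: Harness,
) -> List[RunRecord]:
    """Detect, patch, verify, record, and feed each outcome back into the harness."""
    records = []
    for cve_id in detect():
        fix = patch(cve_id, harness)                     # agent writes a fix
        record = RunRecord(cve_id, verify(cve_id, fix))  # CVE-specific verifier
        harness.learn(record)                            # outcome feeds the harness
        records.append(record)
    return records


# Toy callables (assumptions for the demo): the agent only produces a
# working fix once the harness holds at least one lesson from a failure.
def toy_detect() -> List[str]:
    return ["CVE-0000-0001"]


def toy_patch(cve_id: str, harness: Harness) -> str:
    return "bounds-check" if harness.lessons else "no-op"


def toy_verify(cve_id: str, fix: str) -> bool:
    return fix == "bounds-check"


harness = Harness()
first = run_cycle(toy_detect, toy_patch, toy_verify, harness)
second = run_cycle(toy_detect, toy_patch, toy_verify, harness)
print(first[0].passed, second[0].passed)  # False True
```

In this toy run the first cycle fails and records a lesson, and the second cycle passes because the harness now carries that lesson.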
Four skills. Every wrapped agent.
The Agent Plugin provides four core skills to every coding agent it wraps.
Identify vulnerabilities in the target codebase.
Verify agent tool configurations, sandbox boundaries, and credential exposure.
Generate evidence reports with pass/fail outcomes and audit trails.
Cryptographically sign the verification record (COSE_Sign1).
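To make the signing skill concrete, here is a rough sketch of signing a verification record and detecting tampering. It is not COSE_Sign1: COSE_Sign1 is a CBOR-encoded structure that typically carries an asymmetric signature, whereas this sketch swaps in HMAC-SHA256 over canonical JSON so it runs on the Python standard library alone. The record fields, the key, and the function names are all hypothetical.

```python
import hashlib
import hmac
import json


def _canonical(record: dict) -> bytes:
    # Sorted keys give a stable byte encoding to sign (assumption: JSON, not CBOR).
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode()


def sign_record(key: bytes, record: dict) -> dict:
    """Attach an HMAC-SHA256 tag to a verification record (stand-in for COSE_Sign1)."""
    sig = hmac.new(key, _canonical(record), hashlib.sha256).hexdigest()
    return {"record": record, "sig": sig}


def verify_record(key: bytes, signed: dict) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = hmac.new(key, _canonical(signed["record"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["sig"])


key = b"demo-key-not-for-production"  # hypothetical key for the demo
record = {
    "cve_id": "CVE-0000-0001",        # placeholder identifier
    "outcome": "fail",
    "actions": ["read file", "edit file", "run verifier"],
}

signed = sign_record(key, record)
tampered = {"record": {**record, "outcome": "pass"}, "sig": signed["sig"]}

print(verify_record(key, signed))    # True
print(verify_record(key, tampered))  # False
```

The point of the sketch is the property, not the primitive: once a record is signed, flipping a failed outcome to a pass invalidates the signature.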
Three artifacts. Every run.
Attached to every PR. Shows the bug, the fix, test results, and pass/fail outcome.
Cryptographically signed record of every action the agent took: every tool used, file edited, and reasoning step.
128 vulnerability test cases. 13 agents. 1,664 results. Pass rates, cost per fix, difficulty scores.
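The result count follows from the grid: 13 agents × 128 test cases = 1,664 results. The sketch below shows one way pass rate and cost per fix could be derived from such results. It assumes cost per verified fix means total spend divided by the number of passing fixes, which this page does not define. The `Result` type and the sample rows are hypothetical, not benchmark data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Result:
    agent: str
    cve_id: str
    passed: bool
    cost_usd: float


def pass_rate(results: List[Result]) -> float:
    """Fraction of evaluations whose fix passed its verifier."""
    return sum(r.passed for r in results) / len(results)


def cost_per_verified_fix(results: List[Result]) -> Optional[float]:
    """Total spend divided by passing fixes (assumed definition); None if no passes."""
    passes = sum(r.passed for r in results)
    if passes == 0:
        return None
    return sum(r.cost_usd for r in results) / passes


# Size of the full grid stated on the page.
total_evaluations = 13 * 128  # 1664

# Made-up sample rows, for illustration only.
sample = [
    Result("agent-a", "CVE-0000-0001", True, 1.0),
    Result("agent-a", "CVE-0000-0002", False, 1.0),
    Result("agent-a", "CVE-0000-0003", True, 1.0),
    Result("agent-a", "CVE-0000-0004", False, 1.0),
]

print(total_evaluations)              # 1664
print(pass_rate(sample))              # 0.5
print(cost_per_verified_fix(sample))  # 2.0
```

Under this definition failed attempts still count toward spend, so an agent with a low pass rate pays more per verified fix even when each run is cheap.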
No auto-merge
Every change requires verification. No shortcuts.
No unmonitored runs
If XOR can't observe the agent, it can't verify the output.
No claims without data
Every number on this page is from verified benchmarks.
$ xor patch --verify --learn