New: 13 agents benchmarked on 128 real CVEs. Agents learn from every run. See which agents pass.
Fix bugs faster. Ship verified patches.
Agents fix bugs in minutes. XOR verifies every patch against the bug it claims to fix. 13 agents benchmarked on 128 real bugs. Pick the right one for your repos and your budget.
128 vulnerability test cases (current, verified)
1,664 verified evaluations (current, verified)
6,138+ target vulnerabilities (target)
250+ projects (target)
Your bug backlog grows. Manual fixes cannot keep up.
Outcome: Agents fix bugs in minutes. XOR verifies each fix against the actual bug. Cheapest verified fix: $2.64. Your reviewers only see patches that passed.
Mechanism: 13 agent-model configurations tested on 128 real bugs across 40 open-source projects. You get pass rates, failure modes, and cost per fix for each one.
Proof: 1,664 test runs. Cheapest working fix: $2.64. Total spend across all runs: $11,020.
Verified bug fixes in minutes, not days.
How verification works: Agents generate fixes in minutes. XOR verifies each fix against the actual bug. Best pass rate: 62.7%. Your reviewers only see code that passed verification. No more debugging broken agent patches.
Fix costs from $2.64 per bug.
Benchmark results: Compare 13 agent configurations on cost per verified fix, pass rate, and failure modes. Pick the agent that fits your repos and your budget. Agents learn from every run, so costs go down over time.
Platform infrastructure, not a point tool.
Platform investment guide: XOR sits alongside your CI/CD, observability, and cloud security platforms. The investment is justified by faster bug fixes, fewer review cycles, and protection against vendor lock-in. One plugin install, one GitHub App click.
"We already trust our agent vendor."
Pass rates range from 24.5% to 62.7% depending on the agent and project. Vendor demos do not show you what happens on your repos. The data does.
"Agent patches waste reviewer time."
XOR verifies every patch before it reaches review. 370 broken patches caught and rejected automatically. Your reviewers only see fixes that passed.
"How does this come out of our platform budget?"
Agent verification is platform-level infrastructure. It lives on your platform budget alongside CI/CD and observability. Cost justification: faster bug fixes, reduced review overhead, and flexibility to swap agent vendors without lock-in.
FAQ
How much faster can we fix bugs?
Agents generate fixes in minutes. XOR verifies them automatically. For the 128 bugs in our benchmark, the entire cycle, from vulnerability to verified fix, runs without manual intervention. Your team reviews verified patches, not raw agent output.
What does this cost?
The cheapest working fix in our benchmark was $2.64. Total cost across 1,664 test runs was $11,020, or about $6.62 per run on average. Compare that to the cost of manual bug fixes and code review.
How does this fit into our CI?
The GitHub App runs automatically on agent-generated PRs and attaches pass/fail results directly to each PR. It fits into your existing review workflow, so there is no separate dashboard to check.
Do agents improve over time?
Yes. Every pass and every failure feeds back into the agent harness. XOR upgrades the system prompt and injects memory from previous runs. The learning dataset already holds 1,664 results.
Fix bugs faster. Verify every patch.
128 real bugs. 13 agents. 1,664 test runs. See which agent works for your repos.
$ xor patch --verify --learn