# Benchmarks
Every number below is real and reproducible. No marketing math.
- **57.3%** cost reduction vs direct API
- **30.0%** Booster hit rate (zero LLM cost)
- **35.0%** Cache hit rate (2nd pass)
- **0.02ms** average pipeline overhead
## Setup
- Dataset: 400 prompts, 200 unique × 2 passes. Mix of boostable (math/JSON/dates), cacheable (repeats), and regular LLM workloads.
- Gateway: mocked at realistic provider latency (~1200ms p50).
- Hardware: single Node.js 20 process, M-series laptop.
- Date: 2026-04-09.
## Cost comparison on the 400-prompt run
| Scenario | Cost |
|---|---|
| Direct API (no ClawPipe) | $0.110 |
| With ClawPipe pipeline | $0.047 |
| Savings | $0.063 (57.3%) |
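The savings row is just the delta between the two runs. Using the table's rounded dollar figures, the percentage works out as:

```typescript
// Costs from the 400-prompt run above (USD, rounded as in the table).
const directCost = 0.110;   // Direct API (no ClawPipe)
const pipelineCost = 0.047; // With ClawPipe pipeline

const savings = directCost - pipelineCost;       // ≈ $0.063
const savingsPct = (savings / directCost) * 100; // ≈ 57.3%

console.log(`$${savings.toFixed(3)} (${savingsPct.toFixed(1)}%)`);
```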
## Stage latency breakdown
| Stage | Avg time |
|---|---|
| Booster | 0.0125ms |
| Packer | 0.0053ms |
| Cache | 0.0001ms |
| Router | 0.0040ms |
| Gateway (mocked provider) | 1206.6705ms |
Total pipeline overhead is 0.0218ms — nearly five orders of magnitude below the provider call itself.
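As a sanity check, the overhead is simply the sum of the four pipeline stages (gateway excluded). Summing the rounded table values reproduces the reported figure to within rounding:

```typescript
// Per-stage average latencies from the table above (ms); gateway excluded.
const stages = { booster: 0.0125, packer: 0.0053, cache: 0.0001, router: 0.0040 };
const gatewayMs = 1206.6705; // mocked provider call

// Sum of the rounded values; matches the reported ~0.02ms overhead.
const overheadMs = Object.values(stages).reduce((a, b) => a + b, 0);

// Ratio of provider latency to pipeline overhead: roughly 5.5e4.
const ratio = gatewayMs / overheadMs;
```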
## Reproduce it yourself
```sh
git clone https://github.com/finsavvyai/clawpipe
cd clawpipe/benchmarks
npm install
npx tsx real-benchmark.ts
open results/summary.md
```
The dataset (benchmarks/prompt-dataset.json), runner (real-benchmark.ts), and raw results (results/benchmark-results.json) are all in the repo. Fork it, swap in your own prompts, run it on your traffic.
## What the numbers mean for your app
- If 30% of your agent traffic is boostable (math, JSON formatting, date math, base64, stats), that portion pays $0 in LLM tokens.
- If your app sees repeat prompts (common in chatbots, form-filling agents, support tools), semantic cache adds another 20–35% hit rate on top.
- The router keeps picking cheaper models for eligible tasks and re-weighting as outcomes arrive.
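A rough back-of-envelope for your own traffic: treat booster hits as free, apply the cache hit rate to the remainder, and assume everything else pays full price. This is a hypothetical estimator, not a ClawPipe API, and it ignores router savings entirely:

```typescript
// Hypothetical estimator (not part of ClawPipe). boosterRate is the fraction
// of traffic answered locally at $0; cacheRate is the fraction of the
// *remaining* traffic served from the semantic cache. Both come from your
// own traffic measurements.
function estimatedCostFactor(boosterRate: number, cacheRate: number): number {
  const afterBooster = 1 - boosterRate;              // traffic still needing an LLM
  const afterCache = afterBooster * (1 - cacheRate); // traffic actually hitting a provider
  return afterCache; // fraction of the direct-API cost you still pay
}

// With this benchmark's rates (30% booster, 35% cache on the remainder):
// ≈ 0.455, i.e. ~54.5% savings before any router gains.
console.log(estimatedCostFactor(0.30, 0.35));
```

Routing cheaper models for the remaining 45.5% of traffic is what closes the gap to the measured 57.3%.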