2026-04-19 · 4 min read
How to use ClawPipe — cut your LLM bill in 60 seconds
One install. Three lines of code. Every prompt now flows through 246 deterministic rules, a self-learning router, a semantic cache, and a budget guard — before it ever hits OpenAI, Anthropic, or anyone else. No code rewrite. No lock-in.
The problem in one sentence
Most apps send every prompt to the most capable model at full price — even when the answer is "convert 17% to a decimal" or the same question someone else asked yesterday.
The fix in four steps
Install
npm install clawpipe-ai
Sign up (free, no card)
Grab an API key at app.clawpipe.ai/signup. The free tier is 1,000 calls/day — that's 30K/month, 3× larger than Portkey's free tier.
Replace your provider call
import { ClawPipe } from 'clawpipe-ai';

const pipe = new ClawPipe({
  apiKey: process.env.CLAWPIPE_API_KEY,
  projectId: 'my-app',
});

const { text, meta } = await pipe.prompt('Explain this code', {
  system: 'You are a code reviewer',
  maxTokens: 500,
});

console.log(text);
console.log(meta.estimatedCostUsd); // 0 if Booster or Cache resolved it
That's it. Same input. Same output. Lower bill.
Watch the savings tick up
app.clawpipe.ai shows boost rate, cache hit rate, and a cost-trend chart. Send 100 prompts and the green "saved" bars start showing within minutes.
What's actually happening
Every call flows through six pipeline stages, in order. Any stage can short-circuit the rest:
- Booster — 246 deterministic rules. "What is 17% of 240?" → 40.8 in 0.0125 ms at $0. 30% of typical agent traffic resolves here.
- Packer — strips redundancy from your prompt + system. ~4% fewer tokens, free.
- Semantic Cache — "explain recursion" and "what is recursion?" hit the same cached answer. Hash caches miss both.
- Router — picks the cheapest model that meets the quality bar for the task. Learns from outcomes.
- Gateway — 24+ providers behind one interface (OpenAI, Anthropic, Gemini, Groq, DeepSeek, Mistral, Together, Fireworks, Perplexity, xAI, Cohere, AI21, Cerebras, Replicate, Hugging Face, Writer, Databricks, Azure OpenAI, Bedrock, Vertex, OpenRouter, Ollama).
- Learner — every response updates the router. The longer it runs, the smarter it gets.
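The short-circuit behavior is easy to picture in code. Here is a minimal sketch of the pattern — stage names, types, and the toy booster rule are illustrative, not ClawPipe internals:

```typescript
// Sketch of a short-circuiting pipeline: each stage either returns a final
// answer or passes the request to the next stage (names are illustrative).
type Request = { prompt: string };
type Result = { text: string; stage: string };
type Stage = (req: Request) => Result | null; // null = "not mine, keep going"

function runPipeline(stages: Stage[], req: Request): Result {
  for (const stage of stages) {
    const hit = stage(req);
    if (hit) return hit; // short-circuit: later stages never run
  }
  throw new Error('the last stage (gateway) should always resolve');
}

// Toy stages: one deterministic booster rule and a catch-all gateway.
const booster: Stage = (req) => {
  const m = req.prompt.match(/^what is (\d+)% of (\d+)\?$/i);
  if (!m) return null;
  return { text: String((Number(m[1]) * Number(m[2])) / 100), stage: 'booster' };
};
const gateway: Stage = (req) => ({ text: `LLM answer for: ${req.prompt}`, stage: 'gateway' });

const boosted = runPipeline([booster, gateway], { prompt: 'What is 17% of 240?' });
// boosted → { text: '40.8', stage: 'booster' } — no LLM call made
```

The real pipeline does the same thing with six stages instead of two: the first stage that can answer, answers, and everything downstream (including the paid provider call) is skipped.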
Common patterns
Hard budget cap
new ClawPipe({
  apiKey, projectId,
  budgetCapUsd: 50,  // hard ceiling per day
  budgetWarnUsd: 40, // soft warning at 80%
});
Quality mode (swarm vote)
import { ClawPipe, withPreset, QualityMode } from 'clawpipe-ai';
const pipe = new ClawPipe(withPreset({ apiKey, projectId }, QualityMode));
Cheap mode (cache + cheap-model allowlist)
import { CheapMode, withPreset } from 'clawpipe-ai';
const pipe = new ClawPipe(withPreset({ apiKey, projectId }, CheapMode));
Guard rails (15 plugins out of the box)
new ClawPipe({
  apiKey, projectId,
  guardRules: [
    { guard: 'pii_redact' },
    { guard: 'contains', config: { words: ['ssn', 'credit card'] }, blockOnFail: true },
    { guard: 'model_whitelist', config: { models: ['claude-haiku-4-5'] } },
  ],
});
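Conceptually, a blocking guard like `contains` is just a predicate over the prompt. A minimal sketch of that idea — the function names and verdict shape are mine, not the plugin's source:

```typescript
// Illustrative sketch of a blocking "contains" guard (not ClawPipe source):
// scan the prompt for configured words and refuse the call on a match.
type GuardVerdict = { pass: boolean; reason?: string };

function containsGuard(words: string[]) {
  return (prompt: string): GuardVerdict => {
    const lower = prompt.toLowerCase();
    const hit = words.find((w) => lower.includes(w.toLowerCase()));
    return hit ? { pass: false, reason: `blocked term: ${hit}` } : { pass: true };
  };
}

const blocklist = containsGuard(['ssn', 'credit card']);
blocklist('What is my credit card limit?'); // { pass: false, reason: 'blocked term: credit card' }
blocklist('Summarize this article');        // { pass: true }
```

With `blockOnFail: true`, a failing verdict stops the call before it reaches any provider; without it, the guard only annotates the request.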
Cross-provider tool calling
import { toolsForProvider, parseToolCalls } from 'clawpipe-ai';
const tools = [{
  name: 'get_weather',
  description: 'Get current weather for a city',
  parameters: { type: 'object', properties: { city: { type: 'string' } }, required: ['city'] },
}];
// works on OpenAI, Anthropic, Gemini, Groq, DeepSeek, Mistral
const wireTools = toolsForProvider(tools, 'anthropic');
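The job `toolsForProvider` does is translate one neutral schema into each provider's wire shape. A sketch of that translation, based on the publicly documented OpenAI and Anthropic tool formats — this is the concept, not ClawPipe's actual code:

```typescript
// What a provider-neutral tool definition becomes on the wire (sketch).
type Tool = { name: string; description: string; parameters: object };

function toAnthropic(tools: Tool[]) {
  // Anthropic's Messages API expects the JSON Schema under `input_schema`.
  return tools.map((t) => ({
    name: t.name,
    description: t.description,
    input_schema: t.parameters,
  }));
}

function toOpenAI(tools: Tool[]) {
  // OpenAI's Chat Completions API nests the definition under `function`
  // with a `type: "function"` wrapper.
  return tools.map((t) => ({
    type: 'function' as const,
    function: { name: t.name, description: t.description, parameters: t.parameters },
  }));
}
```

The same neutral definition goes in; only the envelope changes per provider, which is why the router can move a tool-calling workload between vendors without touching your tool schemas.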
What you save (real numbers)
| Stage | Typical reduction |
|---|---|
| Booster (skip LLM entirely) | 20-35% of traffic resolved at $0 |
| Semantic cache | +15-25% hit rate on top of exact-match |
| Router (cheaper model that still works) | 5-15% on eligible calls |
| Packer (token compression) | ~4% on every non-boosted call |
| Combined on the public 400-prompt benchmark | 57.3% |
Reproduce the benchmark: clawpipe.ai/benchmarks
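Note that the per-stage numbers compound multiplicatively rather than adding up. A back-of-envelope with illustrative mid-range values — the traffic mix below is my assumption, not the benchmark's actual distribution:

```typescript
// Back-of-envelope for how stage savings compound (illustrative mid-range
// values, not the benchmark's measured traffic mix).
const boostRate = 0.30;  // fraction of calls resolved free by the Booster
const cacheRate = 0.20;  // semantic-cache hit rate on the remaining calls
const routerCut = 0.10;  // avg. cost cut from cheaper-model routing
const packerCut = 0.04;  // token compression on every non-boosted call

// Cost relative to "every call goes to the top model at full price":
const paidShare = (1 - boostRate) * (1 - cacheRate); // calls that still hit an LLM
const relativeCost = paidShare * (1 - routerCut) * (1 - packerCut);
const saved = 1 - relativeCost;
console.log((saved * 100).toFixed(1) + '% saved'); // → "51.6% saved"
```

Mid-range assumptions land in the same ballpark as the 57.3% benchmark figure; a traffic mix heavier in boostable or repeated prompts pushes the number higher.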
Where it runs
- SDK in your process (Node, Bun, Deno, edge runtimes). Adds ~0.02 ms per call.
- Gateway on Cloudflare Workers — same runtime as Cloudflare AI Gateway, but with the full pipeline above.
- OpenAI-compatible endpoint. Drop-in replacement for your client's baseURL if you don't want the SDK.
Try it now
Start free → Open source on GitHub →
If you're already on Portkey, LiteLLM, or going direct to OpenAI, see the side-by-side at /vs/portkey, /vs/litellm, or /vs/direct-api.