2026-04-19 · 4 min read

How to use ClawPipe — cut your LLM bill in 60 seconds

One install. Three lines of code. Every prompt now flows through 246 deterministic rules, a self-learning router, a semantic cache, and a budget guard — before it ever hits OpenAI, Anthropic, or anyone else. No code rewrite. No lock-in.

The problem in one sentence

Most apps send every prompt to the most capable model at full price — even when the answer is "convert 17% to a decimal" or the same question someone else asked yesterday.

The fix in four steps

1. Install

npm install clawpipe-ai
2. Sign up (free, no card)

Grab an API key at app.clawpipe.ai/signup. The free tier is 1,000 calls/day — that's 30K/month, 3× larger than Portkey's free tier.

3. Replace your provider call

import { ClawPipe } from 'clawpipe-ai';

const pipe = new ClawPipe({
  apiKey: process.env.CLAWPIPE_API_KEY,
  projectId: 'my-app',
});

const { text, meta } = await pipe.prompt('Explain this code', {
  system: 'You are a code reviewer',
  maxTokens: 500,
});

console.log(text);
console.log(meta.estimatedCostUsd);  // 0 if Booster or Cache resolved it

That's it. Same input. Same output. Lower bill.

4. Watch the savings tick up

app.clawpipe.ai shows your boost rate, cache hit rate, and a cost-trend chart. Send 100 prompts and the green "saved" bars start climbing within minutes.

What's actually happening

Every call flows through six pipeline stages, in order. Any stage can short-circuit the rest:

  1. Booster — 246 deterministic rules. "What is 17% of 240?" → 40.8 in 0.0125 ms at $0. 30% of typical agent traffic resolves here.
  2. Packer — strips redundancy from your prompt + system. ~4% fewer tokens, free.
  3. Semantic Cache — "explain recursion" and "what is recursion?" hit the same cached answer. Hash caches miss both.
  4. Router — picks the cheapest model that meets the quality bar for the task. Learns from outcomes.
  5. Gateway — 24+ providers behind one interface (OpenAI, Anthropic, Gemini, Groq, DeepSeek, Mistral, Together, Fireworks, Perplexity, xAI, Cohere, AI21, Cerebras, Replicate, Hugging Face, Writer, Databricks, Azure OpenAI, Bedrock, Vertex, OpenRouter, Ollama).
  6. Learner — every response updates the router. The longer it runs, the smarter it gets.

Common patterns

Hard budget cap

new ClawPipe({
  apiKey, projectId,
  budgetCapUsd: 50,            // hard ceiling per day
  budgetWarnUsd: 40,           // soft warning at 80%
});
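What those two options describe is simple running-total bookkeeping: warn past the soft line, refuse past the hard ceiling. A minimal sketch of that behavior, assuming a per-day total (illustrative only, not ClawPipe's implementation):

```typescript
// Sketch of the bookkeeping implied by budgetCapUsd / budgetWarnUsd:
// accumulate spend, warn at the soft threshold, block at the hard cap.
// Illustrative only; ClawPipe tracks this for you server-side.

class BudgetGuard {
  private spentUsd = 0;
  constructor(private capUsd: number, private warnUsd: number) {}

  // Returns 'ok' or 'warn'; throws once the hard ceiling would be crossed.
  record(callCostUsd: number): 'ok' | 'warn' {
    if (this.spentUsd + callCostUsd > this.capUsd) {
      throw new Error(`daily budget cap $${this.capUsd} reached`);
    }
    this.spentUsd += callCostUsd;
    return this.spentUsd >= this.warnUsd ? 'warn' : 'ok';
  }
}

const guard = new BudgetGuard(50, 40);
guard.record(35);              // 'ok': $35 spent so far
console.log(guard.record(10)); // 'warn': $45 is past the $40 soft line
// guard.record(10) would now throw: $45 + $10 exceeds the $50 cap
```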

Quality mode (swarm vote)

import { ClawPipe, withPreset, QualityMode } from 'clawpipe-ai';
const pipe = new ClawPipe(withPreset({ apiKey, projectId }, QualityMode));

Cheap mode (cache + cheap-model allowlist)

import { CheapMode, withPreset } from 'clawpipe-ai';
const pipe = new ClawPipe(withPreset({ apiKey, projectId }, CheapMode));

Guard rails (15 plugins out of the box)

new ClawPipe({
  apiKey, projectId,
  guardRules: [
    { guard: 'pii_redact' },
    { guard: 'contains', config: { words: ['ssn', 'credit card'] }, blockOnFail: true },
    { guard: 'model_whitelist', config: { models: ['claude-haiku-4-5'] } },
  ],
});

Cross-provider tool calling

import { toolsForProvider, parseToolCalls } from 'clawpipe-ai';

const tools = [{
  name: 'get_weather',
  description: 'Get current weather for a city',
  parameters: { type: 'object', properties: { city: { type: 'string' } }, required: ['city'] },
}];

// works on OpenAI, Anthropic, Gemini, Groq, DeepSeek, Mistral
const wireTools = toolsForProvider(tools, 'anthropic');
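The reason a helper like toolsForProvider exists: every provider accepts the same JSON-Schema tool definition, but in a different wire shape. A toy converter for two of them makes the difference visible; the real helper covers more providers and edge cases, and this sketch is not its implementation.

```typescript
// Toy version of a toolsForProvider-style converter, for two providers only.
// OpenAI's chat.completions API wraps each tool in a { type, function }
// envelope; Anthropic's Messages API takes the schema under `input_schema`.

type Tool = { name: string; description: string; parameters: object };

function toyToolsForProvider(tools: Tool[], provider: 'openai' | 'anthropic') {
  if (provider === 'openai') {
    return tools.map(t => ({
      type: 'function',
      function: { name: t.name, description: t.description, parameters: t.parameters },
    }));
  }
  // anthropic
  return tools.map(t => ({
    name: t.name,
    description: t.description,
    input_schema: t.parameters,
  }));
}
```

Either way, the `parameters` object you wrote once travels unchanged; only the envelope around it differs per provider.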

What you save (real numbers)

| Stage | Typical reduction |
| --- | --- |
| Booster (skip LLM entirely) | 20-35% of traffic at −100% cost |
| Semantic cache | +15-25% hit rate on top of exact-match |
| Router (cheaper model that still works) | 5-15% on eligible calls |
| Packer (token compression) | ~4% on every non-boosted call |
| Combined on the public 400-prompt benchmark | 57.3% |

Reproduce the benchmark: clawpipe.ai/benchmarks

Try it now

Start free →   Open source on GitHub →

If you're already on Portkey, LiteLLM, or going direct to OpenAI, see the side-by-side at /vs/portkey, /vs/litellm, or /vs/direct-api.