2026-04-19 · 4 min read

How to use ClawPipe — cut your LLM bill in 60 seconds

One install. Three lines of code. Every prompt now flows through 246 deterministic rules, a self-learning router, a semantic cache, and a budget guard — before it ever hits OpenAI, Anthropic, or anyone else. No code rewrite. No lock-in.

The problem in one sentence

Most apps send every prompt to the most capable model at full price — even when the answer is "convert 17% to a decimal" or the same question someone else asked yesterday.

The fix in four steps

1. Install

npm install clawpipe-ai
2. Sign up (free, no card)

Grab an API key at app.clawpipe.ai/signup. The free tier is 1,000 calls/day — that's 30K/month, 3× larger than Portkey's free tier.

3. Replace your provider call

import { ClawPipe } from 'clawpipe-ai';

const pipe = new ClawPipe({
  apiKey: process.env.CLAWPIPE_API_KEY,
  projectId: 'my-app',
});

const { text, meta } = await pipe.prompt('Explain this code', {
  system: 'You are a code reviewer',
  maxTokens: 500,
});

console.log(text);
console.log(meta.estimatedCostUsd);  // 0 if Booster or Cache resolved it

That's it. Same input. Same output. Lower bill.

4. Watch the savings tick up

app.clawpipe.ai shows your boost rate, cache hit rate, and a cost-trend chart. Send 100 prompts and the green "saved" bars start climbing within minutes.

What's actually happening

Every call flows through six pipeline stages, in order. Any stage can short-circuit the rest:

  1. Booster — 246 deterministic rules. "What is 17% of 240?" → 40.8 in 0.0125 ms at $0. 30% of typical agent traffic resolves here.
  2. Packer — strips redundancy from your prompt + system. ~4% fewer tokens, free.
  3. Semantic Cache — "explain recursion" and "what is recursion?" hit the same cached answer. Hash caches miss both.
  4. Router — picks the cheapest model that meets the quality bar for the task. Learns from outcomes.
  5. Gateway — 24+ providers behind one interface (OpenAI, Anthropic, Gemini, Groq, DeepSeek, Mistral, Together, Fireworks, Perplexity, xAI, Cohere, AI21, Cerebras, Replicate, Hugging Face, Writer, Databricks, Azure OpenAI, Bedrock, Vertex, OpenRouter, Ollama).
  6. Learner — every response updates the router. The longer it runs, the smarter it gets.

Common patterns

Hard budget cap

new ClawPipe({
  apiKey, projectId,
  budgetCapUsd: 50,            // hard ceiling per day
  budgetWarnUsd: 40,           // soft warning at 80%
});
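What those two options describe is simple running-total bookkeeping: warn past the soft line, refuse past the hard ceiling. A minimal sketch of that behavior, assuming a per-day total (illustrative only, not ClawPipe's implementation):

```typescript
// Sketch of the bookkeeping implied by budgetCapUsd / budgetWarnUsd:
// accumulate spend, warn at the soft threshold, block at the hard cap.
// Illustrative only; ClawPipe tracks this for you server-side.

class BudgetGuard {
  private spentUsd = 0;
  constructor(private capUsd: number, private warnUsd: number) {}

  // Returns 'ok' or 'warn'; throws once the hard ceiling would be crossed.
  record(callCostUsd: number): 'ok' | 'warn' {
    if (this.spentUsd + callCostUsd > this.capUsd) {
      throw new Error(`daily budget cap $${this.capUsd} reached`);
    }
    this.spentUsd += callCostUsd;
    return this.spentUsd >= this.warnUsd ? 'warn' : 'ok';
  }
}

const guard = new BudgetGuard(50, 40);
guard.record(35);              // 'ok': $35 spent so far
console.log(guard.record(10)); // 'warn': $45 is past the $40 soft line
// guard.record(10) would now throw: $45 + $10 exceeds the $50 cap
```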

Quality mode (swarm vote)

import { ClawPipe, withPreset, QualityMode } from 'clawpipe-ai';
const pipe = new ClawPipe(withPreset({ apiKey, projectId }, QualityMode));

Cheap mode (cache + cheap-model allowlist)

import { CheapMode, withPreset } from 'clawpipe-ai';
const pipe = new ClawPipe(withPreset({ apiKey, projectId }, CheapMode));

Guard rails (15 plugins out of the box)

new ClawPipe({
  apiKey, projectId,
  guardRules: [
    { guard: 'pii_redact' },
    { guard: 'contains', config: { words: ['ssn', 'credit card'] }, blockOnFail: true },
    { guard: 'model_whitelist', config: { models: ['claude-haiku-4-5'] } },
  ],
});

Cross-provider tool calling

import { toolsForProvider, parseToolCalls } from 'clawpipe-ai';

const tools = [{
  name: 'get_weather',
  description: 'Get current weather for a city',
  parameters: { type: 'object', properties: { city: { type: 'string' } }, required: ['city'] },
}];

// works on OpenAI, Anthropic, Gemini, Groq, DeepSeek, Mistral
const wireTools = toolsForProvider(tools, 'anthropic');
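The reason a helper like toolsForProvider exists: every provider accepts the same JSON-Schema tool definition, but in a different wire shape. A toy converter for two of them makes the difference visible; the real helper covers more providers and edge cases, and this sketch is not its implementation.

```typescript
// Toy version of a toolsForProvider-style converter, for two providers only.
// OpenAI's chat.completions API wraps each tool in a { type, function }
// envelope; Anthropic's Messages API takes the schema under `input_schema`.

type Tool = { name: string; description: string; parameters: object };

function toyToolsForProvider(tools: Tool[], provider: 'openai' | 'anthropic') {
  if (provider === 'openai') {
    return tools.map(t => ({
      type: 'function',
      function: { name: t.name, description: t.description, parameters: t.parameters },
    }));
  }
  // anthropic
  return tools.map(t => ({
    name: t.name,
    description: t.description,
    input_schema: t.parameters,
  }));
}
```

Either way, the `parameters` object you wrote once travels unchanged; only the envelope around it differs per provider.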

What you save (real numbers)

| Stage | Typical reduction |
| --- | --- |
| Booster (skip LLM entirely) | 20-35% of traffic at −100% cost |
| Semantic cache | +15-25% hit rate on top of exact-match |
| Router (cheaper model that still works) | 5-15% on eligible calls |
| Packer (token compression) | ~4% on every non-boosted call |
| Combined on the public 400-prompt benchmark | 57.3% |

Reproduce the benchmark: clawpipe.ai/benchmarks

Try it now

Start free →   Open source on GitHub →

If you're already on Portkey, LiteLLM, or going direct to OpenAI, see the side-by-side at /vs/portkey, /vs/litellm, or /vs/direct-api.