Now in public beta

Stop overpaying for LLM API calls

ProxyLLM is a drop-in proxy that sits between your app and AI providers. Semantic caching, smart model routing, and a live cost dashboard can cut 20–40% off the cost of cacheable workloads.

No credit card required
2-line integration
app.ts
// Before — direct to OpenAI
const openai = new OpenAI()

// After — through ProxyLLM
const openai = new OpenAI({
  baseURL: "https://api.proxyllm.dev/v1",
  apiKey: "pl_your_api_key",
})

Up and running in 2 minutes

No SDK to install. No code to refactor. Just change two lines.

01

Point your SDK to ProxyLLM

Swap your baseURL and API key. The official OpenAI SDK and Anthropic SDK (and any compatible client) work as-is.

02

We cache, route & track

ProxyLLM caches semantically similar queries, routes simple requests to cheaper models, and logs every call.

03

See savings in your dashboard

Watch your costs drop in real time. Tag requests by feature to see exactly where your money goes.

Everything you need to control LLM costs

One proxy between your app and AI providers. Full visibility, automatic savings, zero code changes beyond the initial setup.

Semantic Caching

Save 20-40% on repeated queries. We match semantically similar prompts — not just exact strings — so "What's the weather?" and "Tell me the weather" share one cached response.
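To illustrate the idea behind semantic caching, here is a minimal TypeScript sketch. It is not ProxyLLM's implementation: a production cache would compare vectors from an embedding model, whereas this sketch assumes a toy bag-of-words `embed` function so it runs offline. `SemanticCache`, `embed`, `cosine`, and the similarity `threshold` are hypothetical names.

```typescript
// Minimal semantic-cache sketch (not ProxyLLM's implementation).
// Assumption: `embed` is a toy bag-of-words vector standing in for a real
// embedding model, so the example runs without network access.

type Vector = Record<string, number>;

// Toy embedding: count lowercase words. A null-prototype object avoids
// collisions with inherited keys such as "constructor".
function embed(text: string): Vector {
  const v: Vector = Object.create(null);
  const words = text.toLowerCase().match(/[a-z']+/g) || [];
  for (const w of words) v[w] = (v[w] || 0) + 1;
  return v;
}

function norm(v: Vector): number {
  let sum = 0;
  for (const k of Object.keys(v)) {
    const x = v[k] || 0;
    sum += x * x;
  }
  return Math.sqrt(sum);
}

// Cosine similarity between two sparse word-count vectors.
function cosine(a: Vector, b: Vector): number {
  let dot = 0;
  for (const k of Object.keys(a)) dot += (a[k] || 0) * (b[k] || 0);
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

class SemanticCache {
  private entries: { vec: Vector; response: string }[] = [];
  private threshold: number;

  constructor(threshold: number) {
    this.threshold = threshold;
  }

  // Return the response stored for the most similar prompt, if it clears the threshold.
  lookup(prompt: string): string | undefined {
    const query = embed(prompt);
    let bestScore = -1;
    let best: string | undefined;
    for (const entry of this.entries) {
      const score = cosine(query, entry.vec);
      if (score >= this.threshold && score > bestScore) {
        bestScore = score;
        best = entry.response;
      }
    }
    return best;
  }

  store(prompt: string, response: string): void {
    this.entries.push({ vec: embed(prompt), response });
  }
}
```

With a threshold of 0.5, storing an answer for "What's the weather?" and then looking up "Tell me the weather" returns the stored answer (the word-count cosine similarity is about 0.58), while an unrelated prompt such as "How do I sort an array?" misses. Real embedding models separate paraphrases from unrelated prompts far better than word overlap does, so a real threshold would be tuned against them.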

Smart Model Routing

Auto-route simple queries to GPT-4o Mini. Complex prompts go to your workspace's default model, which must be on your plan's whitelist unless you enable BYOK. Scale unlocks custom routing rules.
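The page does not say how ProxyLLM decides that a query is "simple", so the sketch below shows only one hypothetical way a router could make that call: short, short-conversation prompts without reasoning-heavy keywords go to gpt-4o-mini, and everything else goes to the workspace default. The character limit, turn count, and keyword list in `routeModel` are illustrative assumptions.

```typescript
// Hypothetical routing heuristic; ProxyLLM's real classifier is not documented here.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

const CHEAP_MODEL = "gpt-4o-mini";
// Assumed signals that a request needs a stronger model (purely illustrative).
const COMPLEX_HINTS = ["step by step", "analyze", "prove", "refactor", "explain why"];

function routeModel(messages: ChatMessage[], defaultModel: string, maxSimpleChars = 500): string {
  const text = messages.map((m) => m.content).join("\n");
  const lower = text.toLowerCase();
  const looksComplex =
    text.length > maxSimpleChars || // long prompts
    messages.length > 4 || // long conversations
    COMPLEX_HINTS.some((hint) => lower.indexOf(hint) >= 0); // reasoning-heavy wording
  return looksComplex ? defaultModel : CHEAP_MODEL;
}
```

A keyword-and-length rule like this is cheap and predictable but crude; a router could instead use a small classifier model, at the cost of extra latency per request.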

Cost Tracking by Feature

Add an x-proxyllm-tag header to group costs by feature, team, or customer. See exactly which part of your app is burning through tokens.
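Tagging happens at the HTTP level, so any client that can set a header can use it. The sketch below builds (but does not send) an OpenAI-style chat completions request against the ProxyLLM base URL with an `x-proxyllm-tag` header attached. `buildChatRequest` and the `ChatRequest` shape are hypothetical helpers; the `/v1/chat/completions` path and `Authorization: Bearer` scheme follow the OpenAI chat completions format the proxy says it speaks.

```typescript
// Hypothetical helper: the raw request an OpenAI-compatible client would send
// through ProxyLLM, optionally tagged for per-feature cost tracking.
interface ChatRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildChatRequest(apiKey: string, model: string, prompt: string, tag?: string): ChatRequest {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    Authorization: `Bearer ${apiKey}`,
  };
  // Groups this call's cost under `tag` (a feature, team, or customer name).
  if (tag) headers["x-proxyllm-tag"] = tag;
  return {
    url: "https://api.proxyllm.dev/v1/chat/completions",
    method: "POST",
    headers,
    body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
  };
}
```

Send it with `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. With the official OpenAI Node SDK, the same header can be passed per request through the options argument, e.g. `client.chat.completions.create(params, { headers: { "x-proxyllm-tag": "search" } })`, or on every call through the client's `defaultHeaders` option.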

OpenAI-Compatible API

Drop-in replacement for the OpenAI API. Works with the official SDKs for Python, Node, Go, and anything else that speaks the chat completions format.

Usage Dashboard

Real-time monitoring of requests, tokens, costs, cache hit rates, and model distribution. Filterable by tag, model, and time range.

Rate Limiting & Plans

Built-in usage tiers with per-minute and per-day rate limits. Upgrade plans in one click. No surprise bills.
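The page does not describe how the tiers are enforced. As a rough model of what "per-minute and per-day rate limits" means for a client, the sketch below keeps two fixed-window counters and rejects a request if either window is full. `FixedWindowLimiter` and its limits are hypothetical; real limiters often use sliding windows or token buckets instead.

```typescript
// Hypothetical per-minute + per-day limiter using two fixed windows. Windows are
// derived from epoch milliseconds, so they align to UTC minutes and days.
// This is an illustration, not ProxyLLM's algorithm.
class FixedWindowLimiter {
  private perMinute: number;
  private perDay: number;
  private minuteWindow = -1;
  private minuteCount = 0;
  private dayWindow = -1;
  private dayCount = 0;

  constructor(perMinute: number, perDay: number) {
    this.perMinute = perMinute;
    this.perDay = perDay;
  }

  // Returns true and records the request if both windows still have room.
  tryAcquire(nowMs: number): boolean {
    const minute = Math.floor(nowMs / 60_000);
    const day = Math.floor(nowMs / 86_400_000);
    if (minute !== this.minuteWindow) {
      this.minuteWindow = minute;
      this.minuteCount = 0;
    }
    if (day !== this.dayWindow) {
      this.dayWindow = day;
      this.dayCount = 0;
    }
    if (this.minuteCount >= this.perMinute || this.dayCount >= this.perDay) return false;
    this.minuteCount++;
    this.dayCount++;
    return true;
  }
}
```

With limits of 2 per minute and 3 per day, a third request inside the same minute is rejected, and once three requests have been accepted, later requests are rejected until the next UTC day even though the minute window has reset.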

Two lines to integrate

Works with the OpenAI SDK, the Anthropic SDK, LangChain, LlamaIndex, or any compatible HTTP client. Here's the Node.js example:

Before
app.ts
import OpenAI from "openai"

const client = new OpenAI({
  apiKey: "sk-...your-key",
})

const res = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{
    role: "user",
    content: "Hello!"
  }],
})
After — with ProxyLLM
app.ts
import OpenAI from "openai"

const client = new OpenAI({
  baseURL: "https://api.proxyllm.dev/v1",
  apiKey: "pl_your_api_key",
})

const res = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{
    role: "user",
    content: "Hello!"
  }],
})

Using Claude? Point the Anthropic SDK at https://api.proxyllm.dev and call /v1/messages the same way. One workspace key works with both SDKs, and both share the same cache. See the Anthropic SDK docs.
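With the Anthropic TypeScript SDK, the swap looks like `new Anthropic({ baseURL: "https://api.proxyllm.dev", apiKey: "pl_your_api_key" })` followed by the usual `client.messages.create(...)`. To show what that sends on the wire, the sketch below builds the equivalent raw Messages API request without sending it. It assumes ProxyLLM reads your workspace key from the `x-api-key` header (the header the Anthropic SDK uses) and accepts the standard `anthropic-version` header; `buildMessagesRequest` is a hypothetical helper.

```typescript
// Hypothetical helper: the request the Anthropic SDK would send when its
// baseURL points at ProxyLLM. Assumes the proxy accepts `x-api-key`.
interface MessagesRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildMessagesRequest(apiKey: string, model: string, prompt: string, maxTokens = 1024): MessagesRequest {
  return {
    url: "https://api.proxyllm.dev/v1/messages",
    method: "POST",
    headers: {
      "content-type": "application/json",
      "x-api-key": apiKey,
      "anthropic-version": "2023-06-01",
    },
    // The Messages API requires max_tokens alongside model and messages.
    body: JSON.stringify({
      model,
      max_tokens: maxTokens,
      messages: [{ role: "user", content: prompt }],
    }),
  };
}
```

The model ID is left to the caller; pass whichever Claude model your plan allows.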

Simple, transparent pricing

Start free. Upgrade when you need more. No hidden fees and no per-token markup — managed plans include a monthly usage allowance (requests + a fair-use budget); BYOK runs uncapped on your own provider key. See plan limits.

Free

For side projects and experimentation.

$0 forever
10,000 requests/mo
  • gpt-4o-mini model
  • Semantic caching
  • 1 API key
  • 7-day log retention
Start Free

Scale

For teams running AI at scale.

$99/month
500,000 requests/mo
  • Everything in Pro
  • 10× monthly quota at $0.0002/req
  • Custom routing rules
  • Webhook alerts
  • Unlimited API keys
  • 90-day log retention
  • Priority email support
Get Started
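The Scale bullet "10× monthly quota at $0.0002/req" is terse. The sketch below encodes one plausible reading as an explicit assumption: the $99 fee covers 500,000 requests, each request beyond that costs $0.0002, and total usage is capped at 10× the quota (5,000,000 requests). `scaleMonthlyCost` and the constant names are hypothetical.

```typescript
// Assumed reading of the Scale plan: base fee covers the quota, overage is
// billed per request, and total usage may not exceed 10x the quota.
const SCALE_BASE_USD = 99;
const SCALE_QUOTA = 500_000;
const OVERAGE_USD_PER_REQ = 0.0002;
const MAX_MULTIPLIER = 10;

function scaleMonthlyCost(requests: number): number {
  if (requests > SCALE_QUOTA * MAX_MULTIPLIER) {
    throw new RangeError("usage above 10x the monthly quota");
  }
  const overage = Math.max(0, requests - SCALE_QUOTA);
  // Round to whole cents to hide floating-point noise.
  return Math.round((SCALE_BASE_USD + overage * OVERAGE_USD_PER_REQ) * 100) / 100;
}
```

Under that reading, 600,000 requests would cost $99 + 100,000 × $0.0002 = $119, and the 5,000,000-request ceiling would cost $999.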

Will Pro pay for itself?

Pro's value is the cache. Bringing your own provider key? Estimate how much it saves on your provider bill versus the $29/mo fee.

Estimated cache savings on your provider bill
$63/mo

At these numbers Pro pays for itself — about $34/mo net after the $29 fee, before counting the dashboard, routing, and cost analytics.

Estimate using provider list prices (verified June 2026); your negotiated rates and prompt mix will differ. ProxyLLM measures your real cache-hit rate per request, so you can check this against actual usage. Managed gpt-4o-mini is different: a flat $29/mo for 50,000 requests with cache + dashboard, with no per-token cost to you — the calculator above is for BYOK on your own models.
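The verdict above is a simple subtraction: $63 in estimated cache savings minus the $29 Pro fee leaves about $34/mo. The sketch below reproduces that step and adds a crude savings estimate. Treating savings as monthly provider spend times cache-hit rate is an illustrative assumption, not the calculator's formula (which uses provider list prices); `estimateCacheSavings`, `proNetSavings`, and `breakEvenSpend` are hypothetical names.

```typescript
// Back-of-envelope Pro break-even for BYOK users.
const PRO_FEE_USD = 29;

// Assumption: every cache hit avoids one average-cost provider call.
function estimateCacheSavings(monthlySpendUsd: number, cacheHitRate: number): number {
  return monthlySpendUsd * cacheHitRate;
}

// Net benefit after paying for Pro.
function proNetSavings(cacheSavingsUsd: number): number {
  return cacheSavingsUsd - PRO_FEE_USD;
}

// Monthly provider spend at which cache savings exactly cover the fee.
function breakEvenSpend(cacheHitRate: number): number {
  return PRO_FEE_USD / cacheHitRate;
}
```

For example, `proNetSavings(63)` gives the $34 shown above, and at a 10% hit rate `breakEvenSpend(0.1)` is about $290/mo of provider spend.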

How ProxyLLM compares

| | ProxyLLM | Helicone | Portkey | Vercel AI Gateway |
|---|---|---|---|---|
| Entry price | $29/mo Pro | $79/mo Pro | $49/mo | Free + 0% markup |
| Semantic cache | Yes (cross-format) | Beta | Plugin | No |
| BYOK (zero markup) | Pro/Scale | Yes | Yes (BYOK-only) | Yes |
| Anthropic /v1/messages | Native | Native | Native | Native |
| Single bill (managed) | Yes | Token + Helicone | No | Token + Vercel |
| Active development | Yes | Maintenance only (acq. by Mintlify, Mar 2026) | Yes | Yes |

ProxyLLM's pitch: a managed cross-format semantic cache, a single bill, and the lowest paid entry price in this comparison, plus BYOK on Pro/Scale for heavy users who want zero markup.