Now in public beta

Stop overpaying for LLM API calls

ProxyLLM is a drop-in proxy that sits between your app and AI providers. Semantic caching, smart model routing, and a live cost dashboard cut the cost of cacheable workloads by 20–40%.

Get Started Free
View Docs

No credit card required
2-line integration

app.ts

// Before — direct to OpenAI
const openai = new OpenAI()

// After — through ProxyLLM
const openai = new OpenAI({
  baseURL: "https://api.proxyllm.dev/v1",
  apiKey: "pl_your_api_key",
})

Up and running in 2 minutes

No SDK to install. No code to refactor. Just change two lines.

Point your SDK to ProxyLLM

Swap your baseURL and API key. The official OpenAI SDK and Anthropic SDK (and any compatible client) work as-is.

We cache, route & track

ProxyLLM caches semantically similar queries, routes simple requests to cheaper models, and logs every call.

See savings in your dashboard

Watch your costs drop in real time. Tag requests by feature to see exactly where your money goes.

Everything you need to control LLM costs

One proxy between your app and AI providers. Full visibility, automatic savings, zero code changes beyond the initial setup.

Semantic Caching

Save 20–40% on repeated queries. We match semantically similar prompts, not just exact strings, so "What's the weather?" and "Tell me the weather" share one cached response.
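To make the idea concrete, here is a minimal sketch of a similarity-based cache lookup. It is an illustration only, not ProxyLLM's implementation: the `SemanticCache` class, the 0.9 threshold, and the three-dimensional toy vectors are all assumptions, and a real system would get its embeddings from an embedding model.

```typescript
// Illustrative sketch only: names, threshold, and vectors are hypothetical.
type CacheEntry = { embedding: number[]; response: string };

// Cosine similarity between two equal-length vectors (1 = same direction).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class SemanticCache {
  private entries: CacheEntry[] = [];
  private threshold: number;

  // threshold: minimum similarity for a hit (0.9 is an assumed default)
  constructor(threshold: number = 0.9) {
    this.threshold = threshold;
  }

  store(embedding: number[], response: string): void {
    this.entries.push({ embedding, response });
  }

  // Returns the closest cached response, or undefined on a miss.
  lookup(embedding: number[]): string | undefined {
    let bestScore = -Infinity;
    let bestResponse: string | undefined = undefined;
    for (const entry of this.entries) {
      const score = cosineSimilarity(embedding, entry.embedding);
      if (score > bestScore) {
        bestScore = score;
        bestResponse = entry.response;
      }
    }
    return bestScore >= this.threshold ? bestResponse : undefined;
  }
}

// Toy usage: two nearby vectors stand in for two similar weather prompts.
const cache = new SemanticCache();
cache.store([1, 0, 0], "It's sunny.");
const hit = cache.lookup([0.9, 0.1, 0]); // close to the stored vector
const miss = cache.lookup([0, 1, 0]); // orthogonal to the stored vector
```

The threshold is the main tuning knob in a design like this: set it too low and unrelated prompts share answers, too high and the cache rarely hits.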

Smart Model Routing

Auto-route simple queries to GPT-4o Mini. Complex prompts go to your workspace's default model (must be on your plan's whitelist, or enable BYOK). Scale unlocks custom routing rules.

Cost Tracking by Feature

Add an x-proxyllm-tag header to group costs by feature, team, or customer. See exactly which part of your app is burning through tokens.
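As a rough sketch of what a tagged call looks like on the wire, the helper below builds a chat completions request that carries the `x-proxyllm-tag` header. The `buildTaggedRequest` helper and the `checkout-assistant` tag are hypothetical; the URL is the base URL from the integration example plus the standard chat completions path, and Bearer auth is assumed because that is what OpenAI-style clients send.

```typescript
// Hypothetical helper: describes a tagged request without sending it.
interface TaggedRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildTaggedRequest(apiKey: string, tag: string, payload: object): TaggedRequest {
  return {
    // Base URL from the integration example + the chat completions path
    url: "https://api.proxyllm.dev/v1/chat/completions",
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`, // assumed: OpenAI-style Bearer auth
      "x-proxyllm-tag": tag, // groups this call's cost under the tag
    },
    body: JSON.stringify(payload),
  };
}

const taggedRequest = buildTaggedRequest("pl_your_api_key", "checkout-assistant", {
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});
// To send it: fetch(taggedRequest.url, taggedRequest)
```

With the official OpenAI Node SDK you would not build the request by hand: the same header can be set once for every call through the `defaultHeaders` constructor option, or per call through the `headers` request option.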

OpenAI-Compatible API

Drop-in replacement for the OpenAI API. Works with the official SDKs for Python, Node, Go, and anything else that speaks the chat completions format.

Usage Dashboard

Real-time monitoring of requests, tokens, costs, cache hit rates, and model distribution. Filterable by tag, model, and time range.

Rate Limiting & Plans

Built-in usage tiers with per-minute and per-day rate limits. Upgrade plans in one click. No surprise bills.

Two lines to integrate

Works with the OpenAI SDK, the Anthropic SDK, LangChain, LlamaIndex, or any compatible HTTP client. Here's the Node.js example:

Before

app.ts

import OpenAI from "openai"

const client = new OpenAI({
  apiKey: "sk-...your-key",
})

const res = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{
    role: "user",
    content: "Hello!"
  }],
})

After — with ProxyLLM

app.ts

import OpenAI from "openai"

const client = new OpenAI({
  baseURL: "https://api.proxyllm.dev/v1",
  apiKey: "pl_your_api_key",
})

const res = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{
    role: "user",
    content: "Hello!"
  }],
})

Using Claude? Point the Anthropic SDK at https://api.proxyllm.dev and call /v1/messages the same way: one workspace key, both SDKs, shared cache. See the Anthropic SDK docs.
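For reference, here is a sketch of the HTTP request an Anthropic-style client would send through the proxy. The `x-api-key` and `anthropic-version` headers follow Anthropic's Messages API conventions; whether ProxyLLM expects exactly these headers is an assumption, and `buildMessagesRequest` and the model placeholder are hypothetical.

```typescript
// Hypothetical helper: describes a /v1/messages request without sending it.
interface MessagesRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildMessagesRequest(apiKey: string, model: string, prompt: string): MessagesRequest {
  return {
    // Host from the text above + the /v1/messages path
    url: "https://api.proxyllm.dev/v1/messages",
    method: "POST",
    headers: {
      "content-type": "application/json",
      "x-api-key": apiKey, // assumed: same workspace key as the OpenAI path
      "anthropic-version": "2023-06-01", // version header used by Anthropic's API
    },
    body: JSON.stringify({
      model,
      max_tokens: 256, // the Messages API requires an explicit output cap
      messages: [{ role: "user", content: prompt }],
    }),
  };
}

// "your-claude-model" is a placeholder, not a real model ID.
const messagesRequest = buildMessagesRequest("pl_your_api_key", "your-claude-model", "Hello!");
// To send it: fetch(messagesRequest.url, messagesRequest)
```

In practice the Anthropic SDK sets these headers for you once its `baseURL` option points at the proxy.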

Simple, transparent pricing

Start free. Upgrade when you need more. No hidden fees and no per-token markup — managed plans include a monthly usage allowance (requests + a fair-use budget); BYOK runs uncapped on your own provider key. See plan limits.

Free

For side projects and experimentation.

$0 forever

10,000 requests/mo

gpt-4o-mini model
Semantic caching
1 API key
7-day log retention

Start Free

Pro

Managed cross-format cache + single bill — or BYOK any model at 0% markup.

$29/month

50,000 requests/mo

Everything in Free
Smart model routing
Cost tagging (x-proxyllm-tag)
BYOK: any model on your own provider key
5 API keys
30-day log retention
Email support (best-effort)

Get Started

Scale

For teams running AI at scale.

$99/month

500,000 requests/mo

Everything in Pro
10× monthly quota at $0.0002/req
Custom routing rules
Webhook alerts
Unlimited API keys
90-day log retention
Priority email support

Get Started

Will Pro pay for itself?

Pro's value is the cache. Bringing your own provider key? Estimate how much it saves on your provider bill versus the $29/mo fee.

Model (BYOK on your own key)
Requests / month
Avg tokens / request (~70% input / 30% output)
Cache hit rate: 30%

Estimated cache savings on your provider bill

$63/mo

At these numbers Pro pays for itself — about $34/mo net after the $29 fee, before counting the dashboard, routing, and cost analytics.

Estimate using provider list prices (verified June 2026); your negotiated rates and prompt mix will differ. ProxyLLM measures your real cache-hit rate per request, so you can check this against actual usage. Managed gpt-4o-mini is different: a flat $29/mo for 50,000 requests with cache + dashboard, with no per-token cost to you — the calculator above is for BYOK on your own models.
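The arithmetic behind an estimate like this can be sketched as follows. This is a simplified model rather than the calculator's actual code: it assumes a cache hit avoids the provider call entirely, blends input and output prices by the 70/30 token split, and takes per-million-token prices as inputs you supply. All function and field names are hypothetical, and the prices in the usage example are placeholders, not real list prices.

```typescript
// Simplified savings model; names and example prices are hypothetical.
interface SavingsInput {
  requestsPerMonth: number;
  avgTokensPerRequest: number;
  inputPricePerMTok: number; // USD per 1M input tokens (your provider's rate)
  outputPricePerMTok: number; // USD per 1M output tokens
  cacheHitRate: number; // fraction between 0 and 1
  inputShare?: number; // fraction of tokens that are input; defaults to 0.7
}

// Monthly provider spend avoided, assuming each cache hit costs nothing.
function estimateCacheSavings(input: SavingsInput): number {
  const inputShare = input.inputShare ?? 0.7;
  const blendedPricePerMTok =
    inputShare * input.inputPricePerMTok +
    (1 - inputShare) * input.outputPricePerMTok;
  const costPerRequest = (input.avgTokensPerRequest * blendedPricePerMTok) / 1e6;
  return input.requestsPerMonth * input.cacheHitRate * costPerRequest;
}

// Net benefit after the plan fee ($29/mo for Pro).
function netAfterFee(savings: number, monthlyFee: number = 29): number {
  return savings - monthlyFee;
}

// Placeholder inputs: swap in your own volume and your provider's prices.
const exampleSavings = estimateCacheSavings({
  requestsPerMonth: 10000,
  avgTokensPerRequest: 1000,
  inputPricePerMTok: 2,
  outputPricePerMTok: 10,
  cacheHitRate: 0.3,
});

// The figure shown above: $63 in savings minus the $29 fee.
const netForShownEstimate = netAfterFee(63); // 34
```

Because savings scale linearly with request volume and hit rate, the break-even point is simply the volume at which `estimateCacheSavings` returns the plan fee.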

How ProxyLLM compares

	ProxyLLM	Helicone	Portkey	Vercel AI Gateway
Entry price	$29/mo Pro	$79/mo Pro	$49/mo	Free + 0% markup
Semantic cache	Yes (cross-format)	Beta	Plugin	No
BYOK (zero markup)	Pro/Scale	Yes	Yes (BYOK-only)	Yes
Anthropic /v1/messages	Native	Native	Native	Native
Single bill (managed)	Yes	Token + Helicone	No	Token + Vercel
Active development	Yes	Maintenance only (acq. by Mintlify, Mar 2026)	Yes	Yes

ProxyLLM's pitch: managed cross-format semantic cache, single bill, lowest entry price — with BYOK on Pro/Scale for zero-markup heavy users.