Now in public beta

Stop overpaying for LLM API calls

ProxyLLM is a drop-in proxy that sits between your app and your AI providers. Semantic caching, smart model routing, and a live cost dashboard cut your LLM spend by 20–40%.

No credit card required · 2-line integration
app.ts
// Before — direct to OpenAI
const openai = new OpenAI()

// After — through ProxyLLM
const openai = new OpenAI({
  baseURL: "https://api.proxyllm.dev/v1",
  apiKey: "pl_your_api_key",
})

Up and running in 2 minutes

No SDK to install. No code to refactor. Just change two lines.

01

Point your SDK to ProxyLLM

Swap your baseURL and API key. Your existing OpenAI, Anthropic, or any LLM SDK works as-is.

02

We cache, route & track

ProxyLLM caches semantically similar queries, routes simple requests to cheaper models, and logs every call.

03

See savings in your dashboard

Watch your costs drop in real time. Tag requests by feature to see exactly where your money goes.

Everything you need to control LLM costs

One proxy between your app and AI providers. Full visibility, automatic savings, zero code changes beyond the initial setup.

Semantic Caching

Save 20-40% on repeated queries. We match semantically similar prompts — not just exact strings — so "What's the weather?" and "Tell me the weather" share one cached response.
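The idea behind semantic matching can be sketched in a few lines of TypeScript. This is an illustration of embedding similarity, not ProxyLLM's internal implementation; the 0.95 threshold is an assumption for the sketch:

```typescript
// Cosine similarity between two embedding vectors.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    na += a[i] * a[i]
    nb += b[i] * b[i]
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb))
}

// Two prompts share a cache entry when their embeddings are close enough.
function isCacheHit(query: number[], cached: number[], threshold = 0.95): boolean {
  return cosine(query, cached) >= threshold
}
```

In practice the prompts are embedded first, so "What's the weather?" and "Tell me the weather" land near each other in vector space and resolve to the same cached response.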

Smart Model Routing

Auto-route simple queries to cheaper models. Classification and extraction tasks go to GPT-4o Mini; complex reasoning stays on GPT-4o. You set the rules.
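As a sketch of the routing idea (the task names below are illustrative and the actual rule format lives in your ProxyLLM configuration, not in app code):

```typescript
// Illustrative routing rule: simple structured tasks go to the cheaper
// model, open-ended reasoning stays on the full model.
type Task = "classification" | "extraction" | "reasoning"

function pickModel(task: Task): string {
  return task === "reasoning" ? "gpt-4o" : "gpt-4o-mini"
}
```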

Cost Tracking by Feature

Add an x-proxyllm-tag header to group costs by feature, team, or customer. See exactly which part of your app is burning through tokens.
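For example, with a plain HTTP client. The helper below is hypothetical; only the x-proxyllm-tag header, the endpoint, and the key format come from ProxyLLM:

```typescript
// Hypothetical helper: build fetch options for a chat completion whose cost
// is grouped under `tag` in the dashboard via the x-proxyllm-tag header.
function taggedRequest(prompt: string, tag: string) {
  return {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Bearer pl_your_api_key",
      "x-proxyllm-tag": tag,
    },
    body: JSON.stringify({
      model: "gpt-4o",
      messages: [{ role: "user", content: prompt }],
    }),
  }
}

// Usage:
// fetch("https://api.proxyllm.dev/v1/chat/completions", taggedRequest("Hello!", "checkout"))
```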

OpenAI-Compatible API

Drop-in replacement for the OpenAI API. Works with the official SDKs for Python, Node, Go, and anything else that speaks the chat completions format.

Usage Dashboard

Real-time monitoring of requests, tokens, costs, cache hit rates, and model distribution. Filterable by tag, model, and time range.

Rate Limiting & Plans

Built-in usage tiers with per-minute and per-day rate limits. Upgrade plans in one click. No surprise bills.

Two lines to integrate

Works with the OpenAI SDK, Anthropic SDK, LangChain, LlamaIndex, or any HTTP client. Here's the Node.js example:

Before
app.ts
import OpenAI from "openai"

const client = new OpenAI({
  apiKey: "sk-...your-key",
})

const res = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{
    role: "user",
    content: "Hello!"
  }],
})
After — with ProxyLLM
app.ts
import OpenAI from "openai"

const client = new OpenAI({
  baseURL: "https://api.proxyllm.dev/v1",
  apiKey: "pl_your_api_key",
})

const res = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{
    role: "user",
    content: "Hello!"
  }],
})

Simple, transparent pricing

Start free. Upgrade when you need more requests. No hidden fees, no per-token charges beyond your AI provider's costs.

Free

For side projects and experimentation.

$0 forever
1,000 requests/mo
  • Semantic caching
  • 1 API key
  • Community support
  • 7-day log retention
Start Free

Scale

For teams running AI at scale.

$99/month
500,000 requests/mo
  • Everything in Pro
  • Custom routing rules
  • Unlimited API keys
  • 90-day log retention
  • Priority support
  • SSO & team management
Get Started