ProxyLLM is a drop-in proxy that sits between your app and your AI providers. Semantic caching, smart model routing, and a live cost dashboard cut typical LLM spend by 20–40%.
// Before: direct to OpenAI
const openai = new OpenAI()

// After: through ProxyLLM
const openai = new OpenAI({
  baseURL: "https://api.proxyllm.dev/v1",
  apiKey: "pl_your_api_key",
})

No SDK to install. No code to refactor. Just change two lines: swap your baseURL and API key. Your existing OpenAI, Anthropic, or any LLM SDK works as-is.
ProxyLLM caches semantically similar queries, routes simple requests to cheaper models, and logs every call.
Watch your costs drop in real time. Tag requests by feature to see exactly where your money goes.
One proxy between your app and AI providers. Full visibility, automatic savings, zero code changes beyond the initial setup.
Save 20–40% on repeated queries. We match semantically similar prompts — not just exact strings — so "What's the weather?" and "Tell me the weather" share one cached response.
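To make "semantically similar" concrete, here is a minimal sketch of the idea (not ProxyLLM's internals): embed each prompt as a vector, then treat any prompt whose cosine similarity to a cached prompt clears a threshold as a cache hit. The three-dimensional vectors and the 0.9 threshold are toy stand-ins for real embedding output.

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

const cache = []; // entries of { embedding, response }

// Return a cached response if any stored prompt is similar enough, else null.
function lookup(embedding, threshold = 0.9) {
  for (const entry of cache) {
    if (cosine(embedding, entry.embedding) >= threshold) return entry.response;
  }
  return null;
}

// Toy "embeddings" stand in for real model output.
cache.push({ embedding: [0.9, 0.1, 0.0], response: "Sunny, 72°F" });
console.log(lookup([0.88, 0.12, 0.01])); // near-duplicate prompt: cache hit
console.log(lookup([0.0, 0.0, 1.0]));    // unrelated prompt: null
```

A real deployment would use an embedding model and an approximate-nearest-neighbor index instead of a linear scan, but the hit/miss logic is the same.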
Auto-route simple queries to cheaper models. Classification and extraction tasks go to GPT-4o Mini; complex reasoning stays on GPT-4o. You set the rules.
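A routing rule can be sketched as a predicate that maps a prompt to a model. The rule shape below is hypothetical (ProxyLLM's actual rule configuration may differ); it only illustrates "simple tasks go to the cheaper model, everything else falls back."

```javascript
// Hypothetical rule list: cheap model for classification/extraction-style prompts.
const rules = [
  { match: /classif|extract|label/i, model: "gpt-4o-mini" },
];

// Pick the first matching rule's model, or the fallback for complex work.
function route(prompt, fallback = "gpt-4o") {
  for (const rule of rules) {
    if (rule.match.test(prompt)) return rule.model;
  }
  return fallback;
}

console.log(route("Classify this ticket as bug or feature")); // "gpt-4o-mini"
console.log(route("Write a proof of the CAP theorem tradeoffs")); // "gpt-4o"
```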
Add an x-proxyllm-tag header to group costs by feature, team, or customer. See exactly which part of your app is burning through tokens.
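Since the tag is just an HTTP header, any client can send it. The sketch below builds the request options for a raw fetch call; the header name comes from the docs above, while the tag value ("checkout") and the helper name are illustrative.

```javascript
// Build fetch options with a ProxyLLM cost-attribution tag attached.
function tagged(tag, body) {
  return {
    method: "POST",
    headers: {
      Authorization: "Bearer pl_your_api_key",
      "Content-Type": "application/json",
      "x-proxyllm-tag": tag, // ProxyLLM groups costs by this value
    },
    body: JSON.stringify(body),
  };
}

const opts = tagged("checkout", {
  model: "gpt-4o",
  messages: [{ role: "user", content: "Summarize this order" }],
});
console.log(opts.headers["x-proxyllm-tag"]); // "checkout"
// fetch("https://api.proxyllm.dev/v1/chat/completions", opts)
```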
Drop-in replacement for the OpenAI API. Works with the official SDKs for Python, Node, Go, and anything else that speaks the chat completions format.
Real-time monitoring of requests, tokens, costs, cache hit rates, and model distribution. Filterable by tag, model, and time range.
Built-in usage tiers with per-minute and per-day rate limits. Upgrade plans in one click. No surprise bills.
Works with the OpenAI SDK, Anthropic SDK, LangChain, LlamaIndex, or any HTTP client. Here's the Node.js example, before:

import OpenAI from "openai"

const client = new OpenAI({
  apiKey: "sk-...your-key",
})

const res = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{
    role: "user",
    content: "Hello!",
  }],
})

And the same call through ProxyLLM:

import OpenAI from "openai"

const client = new OpenAI({
  baseURL: "https://api.proxyllm.dev/v1",
  apiKey: "pl_your_api_key",
})

const res = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{
    role: "user",
    content: "Hello!",
  }],
})

Start free. Upgrade when you need more requests. No hidden fees, no per-token charges beyond your AI provider's costs.
For side projects and experimentation.
For growing apps with real traffic.
For teams running AI at scale.