ProxyLLM Documentation
ProxyLLM is an OpenAI-compatible proxy that adds semantic caching, smart model routing, and cost tracking on top of your LLM calls. It is a drop-in replacement: you change two lines of client configuration.
Two-line integration
Swap your existing OpenAI client's baseURL and apiKey:
import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://api.proxyllm.dev/v1",
  apiKey: "pl_your_api_key",
});
const res = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello!" }],
});
API keys
Keys are issued automatically on signup and look like pl_a1b2c3d4e5f67890abcdef1234567890 (35 chars, pl_ prefix + 32 hex). Manage them at app.proxyllm.dev/settings.
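If you want to catch a mistyped or wrong-provider key before making a request, the format described above can be checked locally. The following is a minimal sketch, not an official validator: the helper name `looksLikeProxyLLMKey` is hypothetical, and restricting the hex part to lowercase is an assumption based only on the example key shown above.

```typescript
// Assumed format, taken from the example key: "pl_" followed by 32 hex characters (35 total).
// Lowercase-only hex is an assumption; the example is lowercase, but uppercase may also be valid.
const PROXYLLM_KEY_PATTERN = /^pl_[0-9a-f]{32}$/;

// Hypothetical helper: a cheap shape check only. It cannot tell whether a key is active.
function looksLikeProxyLLMKey(key: string): boolean {
  return PROXYLLM_KEY_PATTERN.test(key);
}
```

A check like this only guards against obvious mistakes, such as pasting an OpenAI key into the ProxyLLM client. Whether a key is actually valid is decided by the server.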
What ProxyLLM does for you
- Semantic cache hits across paraphrased prompts — not just exact-string matches.
- Smart routing: short, simple prompts go to a cheap model; long or code-heavy prompts stay on the default model.
- Cost attribution via the x-proxyllm-tag header so you can see which features in your app are burning tokens.
- Per-minute, per-day, and monthly rate limits per plan, with Retry-After headers so your retry logic just works.
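The tagging and rate-limit items in the list above can be sketched in client code as follows. This is an illustration under stated assumptions rather than a reference implementation: the helper names (`taggedHeaders`, `retryAfterToMs`, `withRetryAfter`), the 1000 ms fallback delay, and the use of HTTP status 429 for rate-limited responses are all assumptions not confirmed by this page. Only the header names `x-proxyllm-tag` and `Retry-After` come from the text.

```typescript
// Header name comes from the feature list above; the tag value is whatever you choose.
const TAG_HEADER = "x-proxyllm-tag";

// Build per-request headers that attribute a call to one feature of your app.
function taggedHeaders(tag: string): Record<string, string> {
  return { [TAG_HEADER]: tag };
}

// Convert a Retry-After value into a delay in milliseconds.
// Per RFC 9110 the value is either a number of seconds or an HTTP-date.
// Returns null when the header is missing or cannot be parsed.
function retryAfterToMs(value: string | null, nowMs: number = Date.now()): number | null {
  if (value === null) return null;
  const trimmed = value.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return null;
  return Math.max(0, dateMs - nowMs);
}

// The minimal response shape the retry loop needs; a fetch Response satisfies it.
interface MinimalResponse {
  status: number;
  headers: { get(name: string): string | null };
}

// Re-send a request while the server answers 429 (assumed rate-limit status),
// waiting for the Retry-After delay between attempts.
async function withRetryAfter<T extends MinimalResponse>(
  send: () => Promise<T>,
  sleep: (ms: number) => Promise<void>,
  maxAttempts: number = 3,
): Promise<T> {
  let response = await send();
  for (let attempt = 1; attempt < maxAttempts && response.status === 429; attempt++) {
    // Fall back to 1000 ms (an arbitrary choice) if the header is absent or unparseable.
    const delayMs = retryAfterToMs(response.headers.get("retry-after")) ?? 1000;
    await sleep(delayMs);
    response = await send();
  }
  return response;
}
```

Here `send` can wrap any HTTP call that resolves to an object with `status` and `headers.get`, and `sleep` is injected so the loop can be exercised without real delays. A tag such as `taggedHeaders("search-autocomplete")` (a made-up feature name) would be merged into the request headers. The official `openai` SDK performs some retries of its own by default, so it is worth checking its configuration before layering a loop like this on top.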
Reference
Each topic in the left sidebar covers one slice of the API surface. Start with Authentication if you've just signed up, or jump to Errors if you're debugging a non-200 response.