ProxyLLM Documentation

ProxyLLM is an OpenAI-compatible proxy that adds semantic caching, smart model routing, and cost tracking to your LLM calls. It is a drop-in replacement: change two lines.

Two-line integration

Swap your existing OpenAI client's baseURL and apiKey:

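```typescript
import OpenAI from "openai";

// Point the official OpenAI SDK at ProxyLLM: only baseURL and apiKey change.
const client = new OpenAI({
  baseURL: "https://api.proxyllm.dev/v1",
  apiKey: "pl_your_api_key",
});

// Top-level await requires an ES module context.
const res = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(res.choices[0].message.content);
```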

API keys

Keys are issued automatically on signup and look like pl_a1b2c3d4e5f67890abcdef1234567890 (35 chars, pl_ prefix + 32 hex). Manage them at app.proxyllm.dev/settings.
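If you want to catch a mistyped key, or a key from another provider (such as an OpenAI sk- key left in config), before it reaches the API, a local format check based on the description above is enough. This is a minimal sketch: the helper name isProxyLLMKey is hypothetical, and restricting the 32 characters to lowercase hex is an assumption drawn from the example key.

```typescript
// Hypothetical helper: checks the documented key shape,
// "pl_" followed by 32 hex characters (35 characters total).
// Lowercase-only hex is an assumption based on the example key.
const PROXYLLM_KEY_PATTERN = /^pl_[0-9a-f]{32}$/;

function isProxyLLMKey(key: string): boolean {
  return PROXYLLM_KEY_PATTERN.test(key);
}

// Usage
console.log(isProxyLLMKey("pl_a1b2c3d4e5f67890abcdef1234567890")); // true
console.log(isProxyLLMKey("sk-not-a-proxyllm-key")); // false
```

A format check only rejects obviously wrong strings; whether a well-formed key is actually valid is still decided by the API.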

What ProxyLLM does for you

Semantic caching that matches paraphrased prompts, not just exact strings.
Smart routing: short, simple prompts go to a cheap model; long or code-heavy prompts stay on the default model.
Cost attribution via the x-proxyllm-tag header so you can see which features in your app are burning tokens.
Per-minute, per-day, and monthly rate limits for each plan, returned with Retry-After headers so your retry logic just works.
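The tagging and rate-limit behavior described above can be exercised with a plain fetch call. This is a minimal sketch under several assumptions not stated here: the endpoint is the baseURL above plus the standard OpenAI path (/chat/completions), authentication uses a Bearer token as in the OpenAI API, and rate-limited requests return HTTP 429. Retry-After is parsed in both forms HTTP allows (seconds or an HTTP-date). The helper names retryAfterMs and createTaggedCompletion, the 1 s fallback wait, and maxRetries are hypothetical choices; fetchImpl is injectable so the logic can be tested without network access.

```typescript
// Minimal sketch: send a tagged chat completion through ProxyLLM and, on a 429,
// wait for the server-provided Retry-After before retrying.
// Assumed: endpoint path, Bearer auth, and 429 as the rate-limit status.
type FetchLike = (url: string, init: RequestInit) => Promise<Response>;

// Convert a Retry-After value to milliseconds. HTTP allows either a
// non-negative integer number of seconds or an HTTP-date.
function retryAfterMs(header: string | null, now: number = Date.now()): number | null {
  if (header === null) return null;
  const trimmed = header.trim();
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  const date = Date.parse(trimmed);
  if (Number.isNaN(date)) return null;
  return Math.max(0, date - now);
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical helper; the tag value and maxRetries are chosen by the caller.
async function createTaggedCompletion(
  apiKey: string,
  tag: string,
  body: unknown,
  fetchImpl: FetchLike = fetch,
  maxRetries = 3,
): Promise<unknown> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetchImpl("https://api.proxyllm.dev/v1/chat/completions", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
        "x-proxyllm-tag": tag, // attributes this call's cost to `tag`
      },
      body: JSON.stringify(body),
    });
    if (res.status !== 429 || attempt >= maxRetries) {
      if (!res.ok) throw new Error(`ProxyLLM request failed: HTTP ${res.status}`);
      return res.json();
    }
    // `??` (not `||`) so that "Retry-After: 0" waits 0 ms; fall back to 1 s
    // only when the header is missing or unparseable.
    await sleep(retryAfterMs(res.headers.get("Retry-After")) ?? 1000);
  }
}
```

If you stay on the official openai SDK instead, the same tag can be sent per request through its request options, for example client.chat.completions.create(params, { headers: { "x-proxyllm-tag": "search" } }), and the SDK's built-in retries (configurable with maxRetries) already retry 429 responses.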

Reference

Each topic in the left sidebar covers one slice of the API surface. Start with Authentication if you've just signed up, or jump to Errors if you're debugging a non-200 response.