Plan limits

Full matrix of what each plan ships. Cross-reference with the pricing page for current prices.

Feature	Free	Pro	Scale
Monthly requests	10,000	50,000	500,000
Per-minute burst	10	100	1,000
Per-day burst	500	5,000	50,000
Allowed models (managed)	`gpt-4o-mini`	`gpt-4o-mini`	`gpt-4o-mini`
Allowed models (BYOK)	—	any model on your provider key	any model on your provider key
API keys	1 (primary)	5	unlimited
Log retention	7 days	30 days	90 days
Cache TTL	24h	72h	168h (7d)
Semantic vector pool	1,000	10,000	unlimited
Smart routing	basic	priority	priority + custom rules
Advanced analytics	—	yes	yes
Webhook alerts	—	—	yes

About the per-tier managed-model whitelist

Each plan's managed mode limits the models we forward to a cost-bounded whitelist (see table above). Any other model returns 403 plan_limit_error before the upstream call is made. This is a hard cap to bound our upstream cost exposure per plan.

If you set workspace.default_model to a non-whitelisted model and then omit model from your request payload, you'll still get the 403, with a message pointing you to the dashboard so you can fix the default.
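
The check described above can be sketched roughly as follows. This is an illustration only, not the service's actual implementation: the function name check_model, the plan keys, and the return shape are assumptions made for the example. Only the whitelist contents, the 403 status, and the plan_limit_error code come from this page.

```python
from typing import Optional, Tuple

# Managed-mode whitelist per plan, copied from the table above.
MANAGED_WHITELIST = {
    "free": {"gpt-4o-mini"},
    "pro": {"gpt-4o-mini"},
    "scale": {"gpt-4o-mini"},
}

# BYOK is only offered on Pro and Scale.
BYOK_PLANS = {"pro", "scale"}


def check_model(
    plan: str,
    requested_model: Optional[str],
    default_model: str,
    byok: bool = False,
) -> Tuple[int, Optional[str], str]:
    """Return (http_status, error_code, effective_model).

    Hypothetical helper: the real service may resolve models differently.
    """
    # An omitted model falls back to workspace.default_model.
    model = requested_model or default_model

    # With BYOK enabled on an eligible plan, the whitelist does not apply.
    if byok and plan in BYOK_PLANS:
        return 200, None, model

    # Otherwise reject before any upstream call would be made.
    if model not in MANAGED_WHITELIST[plan]:
        return 403, "plan_limit_error", model

    return 200, None, model
```

For instance, check_model("free", None, "gpt-4o") yields a 403 even though the request itself named no model, because the workspace default is resolved first and that default is not whitelisted.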

Need a different model? On Pro and Scale, enable BYOK and the per-tier whitelist no longer applies — your provider key, your bill, any model your provider supports.

Managed monthly usage allowance

In managed mode we pay the upstream provider on your behalf, so each plan's monthly allowance is bounded by both the request count above and a fair-use upstream-cost budget, whichever you reach first. Most workloads hit the request count; very large prompts/outputs can reach the cost budget sooner. When you hit either limit, requests return 429 with a message to upgrade or switch to BYOK.
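
As a rough sketch of the "whichever you reach first" rule: the class and field names below are hypothetical, and the cost budget value is a placeholder, since this page does not publish the budget figures. Only the request counts and the 429 status are taken from this page.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class ManagedUsage:
    """Hypothetical snapshot of a workspace's managed usage this month."""
    requests_used: int
    request_limit: int       # e.g. 10,000 on Free, from the table above
    cost_used_usd: float
    cost_budget_usd: float   # placeholder: the real budget is not stated here


def allowance_status(usage: ManagedUsage) -> Tuple[int, Optional[str]]:
    """Return (http_status, limit_reached); either limit alone triggers 429."""
    if usage.requests_used >= usage.request_limit:
        return 429, "request_count"
    if usage.cost_used_usd >= usage.cost_budget_usd:
        return 429, "cost_budget"
    return 200, None
```

A workload with very large prompts could therefore see a 429 while still well under its request count, for example ManagedUsage(1200, 10000, 5.0, 5.0) in this sketch.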

BYOK has no such budget — you pay your provider directly, so on Pro/Scale you can run unlimited managed-feature traffic (cache, dashboard, routing) on your own key. If predictable high volume matters, BYOK is the path.

About basic vs priority routing

Basic routing (Free): short non-code prompts go to gpt-4o-mini, everything else uses the workspace default model. Priority routing (Pro/Scale) adds more granular signals (complexity heuristics, length thresholds). Scale also unlocks custom routing rules via POST /v1/workspace/routing-rules — full reference in the dashboard.
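
To make the basic rule concrete, here is a minimal sketch. The 200-character cutoff for "short" and the marker list used to detect code are assumptions chosen for illustration; this page does not specify either, and the function names are hypothetical.

```python
# Hypothetical markers that suggest a prompt contains code.
CODE_MARKERS = ("```", "def ", "class ", "import ", "{", "};", "=>")


def looks_like_code(prompt: str) -> bool:
    """Crude heuristic: does the prompt contain any code-like marker?"""
    return any(marker in prompt for marker in CODE_MARKERS)


def route_basic(prompt: str, default_model: str, short_max_chars: int = 200) -> str:
    """Basic routing: short non-code prompts go to gpt-4o-mini,
    everything else goes to the workspace default model.

    short_max_chars is an assumed threshold, not a documented value.
    """
    if len(prompt) <= short_max_chars and not looks_like_code(prompt):
        return "gpt-4o-mini"
    return default_model
```

Under these assumptions, route_basic("What is the capital of France?", "gpt-4o") picks gpt-4o-mini, while a short prompt containing "def f(x): return x" or any prompt over the length cutoff falls through to the workspace default.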