Billing & Quotas
Xantly enforces per-organization token quotas and optional monthly spend budgets to prevent unexpected costs. Both systems are checked before each request is processed.
Token quotas by plan
| Plan | Monthly token quota | Quota enforcement |
|---|---|---|
| Free | 50,000 tokens | Hard limit (returns 402) |
| Pro | 5,000,000 tokens | Hard limit (returns 402) |
| Enterprise | Unlimited | Budget-only |
Token usage is counted from the actual `prompt_tokens + completion_tokens` reported by each provider.
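As a rough illustration of the accounting described above, the sketch below sums provider-reported usage and compares it to the plan quota. The helper names (`tokens_used`, `quota_remaining`), the lowercase plan keys, and the shape of the usage dictionaries are assumptions made for this example; they are not part of the Xantly API.

```python
# Plan quotas copied from the table above; None means no token quota (Enterprise).
PLAN_QUOTAS = {"free": 50_000, "pro": 5_000_000, "enterprise": None}


def tokens_used(usages):
    """Sum prompt and completion tokens over a list of usage dicts (assumed shape)."""
    return sum(u["prompt_tokens"] + u["completion_tokens"] for u in usages)


def quota_remaining(plan, usages):
    """Return tokens left this month, or None when the plan has no token quota."""
    quota = PLAN_QUOTAS[plan]
    if quota is None:
        return None
    return max(quota - tokens_used(usages), 0)


usages = [
    {"prompt_tokens": 1_200, "completion_tokens": 300},
    {"prompt_tokens": 800, "completion_tokens": 700},
]
print(tokens_used(usages))              # 3000
print(quota_remaining("free", usages))  # 47000
```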
Monthly budget limits
In addition to token quotas, you can set a monthly spend cap in USD from the dashboard:
| Budget type | Applies to |
|---|---|
| `monthly_budget_usd` | All requests (general) |
| `voice_monthly_budget_usd` | Voice API requests (`/v1/voice/*`) |
| `maker_monthly_budget_usd` | Reliability/maker API requests |
If a category budget is not set, the general monthly_budget_usd applies as a fallback. If no budget is set at all, only the token quota applies.
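The fallback rule above can be sketched as a small lookup. This is only an illustration of the rule as stated: the category labels (`"voice"`, `"maker"`, `"general"`) and the function name `effective_budget_usd` are hypothetical, and the gateway's real implementation is not shown in this document.

```python
# Hypothetical mapping from a request category to its budget setting name.
CATEGORY_KEYS = {
    "voice": "voice_monthly_budget_usd",
    "maker": "maker_monthly_budget_usd",
}


def effective_budget_usd(category, budgets):
    """Return the budget that applies to a category, or None if no budget is set."""
    key = CATEGORY_KEYS.get(category)
    if key is not None and budgets.get(key) is not None:
        return budgets[key]  # a category-specific budget wins when present
    return budgets.get("monthly_budget_usd")  # otherwise fall back to the general cap


settings = {"monthly_budget_usd": 100.0, "voice_monthly_budget_usd": 20.0}
print(effective_budget_usd("voice", settings))  # 20.0
print(effective_budget_usd("maker", settings))  # 100.0
print(effective_budget_usd("general", {}))      # None
```

A `None` result corresponds to the case where only the token quota applies.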
402 Payment Required
When a request would exceed your token quota or monthly budget, the gateway returns 402 Payment Required:
{
  "error": {
    "message": "Free tier 50,000 token monthly quota exceeded. Upgrade your plan or add credits to continue.",
    "type": "billing_error",
    "code": "quota_exceeded",
    "top_up_url": "https://app.xantly.com/dashboard/billing",
    "suggested_amounts": [10, 25, 50, 100]
  }
}

| Field | Description |
|---|---|
| `error.code` | `"quota_exceeded"` for token quota; `"budget_exceeded"` for spend limit |
| `error.top_up_url` | Dashboard URL to upgrade or add credits |
| `error.suggested_amounts` | Suggested top-up amounts in USD |
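A client might turn this payload into an actionable message along the following lines. This is a minimal sketch that relies only on the fields documented above; the function name `describe_billing_error` and the message wording are invented for illustration, and it takes an already-parsed status code and body so it does not depend on any particular HTTP library.

```python
def describe_billing_error(status_code, body):
    """Return a readable message for a 402 response, or None for other statuses."""
    if status_code != 402:
        return None
    error = body.get("error", {})
    code = error.get("code")
    if code == "quota_exceeded":
        reason = "token quota exhausted"
    elif code == "budget_exceeded":
        reason = "monthly spend budget exhausted"
    else:
        reason = f"billing error ({code})"  # any code this sketch does not know
    amounts = ", ".join(f"${a}" for a in error.get("suggested_amounts", []))
    return f"{reason}; top up at {error.get('top_up_url')} (suggested: {amounts})"


body = {
    "error": {
        "type": "billing_error",
        "code": "quota_exceeded",
        "top_up_url": "https://app.xantly.com/dashboard/billing",
        "suggested_amounts": [10, 25, 50, 100],
    }
}
print(describe_billing_error(402, body))
```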
Credit balance
Pre-paid credits allow requests to proceed past token quota and budget walls:
- Credits are denominated in USD cents (`credit_balance_cents`)
- When your balance is positive, requests are allowed even if quota or budget is exceeded
- Credits are consumed post-request when actual token usage is known
- Top up credits from the billing dashboard
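Read together with the quota and budget checks, the rules above suggest an admission decision like the one sketched here. The function `request_allowed` and its boolean inputs are hypothetical stand-ins for whatever the gateway computes internally; only the rule itself (a positive credit balance lets a request through) comes from this page.

```python
def request_allowed(quota_exceeded, budget_exceeded, credit_balance_cents):
    """Decide whether a request may proceed, following the rules listed above."""
    if credit_balance_cents > 0:
        return True  # a positive credit balance bypasses both walls
    return not (quota_exceeded or budget_exceeded)


print(request_allowed(True, False, 500))   # True: over quota, but credits remain
print(request_allowed(True, False, 0))     # False: over quota, no credits
print(request_allowed(False, False, 0))    # True: within quota and budget
```

Because credits are consumed after the request, the decision uses the balance as it stands before the request's cost is known.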
Soft-limit warning headers
When you approach your token quota (≥ 90% used), responses include warning headers:
| Header | Value | Description |
|---|---|---|
| `x-budget-warning` | e.g. `"94%"` | Percentage of quota consumed this month |
| `x-budget-remaining` | e.g. `"3000"` | Tokens remaining before hard limit |
Use these headers in production to alert your team or automatically trigger an upgrade before requests start failing.
import httpx

response = httpx.post("https://api.xantly.com/v1/chat/completions", ...)
# Both headers are absent until usage reaches the warning threshold.
warning = response.headers.get("x-budget-warning")
remaining = response.headers.get("x-budget-remaining")
if warning:
    print(f"Warning: {warning} of quota used, {remaining} tokens remaining")

Subscription cancellation
When a subscription is cancelled, the gateway returns 402 with code `"subscription_inactive"` for all inference requests for a brief period. Resubscribing restores access immediately.
Cost visibility per response
Every response includes cost fields in xantly_metadata:
| Field | Description |
|---|---|
| `cost_usd` | Actual cost for this request in USD |
| `baseline_cost_usd` | What the same request would cost on GPT-4o |
| `savings_usd` | Cost saved vs. GPT-4o baseline |
| `savings_pct` | Savings as a percentage of baseline |
| `cost_attribution` | `"xantly"` (platform keys) or `"byok"` (your own API key) |
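The field descriptions imply a simple relationship between the cost values, which the sketch below works through. It assumes that `savings_usd` is the baseline minus the actual cost and that `savings_pct` is on a 0 to 100 scale; neither is confirmed here, and the function name `savings` and the rounding are choices made for this example.

```python
def savings(cost_usd, baseline_cost_usd):
    """Derive (savings_usd, savings_pct) from the actual and baseline cost."""
    savings_usd = baseline_cost_usd - cost_usd
    if baseline_cost_usd:
        savings_pct = 100.0 * savings_usd / baseline_cost_usd
    else:
        savings_pct = 0.0  # avoid dividing by zero when there is no baseline cost
    return round(savings_usd, 6), round(savings_pct, 2)


# A request costing $0.002 against a $0.008 baseline saves $0.006, or 75%.
print(savings(0.002, 0.008))  # (0.006, 75.0)
```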
Best practices
- Set a monthly budget cap in the dashboard as a safety net against runaway usage.
- Monitor `x-budget-warning` headers in production, and add alerting at 80% so you can act before the hard limit.
- Upgrade or add credits proactively: the credit balance system lets you continue past quota without interruption.
- Enable caching (`xantly.enable_cache: true`, default): cache hits consume zero tokens.
- Use BYOK for high-volume workloads: routing through your own API keys doesn't count against Xantly's platform quota.
Next steps
- Rate Limits: RPM and TPM rate limits (separate from token quotas)
- Bring Your Own Key: Route requests through your own provider keys
- Chat Completions: `xantly_metadata.cost_usd` and savings fields