Billing & Quotas

Xantly enforces per-organization token quotas and optional monthly spend budgets to prevent unexpected costs. Both systems are checked before each request is processed.

Token quotas by plan

Plan	Monthly token quota	Quota enforcement
Free	50,000 tokens	Hard limit (returns 402)
Pro	5,000,000 tokens	Hard limit (returns 402)
Enterprise	Unlimited	Budget-only

Usage is counted from the actual `prompt_tokens` + `completion_tokens` that the provider reports for each request.
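As a rough illustration of how usage accrues against the quota, the sketch below tallies tokens client-side. It assumes each response exposes a usage dict with `prompt_tokens` and `completion_tokens` keys, which this page does not confirm; `PLAN_QUOTAS`, `tokens_used`, and `quota_fraction` are hypothetical names, and the gateway's own accounting remains authoritative.

```python
# Plan quotas copied from the table above; None means no token quota (Enterprise).
PLAN_QUOTAS = {"free": 50_000, "pro": 5_000_000, "enterprise": None}

def tokens_used(usages):
    """Sum prompt_tokens + completion_tokens over a list of usage dicts."""
    return sum(u["prompt_tokens"] + u["completion_tokens"] for u in usages)

def quota_fraction(plan, usages):
    """Return the fraction of the monthly quota consumed, or None if unlimited."""
    quota = PLAN_QUOTAS[plan]
    if quota is None:
        return None
    return tokens_used(usages) / quota

# Hypothetical usage records for two requests.
usages = [
    {"prompt_tokens": 1_200, "completion_tokens": 300},
    {"prompt_tokens": 20_000, "completion_tokens": 3_500},
]
print(tokens_used(usages))             # 25000
print(quota_fraction("free", usages))  # 0.5
```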

Monthly budget limits

In addition to token quotas, you can set a monthly spend cap in USD from the dashboard:

Budget type	Applies to
`monthly_budget_usd`	All requests (general)
`voice_monthly_budget_usd`	Voice API requests (`/v1/voice/*`)
`maker_monthly_budget_usd`	Reliability/maker API requests

If a category budget is not set, the general `monthly_budget_usd` applies as a fallback. If no budget is set at all, only the token quota applies.
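The fallback rule can be sketched as a small lookup. This is an illustration only: the `settings` dict, the `"voice"` and `"maker"` category labels, and the `effective_budget` helper are assumptions rather than part of the Xantly API. Only the three budget field names come from the table above.

```python
from typing import Optional

# Hypothetical mapping from a request category to its budget field.
CATEGORY_FIELDS = {
    "voice": "voice_monthly_budget_usd",
    "maker": "maker_monthly_budget_usd",
}

def effective_budget(settings: dict, category: Optional[str] = None) -> Optional[float]:
    """Return the spend cap in USD for a request, or None if no budget applies."""
    field = CATEGORY_FIELDS.get(category)
    if field is not None and settings.get(field) is not None:
        return settings[field]
    # Fall back to the general budget; None means only the token quota applies.
    return settings.get("monthly_budget_usd")

settings = {"monthly_budget_usd": 200.0, "voice_monthly_budget_usd": 50.0}
print(effective_budget(settings, "voice"))  # 50.0
print(effective_budget(settings, "maker"))  # 200.0
print(effective_budget({}, "voice"))        # None
```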

402 Payment Required

When a request would exceed your token quota or monthly budget, the gateway returns 402 Payment Required:

{
  "error": {
    "message": "Free tier 50,000 token monthly quota exceeded. Upgrade your plan or add credits to continue.",
    "type": "billing_error",
    "code": "quota_exceeded",
    "top_up_url": "https://app.xantly.com/dashboard/billing",
    "suggested_amounts": [10, 25, 50, 100]
  }
}

Field	Description
`error.code`	`"quota_exceeded"` for token quota; `"budget_exceeded"` for spend limit
`error.top_up_url`	Dashboard URL to upgrade or add credits
`error.suggested_amounts`	Suggested top-up amounts in USD
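A client might turn this payload into an actionable message along the following lines. This is a minimal sketch that relies only on the fields documented above; `describe_billing_error` is a hypothetical helper, and any code other than the two listed in the table is reported generically.

```python
def describe_billing_error(status_code, body):
    """Return a human-readable message for a 402 response, or None otherwise."""
    if status_code != 402:
        return None
    error = body.get("error", {})
    code = error.get("code")
    if code == "quota_exceeded":
        reason = "monthly token quota exceeded"
    elif code == "budget_exceeded":
        reason = "monthly spend budget exceeded"
    else:
        reason = f"billing error ({code})"
    amounts = ", ".join(f"${a}" for a in error.get("suggested_amounts", []))
    return f"{reason}; top up at {error.get('top_up_url')} (suggested: {amounts})"

# Sample body taken from the example response above.
body = {
    "error": {
        "type": "billing_error",
        "code": "quota_exceeded",
        "top_up_url": "https://app.xantly.com/dashboard/billing",
        "suggested_amounts": [10, 25, 50, 100],
    }
}
print(describe_billing_error(402, body))
```

In practice the status code and body would come from the HTTP client, for example `response.status_code` and `response.json()` in httpx.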

Credit balance

Pre-paid credits allow requests to proceed past token quota and budget walls:

Credits are denominated in USD cents (`credit_balance_cents`)
When your balance is positive, requests are allowed even if quota or budget is exceeded
Credits are consumed post-request when actual token usage is known
Top up credits from the billing dashboard
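The interaction between credits, quota, and budget can be approximated as follows. This is a sketch of the rules as described on this page, not the gateway's actual implementation: `request_allowed` and its parameters are hypothetical, and it treats "would exceed" as "already at or above the limit" because the size of the next request is not known in advance.

```python
def request_allowed(tokens_used, token_quota, spend_usd, budget_usd, credit_balance_cents):
    """Approximate the pre-request check: credits first, then quota and budget."""
    if credit_balance_cents > 0:
        return True  # a positive credit balance bypasses quota and budget walls
    # None means "no limit of this kind" (e.g. Enterprise quota, unset budget).
    over_quota = token_quota is not None and tokens_used >= token_quota
    over_budget = budget_usd is not None and spend_usd >= budget_usd
    return not (over_quota or over_budget)

print(request_allowed(50_000, 50_000, 0.0, None, 0))    # False: quota exhausted
print(request_allowed(50_000, 50_000, 0.0, None, 500))  # True: credits available
print(request_allowed(10**9, None, 120.0, 100.0, 0))    # False: budget exceeded
```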

Soft-limit warning headers

When you approach your token quota (≥ 90% used), responses include warning headers:

Header	Value	Description
`x-budget-warning`	e.g. `"94%"`	Percentage of quota consumed this month
`x-budget-remaining`	e.g. `"3000"`	Tokens remaining before hard limit

Use these headers in production to alert your team or automatically trigger an upgrade before requests start failing.

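import httpx

# Request headers and JSON body are elided here.
response = httpx.post("https://api.xantly.com/v1/chat/completions", ...)

# Both headers are absent until usage reaches 90% of the monthly quota.
warning = response.headers.get("x-budget-warning")
remaining = response.headers.get("x-budget-remaining")

if warning:
    print(f"Warning: {warning} of quota used, {remaining} tokens remaining")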
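For alerting, it is usually more convenient to work with numbers than with raw header strings. The sketch below assumes the header values have the shapes shown in the table (`94%` and `3000`, with or without surrounding quotes); `parse_budget_headers` is a hypothetical helper.

```python
def parse_budget_headers(headers):
    """Return (percent_used, tokens_remaining), or None if no warning is present."""
    warning = headers.get("x-budget-warning")
    if warning is None:
        return None
    # Tolerate optional surrounding quotes and the trailing percent sign.
    percent_used = float(warning.strip().strip('"').rstrip("%"))
    remaining = headers.get("x-budget-remaining")
    tokens_remaining = int(remaining.strip().strip('"')) if remaining is not None else None
    return percent_used, tokens_remaining

print(parse_budget_headers({"x-budget-warning": "94%", "x-budget-remaining": "3000"}))
# (94.0, 3000)
print(parse_budget_headers({}))
# None
```

The function accepts any mapping with a `get` method, so `response.headers` from httpx can be passed directly.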

Subscription cancellation

For a brief period after a subscription is cancelled, the gateway returns 402 with code `"subscription_inactive"` for all inference requests. Resubscribing restores access immediately.

Cost visibility per response

Every response includes cost fields in `xantly_metadata`:

Field	Description
`cost_usd`	Actual cost for this request in USD
`baseline_cost_usd`	What the same request would cost on GPT-4o
`savings_usd`	Cost saved vs. GPT-4o baseline
`savings_pct`	Savings as a percentage of baseline
`cost_attribution`	`"xantly"` (platform keys) or `"byok"` (your own API key)
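These fields make it straightforward to aggregate spend across many requests. The sketch below assumes `xantly_metadata` is a top-level key of the parsed response body and that savings relate to the other fields as `baseline_cost_usd - cost_usd`, expressed as a percentage of the baseline on a 0 to 100 scale. Both are inferred from the descriptions above rather than confirmed, and `summarize_costs` is a hypothetical helper.

```python
def summarize_costs(responses):
    """Aggregate cost and baseline across parsed response bodies."""
    cost = sum(r["xantly_metadata"]["cost_usd"] for r in responses)
    baseline = sum(r["xantly_metadata"]["baseline_cost_usd"] for r in responses)
    savings = baseline - cost
    # Guard against division by zero when no baseline cost was recorded.
    savings_pct = 100 * savings / baseline if baseline else 0.0
    return {
        "cost_usd": cost,
        "baseline_cost_usd": baseline,
        "savings_usd": savings,
        "savings_pct": savings_pct,
    }

# Hypothetical responses, reduced to the fields used here.
responses = [
    {"xantly_metadata": {"cost_usd": 0.25, "baseline_cost_usd": 1.0}},
    {"xantly_metadata": {"cost_usd": 0.5, "baseline_cost_usd": 1.0}},
]
summary = summarize_costs(responses)
print(summary["savings_usd"])  # 1.25
print(summary["savings_pct"])  # 62.5
```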

Best practices

Set a monthly budget cap in the dashboard as a safety net against runaway usage.
Monitor `x-budget-warning` headers in production and alert as soon as they appear (at 90% of quota), so you can act before the hard limit.
Upgrade or add credits proactively — the credit balance system lets you continue past quota without interruption.
Keep caching enabled (`xantly.enable_cache: true`, the default): cache hits consume zero tokens.
Use BYOK for high-volume workloads — routing through your own API keys doesn't count against Xantly's platform quota.

Next steps

Rate Limits — RPM and TPM rate limits (separate from token quotas)
Bring Your Own Key — Route requests through your own provider keys
Chat Completions — xantly_metadata.cost_usd and savings fields
