Billing & Quotas

Xantly enforces per-organization token quotas and optional monthly spend budgets to prevent unexpected costs. Both systems are checked before each request is processed.


Token quotas by plan

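| Plan       | Monthly token quota | Quota enforcement        |
|------------|---------------------|--------------------------|
| Free       | 50,000 tokens       | Hard limit (returns 402) |
| Pro        | 5,000,000 tokens    | Hard limit (returns 402) |
| Enterprise | Unlimited           | Budget-only              |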

Usage is counted per request as prompt_tokens + completion_tokens, using the actual values reported by the provider that served the request.
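As a rough illustration of this accounting, the sketch below sums the two usage fields and compares a running monthly total with a plan's quota. The names PLAN_QUOTAS, tokens_used, and remaining_quota are hypothetical, and the shape of the usage dictionary is an assumption rather than a documented response format; only the quota figures come from the plan table.

```python
from typing import Optional

# Monthly quotas from the plan table; None means no token quota (Enterprise).
# The mapping itself is illustrative, not part of the Xantly API.
PLAN_QUOTAS: dict[str, Optional[int]] = {
    "free": 50_000,
    "pro": 5_000_000,
    "enterprise": None,
}


def tokens_used(usage: dict) -> int:
    """Tokens counted for one request: prompt plus completion."""
    return usage.get("prompt_tokens", 0) + usage.get("completion_tokens", 0)


def remaining_quota(plan: str, used_this_month: int) -> Optional[int]:
    """Tokens left this month, or None for plans without a token quota."""
    quota = PLAN_QUOTAS[plan]
    if quota is None:
        return None
    return max(quota - used_this_month, 0)
```

For example, a Free organization that has used 47,000 tokens has 3,000 left, which matches the x-budget-remaining example value shown later on this page.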


Monthly budget limits

In addition to token quotas, you can set a monthly spend cap in USD from the dashboard:

| Budget type              | Applies to                          |
|--------------------------|-------------------------------------|
| monthly_budget_usd       | All requests (general)              |
| voice_monthly_budget_usd | Voice API requests (/v1/voice/*)    |
| maker_monthly_budget_usd | Reliability/maker API requests      |

If a category budget is not set, the general monthly_budget_usd applies as a fallback. If no budget is set at all, only the token quota applies.
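The fallback rule can be sketched as a small lookup. This is a minimal illustration, assuming the budgets are available as a plain dictionary keyed by the field names in the table; the effective_budget helper and its category argument are hypothetical and not part of the Xantly API.

```python
from typing import Optional


def effective_budget(budgets: dict, category: Optional[str] = None) -> Optional[float]:
    """Return the monthly USD cap that applies, or None if no budget is set.

    `category` is "voice", "maker", or None for general requests.
    """
    if category is not None:
        specific = budgets.get(f"{category}_monthly_budget_usd")
        if specific is not None:
            return specific
    # Fall back to the general budget; None means only the token quota applies.
    return budgets.get("monthly_budget_usd")
```

With a general cap of 100 USD and no maker cap, effective_budget({"monthly_budget_usd": 100.0}, "maker") returns 100.0; with no budgets at all it returns None.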


402 Payment Required

When a request would exceed your token quota or monthly budget, the gateway returns 402 Payment Required:

{
  "error": {
    "message": "Free tier 50,000 token monthly quota exceeded. Upgrade your plan or add credits to continue.",
    "type": "billing_error",
    "code": "quota_exceeded",
    "top_up_url": "https://app.xantly.com/dashboard/billing",
    "suggested_amounts": [10, 25, 50, 100]
  }
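}

| Field                   | Description                                                          |
|-------------------------|----------------------------------------------------------------------|
| error.code              | "quota_exceeded" for token quota; "budget_exceeded" for spend limit  |
| error.top_up_url        | Dashboard URL to upgrade or add credits                              |
| error.suggested_amounts | Suggested top-up amounts in USD                                      |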
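A client might branch on error.code to decide what to tell an operator. The sketch below assumes the caller has already decoded the response into a status code and a JSON dictionary shaped like the example above; describe_billing_error is a hypothetical helper, and the hint texts are illustrative wording, not messages returned by the gateway.

```python
from typing import Optional


def describe_billing_error(status_code: int, body: dict) -> Optional[str]:
    """Return a short action hint for a 402 response, or None otherwise."""
    if status_code != 402:
        return None
    error = body.get("error", {})
    code = error.get("code")
    url = error.get("top_up_url", "")
    if code == "quota_exceeded":
        return f"Token quota exhausted; upgrade or add credits at {url}"
    if code == "budget_exceeded":
        return f"Monthly budget reached; review the cap or add credits at {url}"
    # Any other code: fall back to the server's own message.
    return error.get("message", "Payment required")
```

Falling back to error.message for unrecognized codes keeps the handler usable if the gateway returns a 402 with a code this sketch does not special-case.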

Credit balance

Pre-paid credits allow requests to proceed past token quota and budget walls:

  • Credits are denominated in USD cents (credit_balance_cents)
  • When your balance is positive, requests are allowed even if quota or budget is exceeded
  • Credits are consumed post-request when actual token usage is known
  • Top up credits from the billing dashboard
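The interaction between credits and the two limits can be summarized in a single predicate. This is only a sketch of the rule as described in the list above, not the gateway's actual implementation; request_allowed and its parameters are hypothetical names.

```python
def request_allowed(over_quota: bool, over_budget: bool, credit_balance_cents: int) -> bool:
    """Pre-request check: limits block a request only when no credit remains."""
    if credit_balance_cents > 0:
        # A positive balance lets the request through regardless of limits.
        return True
    return not (over_quota or over_budget)
```

Because credits are consumed after the request, the check can only look at the balance before the request's cost is known.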

Soft-limit warning headers

When you approach your token quota (≥ 90% used), responses include warning headers:

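| Header             | Value       | Description                             |
|--------------------|-------------|-----------------------------------------|
| x-budget-warning   | e.g. "94%"  | Percentage of quota consumed this month |
| x-budget-remaining | e.g. "3000" | Tokens remaining before hard limit      |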

Use these headers in production to alert your team or automatically trigger an upgrade before requests start failing.

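import httpx

# Request body and auth headers are elided here.
response = httpx.post("https://api.xantly.com/v1/chat/completions", ...)

# Both headers are absent until quota usage reaches 90%, so .get() returns None.
warning = response.headers.get("x-budget-warning")      # e.g. "94%"
remaining = response.headers.get("x-budget-remaining")  # e.g. "3000"

if warning:
    print(f"Warning: {warning} of quota used, {remaining} tokens remaining")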

Subscription cancellation

After a subscription is cancelled, the gateway returns 402 with code "subscription_inactive" for all inference requests for a brief period. Resubscribing restores access immediately.


Cost visibility per response

Every response includes cost fields in xantly_metadata:

| Field             | Description                                            |
|-------------------|--------------------------------------------------------|
| cost_usd          | Actual cost for this request in USD                    |
| baseline_cost_usd | What the same request would cost on GPT-4o             |
| savings_usd       | Cost saved vs. GPT-4o baseline                         |
| savings_pct       | Savings as a percentage of baseline                    |
| cost_attribution  | "xantly" (platform keys) or "byok" (your own API key)  |
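The relationship between these fields can be checked with a little arithmetic. The sketch below assumes that savings_usd is the baseline minus the actual cost, that savings_pct is expressed on a 0 to 100 scale, and that xantly_metadata sits at the top level of the decoded response body; none of these details are confirmed by this page. The helper names savings and total_cost are hypothetical.

```python
def savings(cost_usd: float, baseline_cost_usd: float) -> tuple[float, float]:
    """Return (savings_usd, savings_pct) relative to the baseline cost."""
    savings_usd = baseline_cost_usd - cost_usd
    # Guard against a zero baseline to avoid dividing by zero.
    savings_pct = 100.0 * savings_usd / baseline_cost_usd if baseline_cost_usd else 0.0
    return savings_usd, savings_pct


def total_cost(responses: list[dict]) -> float:
    """Sum cost_usd across decoded response bodies, skipping any without metadata."""
    return sum(r.get("xantly_metadata", {}).get("cost_usd", 0.0) for r in responses)
```

Summing cost_usd across responses gives a client-side running total that can be compared with the monthly budget set in the dashboard.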

Best practices

  1. Set a monthly budget cap in the dashboard as a safety net against runaway usage.
  2. Monitor x-budget-warning headers in production. The header first appears at 90% of quota, so alert as soon as it shows up and act before the hard limit.
  3. Upgrade or add credits proactively: a positive credit balance lets requests continue past the quota without interruption.
  4. Keep caching enabled (xantly.enable_cache: true, the default): cache hits consume zero tokens.
  5. Use BYOK for high-volume workloads: requests routed through your own API keys don't count against Xantly's platform quota.
