Voice Billing
title: Voice Billing description: How voice requests are priced, quota-enforced, and surfaced on invoices.
Voice requests use the same API key, the same budget pool, and the same monthly invoice as chat. What differs is the pricing unit (minutes and characters, not tokens) and the addition of a per-minute platform fee.
This page documents how Xantly bills voice, enforces quotas, and surfaces costs back to you.
Cost breakdown
A single voice turn is billed as the sum of up to four components:
| Component | Pricing model | Where the cost comes from |
|---|---|---|
| Speech-to-Text (STT) | Per minute of input audio | Provider passthrough (Whisper, Deepgram Nova, Groq Whisper, etc.) |
| LLM inference | Per token (same as chat) | The model slug dispatched inside the voice pipeline |
| Text-to-Speech (TTS) | Per 1M characters | Provider passthrough (OpenAI TTS, ElevenLabs, Deepgram Aura, etc.) |
| Platform fee | Per minute of audio | Xantly's orchestration, caching, and routing layer |
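The first three are provider passthrough: by default, Xantly bills exactly what the underlying provider charges. The platform fee is where Xantly earns revenue and is the primary margin lever; the `voice_markup_pct_override` and `voice_component_overrides` org settings (listed under Plan quotas) can also apply markup to pass-through costs.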
Chat models called inside the voice pipeline (e.g. groq/llama-3.3-70b selected by BaRP for low-latency fast-lane turns) are priced per token like any other chat completion.
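A minimal sketch of how the four components add up for a single turn. It assumes the platform fee is charged on the same input-audio minutes as STT and that LLM rates are quoted per 1M tokens. The class and function names (`VoiceTurnUsage`, `VoiceRates`, `voice_turn_cost`) are hypothetical, and the provider rates in the example are placeholders, not published prices; only the $0.02/min Pro platform fee comes from the plan table.

```python
from dataclasses import dataclass


@dataclass
class VoiceTurnUsage:
    audio_minutes: float   # input audio duration (drives STT and, by assumption, the platform fee)
    input_tokens: int      # LLM prompt tokens inside the pipeline
    output_tokens: int     # LLM completion tokens
    tts_characters: int    # characters synthesized by TTS


@dataclass
class VoiceRates:
    stt_per_min: float             # provider passthrough, USD per audio minute
    llm_in_per_1m_tokens: float    # provider passthrough, USD per 1M input tokens
    llm_out_per_1m_tokens: float   # provider passthrough, USD per 1M output tokens
    tts_per_1m_chars: float        # provider passthrough, USD per 1M characters
    platform_fee_per_min: float    # plan platform fee, USD per audio minute


def voice_turn_cost(u: VoiceTurnUsage, r: VoiceRates) -> dict:
    """Sum the four billed components of one voice turn (USD)."""
    stt = u.audio_minutes * r.stt_per_min
    inference = (u.input_tokens * r.llm_in_per_1m_tokens
                 + u.output_tokens * r.llm_out_per_1m_tokens) / 1_000_000
    tts = u.tts_characters * r.tts_per_1m_chars / 1_000_000
    platform_fee = u.audio_minutes * r.platform_fee_per_min
    return {
        "stt": stt,
        "inference": inference,
        "tts": tts,
        "platform_fee": platform_fee,
        "total": stt + inference + tts + platform_fee,
    }


# Hypothetical provider rates; only the $0.02/min platform fee (Pro) comes from the plan table.
rates = VoiceRates(0.0043, 0.59, 0.79, 30.0, 0.02)
usage = VoiceTurnUsage(audio_minutes=0.5, input_tokens=800, output_tokens=120, tts_characters=400)
print(voice_turn_cost(usage, rates))
```

Even with these placeholder rates, TTS characters and the per-minute platform fee dominate a short turn, while the token-priced inference step is comparatively small.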
Plan quotas
| Plan | Monthly minutes | Concurrent sessions | Voice RPM | Platform fee/min |
|---|---|---|---|---|
| Free | 3 min lifetime (one-time demo) | 1 | 3 | $0.00 |
| Pro | 500 / month | 5 | 60 | $0.02 |
| Scale | 5,000 / month | 25 | 500 | $0.015 |
| Pay-As-You-Go | Unlimited (credit-bounded) | 5 | 30 | $0.025 |
Per-org overrides are available via org_settings:
- `voice_monthly_minutes_limit`: override the monthly cap
- `voice_concurrent_session_limit`: override the concurrency limit
- `voice_rpm_limit`: override the requests-per-minute limit
- `voice_platform_fee_per_min`: override the platform fee (for custom enterprise deals)
- `voice_markup_pct_override`: uniform markup override on provider pass-through costs
- `voice_component_overrides`: per-component (`stt` / `tts` / `inference`) markup override
Free tier minutes are lifetime, not monthly. Once used, they do not reset. Upgrade to Pro to get a monthly allocation.
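The limit-resolution logic can be sketched as "plan defaults, then any non-null `org_settings` override wins". This is an illustrative model, not Xantly's implementation: the plan keys (`"free"`, `"pro"`, `"scale"`, `"payg"`), the `VoiceLimits` type, and `effective_voice_limits` are hypothetical, and the two markup overrides are omitted because they adjust pass-through pricing rather than limits.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class VoiceLimits:
    monthly_minutes: Optional[float]     # None = no monthly cap from the plan
    concurrent_sessions: int
    rpm: int
    platform_fee_per_min: float
    lifetime_minutes: Optional[float] = None   # only the Free plan has a lifetime cap


# Plan defaults from the table above; the plan keys are hypothetical identifiers.
PLAN_DEFAULTS = {
    "free": VoiceLimits(None, 1, 3, 0.00, lifetime_minutes=3),
    "pro": VoiceLimits(500, 5, 60, 0.02),
    "scale": VoiceLimits(5000, 25, 500, 0.015),
    "payg": VoiceLimits(None, 5, 30, 0.025),   # unlimited, bounded by credit balance
}

# org_settings key -> VoiceLimits field it overrides
OVERRIDES = {
    "voice_monthly_minutes_limit": "monthly_minutes",
    "voice_concurrent_session_limit": "concurrent_sessions",
    "voice_rpm_limit": "rpm",
    "voice_platform_fee_per_min": "platform_fee_per_min",
}


def effective_voice_limits(plan: str, org_settings: dict) -> VoiceLimits:
    """Start from the plan defaults and apply any non-null org_settings overrides."""
    changes = {
        field: org_settings[key]
        for key, field in OVERRIDES.items()
        if org_settings.get(key) is not None
    }
    return replace(PLAN_DEFAULTS[plan], **changes)


# A Scale org with a negotiated platform fee:
print(effective_voice_limits("scale", {"voice_platform_fee_per_min": 0.01}))
```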
Quota enforcement order
For every voice request, Xantly runs these checks in order. The first one that fails returns an error and the request does not count toward any other quota:
1. Monthly budget cap: the same `budget:usage:{org_id}:general:{YYYY-MM}` pool as chat. Returns `402 Payment Required` when exceeded.
2. Monthly voice minutes quota: `plan_voice_minutes_monthly` or the org override. Returns `429 Too Many Requests` ("Voice audio minutes quota exceeded").
3. Free tier lifetime minutes: applies only to the Free plan. Returns `429` ("Free voice demo limit reached").
4. PAYG credit floor: PAYG accounts must have a credit balance of at least $0.05 to start a voice request. Returns `402 Payment Required`.
5. Concurrent session limit: `plan_voice_concurrent_sessions` or the org override. Returns `429` ("Voice concurrent session limit reached").
6. Voice RPM (sliding window): enforced by the rate limit middleware.
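The short-circuit order above can be sketched as a single function that returns at the first failing check. The `VoiceQuotaState` fields and plan keys are hypothetical, treating an exhausted budget (used equal to cap) as exceeded is an assumption, and the 402 reasons are illustrative labels; the 429 messages are the documented ones. Voice RPM is left out because the rate limit middleware handles it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

PAYG_MIN_CREDIT_USD = 0.05  # documented PAYG credit floor


@dataclass
class VoiceQuotaState:
    plan: str                               # hypothetical keys: "free", "pro", "scale", "payg"
    budget_used_usd: float                  # shared general budget pool (same as chat)
    budget_cap_usd: float
    minutes_used_this_month: float
    monthly_minutes_limit: Optional[float]  # None = no monthly cap
    free_minutes_used_lifetime: float
    free_minutes_lifetime_limit: float
    credit_balance_usd: float
    active_sessions: int
    concurrent_session_limit: int


def check_voice_quota(s: VoiceQuotaState) -> Optional[Tuple[int, str]]:
    """Return (HTTP status, reason) for the FIRST failing check, or None if all pass."""
    # 1. Monthly budget cap (an exhausted budget is treated as exceeded here)
    if s.budget_used_usd >= s.budget_cap_usd:
        return 402, "monthly budget cap"
    # 2. Monthly voice minutes quota
    if (s.monthly_minutes_limit is not None
            and s.minutes_used_this_month >= s.monthly_minutes_limit):
        return 429, "Voice audio minutes quota exceeded"
    # 3. Free tier lifetime minutes (Free plan only)
    if s.plan == "free" and s.free_minutes_used_lifetime >= s.free_minutes_lifetime_limit:
        return 429, "Free voice demo limit reached"
    # 4. PAYG credit floor
    if s.plan == "payg" and s.credit_balance_usd < PAYG_MIN_CREDIT_USD:
        return 402, "PAYG credit floor"
    # 5. Concurrent session limit
    if s.active_sessions >= s.concurrent_session_limit:
        return 429, "Voice concurrent session limit reached"
    # 6. Voice RPM is enforced separately by the rate limit middleware.
    return None
```

Because evaluation stops at the first failure, a request that is both over budget and over its minute quota gets the `402`, never the `429`.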
When a voice request fails mid-pipeline (for example, STT succeeds but TTS fails), Xantly uses the partial cost accumulator to bill only for the stages that completed. You are never charged for work a provider never did, but you are also never refunded for work that was successfully completed and billed upstream.
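The partial-billing rule can be illustrated with a small accumulator that records a stage's cost only after the stage returns. This is a sketch under assumptions: `PartialCostAccumulator` here is a hypothetical stand-in for Xantly's internal component, each stage is modeled as a callable returning its provider cost in USD, and how the platform fee applies to a failed request is not modeled.

```python
from typing import Callable, Dict, List, Optional, Tuple


class PartialCostAccumulator:
    """Records the provider cost of each pipeline stage only after it completes."""

    def __init__(self) -> None:
        self.stages: Dict[str, float] = {}

    def record(self, stage: str, cost_usd: float) -> None:
        self.stages[stage] = cost_usd

    @property
    def total_usd(self) -> float:
        return sum(self.stages.values())


def run_voice_pipeline(
    stages: List[Tuple[str, Callable[[], float]]],
) -> Tuple[Optional[Exception], PartialCostAccumulator]:
    """Run stages in order; each callable returns its provider cost in USD.

    On the first failure, stop: the failed stage and every later stage
    are never recorded, so they are never billed."""
    acc = PartialCostAccumulator()
    for name, run_stage in stages:
        try:
            cost = run_stage()
        except Exception as exc:
            return exc, acc
        acc.record(name, cost)
    return None, acc
```

If TTS raises after STT and inference succeed, the returned accumulator holds only the `stt` and `inference` entries, and that total is what gets billed.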
Cost visibility headers
Every successful voice response includes detailed cost and routing metadata headers:
| Header | Example value | Meaning |
|---|---|---|
| `X-Xantly-Cost-USD` | `0.00324` | Total customer charge for this request |
| `X-Xantly-STT-Provider` | `deepgram` | Which STT provider actually ran |
| `X-Xantly-STT-Model` | `deepgram/nova-2` | Which STT model was dispatched |
| `X-Xantly-TTS-Provider` | `elevenlabs` | Which TTS provider actually ran |
| `X-Xantly-TTS-Model` | `elevenlabs/eleven_flash_v2_5` | Which TTS model was dispatched |
| `X-Xantly-STT-Latency-Ms` | `82.4` | STT stage latency |
| `X-Xantly-Inference-Latency-Ms` | `147.2` | LLM inference latency |
| `X-Xantly-TTS-Latency-Ms` | `54.1` | TTS stage latency |
| `X-Xantly-Model-Used` | `groq/llama-3.3-70b` | The chat model that served inference inside the pipeline |
| `X-Xantly-Lane-Used` | `FastLane` | Whether BaRP routed through the fast lane or the delegation lane |
| `X-Xantly-Cache-Hit` | `true` | `true` when the voice semantic cache served the response (inference cost = $0) |
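On the client side, these headers can be collected into one record for logging or cost dashboards. A minimal sketch, assuming the headers arrive as any string-to-string mapping (for example, a plain dict or the `headers` object of an HTTP client response); `VoiceResponseMeta` and `parse_voice_headers` are hypothetical names, and missing headers are mapped to `None` rather than guessed.

```python
from dataclasses import dataclass, field
from typing import Dict, Mapping, Optional


@dataclass
class VoiceResponseMeta:
    cost_usd: Optional[float]
    stt_model: Optional[str]
    tts_model: Optional[str]
    model_used: Optional[str]
    lane: Optional[str]
    cache_hit: bool
    latency_ms: Dict[str, float] = field(default_factory=dict)


_LATENCY_HEADERS = {
    "stt": "x-xantly-stt-latency-ms",
    "inference": "x-xantly-inference-latency-ms",
    "tts": "x-xantly-tts-latency-ms",
}


def parse_voice_headers(headers: Mapping[str, str]) -> VoiceResponseMeta:
    """Extract Xantly cost and routing metadata from a voice response's headers."""
    h = {k.lower(): v for k, v in headers.items()}  # HTTP header names are case-insensitive
    cost = h.get("x-xantly-cost-usd")
    return VoiceResponseMeta(
        cost_usd=float(cost) if cost is not None else None,
        stt_model=h.get("x-xantly-stt-model"),
        tts_model=h.get("x-xantly-tts-model"),
        model_used=h.get("x-xantly-model-used"),
        lane=h.get("x-xantly-lane-used"),
        cache_hit=h.get("x-xantly-cache-hit", "false").lower() == "true",
        latency_ms={s: float(h[n]) for s, n in _LATENCY_HEADERS.items() if n in h},
    )


# Example values taken from the table above.
example = {
    "X-Xantly-Cost-USD": "0.00324",
    "X-Xantly-STT-Model": "deepgram/nova-2",
    "X-Xantly-TTS-Model": "elevenlabs/eleven_flash_v2_5",
    "X-Xantly-STT-Latency-Ms": "82.4",
    "X-Xantly-Inference-Latency-Ms": "147.2",
    "X-Xantly-TTS-Latency-Ms": "54.1",
    "X-Xantly-Model-Used": "groq/llama-3.3-70b",
    "X-Xantly-Lane-Used": "FastLane",
    "X-Xantly-Cache-Hit": "true",
}
meta = parse_voice_headers(example)
print(meta.cost_usd, meta.lane, sum(meta.latency_ms.values()))
```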
Anomaly thresholds
Xantly runs automatic cost-anomaly detection on every voice request. If a request exceeds either of the thresholds below, a warning is logged to Mission Control for operator review. The thresholds are configurable at deploy time:
| Environment variable | Default | Meaning |
|---|---|---|
| `VOICE_ANOMALY_COST_PER_MIN_THRESHOLD` | `0.66` | Maximum provider cost (USD) per minute of audio before an alert fires (3x the expected maximum for the premium stack) |
| `VOICE_ANOMALY_SINGLE_REQUEST_THRESHOLD` | `5.0` | Maximum provider cost (USD) for a single voice request before an alert fires |
A third sanity check — "STT completed in <10ms for >5 seconds of audio" — is always enabled and not tunable. It catches broken duration tracking in STT provider responses.
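The three checks can be sketched as below, reading the two tunable thresholds from the documented environment variables with their documented defaults. Assumptions: "provider cost" means the summed STT, inference, and TTS passthrough for the request, "exceeds" means strictly greater than, and the returned warning names are hypothetical labels, not Mission Control's actual log format.

```python
import os
from typing import List


def _threshold(name: str, default: float) -> float:
    """Read a float threshold from the environment, falling back to the default."""
    return float(os.environ.get(name, default))


def voice_cost_anomalies(provider_cost_usd: float, audio_seconds: float,
                         stt_latency_ms: float) -> List[str]:
    """Return the (hypothetical) names of every anomaly check this request trips."""
    per_min_limit = _threshold("VOICE_ANOMALY_COST_PER_MIN_THRESHOLD", 0.66)
    single_limit = _threshold("VOICE_ANOMALY_SINGLE_REQUEST_THRESHOLD", 5.0)
    found = []
    minutes = audio_seconds / 60
    if minutes > 0 and provider_cost_usd / minutes > per_min_limit:
        found.append("cost_per_minute")
    if provider_cost_usd > single_limit:
        found.append("single_request_cost")
    # Always on, not tunable: STT finishing in <10 ms on >5 s of audio
    # suggests the provider reported a broken duration.
    if audio_seconds > 5 and stt_latency_ms < 10:
        found.append("stt_duration_suspect")
    return found
```

A request can trip more than one check; each one would produce its own warning.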
Stripe integration
Voice minutes are reported to Stripe Metered Billing on a 60-second interval loop from the xantly-api process. Each org with stripe_voice_sub_item_id set on org_settings gets its total voice minutes for the current calendar month reported with action=set (idempotent).
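One pass of that loop might look like the sketch below. It deliberately does not call Stripe: `report_usage` and `minutes_since` are hypothetical callbacks (in production the former would wrap Stripe's usage reporting for the subscription item), and rounding up to whole minutes and using UTC calendar months are assumptions. Reporting the month-to-date total with `action="set"` means a repeated pass overwrites rather than adds.

```python
import math
import time
from datetime import datetime, timezone
from typing import Any, Callable, Iterable, Mapping


def month_start(now: datetime) -> datetime:
    """First instant of the calendar month containing `now` (same timezone)."""
    return now.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


def sync_voice_minutes_once(
    orgs: Iterable[Mapping[str, Any]],
    minutes_since: Callable[[str, datetime], float],
    report_usage: Callable[..., None],
    now: datetime,
) -> int:
    """One pass of the sync loop; returns how many orgs were reported."""
    start = month_start(now)
    reported = 0
    for org in orgs:
        item_id = org.get("stripe_voice_sub_item_id")
        if not item_id:
            continue  # org has no voice metered subscription item
        total = minutes_since(org["org_id"], start)
        report_usage(
            item_id,
            quantity=math.ceil(total),       # assumption: round up to whole minutes
            timestamp=int(now.timestamp()),
            action="set",                     # overwrite the total, so re-reporting is idempotent
        )
        reported += 1
    return reported


def run_sync_loop(load_orgs, minutes_since, report_usage, interval_s: float = 60.0) -> None:
    """Repeat the sync every `interval_s` seconds, using UTC calendar months."""
    while True:
        sync_voice_minutes_once(load_orgs(), minutes_since, report_usage,
                                datetime.now(timezone.utc))
        time.sleep(interval_s)
```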
To wire up voice metered billing for production:
1. Create a Stripe Product called "Xantly Voice Minutes" in your Stripe Dashboard.
2. Create a Metered Price with unit = "minute" and the currency / usage aggregation of your choice.
3. Set `STRIPE_VOICE_METERED_PRICE_ID=price_...` in the `xantly-api` environment.
4. When a customer subscribes to Pro or Scale, Xantly automatically adds the voice metered price as a second subscription item and persists the resulting `subscription_item_id` on `org_settings.stripe_voice_sub_item_id`.
5. The 60-second sync loop picks up from there.
When `STRIPE_VOICE_METERED_PRICE_ID` is unset, all voice metering code paths are no-ops and voice usage is tracked only in `gateway_requests` (still visible on internal invoices, just not pushed to Stripe).
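The subscription wiring and the no-op behavior can both hinge on one environment lookup. A sketch under assumptions: the function names and plan keys are hypothetical, the `[{"price": ...}]` item shape mirrors the `items` list accepted when creating a Stripe subscription, and nothing here actually calls Stripe.

```python
import os
from typing import Dict, List, Optional

VOICE_METERED_PLANS = {"pro", "scale"}  # plans that get the voice metered item


def voice_metered_price_id() -> Optional[str]:
    """The configured metered Price ID, or None when voice metering is disabled."""
    return os.environ.get("STRIPE_VOICE_METERED_PRICE_ID") or None


def subscription_items(plan: str, base_price_id: str) -> List[Dict[str, str]]:
    """Items for a new subscription: the plan price, plus voice minutes when enabled."""
    items = [{"price": base_price_id}]
    voice_price = voice_metered_price_id()
    if voice_price and plan in VOICE_METERED_PLANS:
        items.append({"price": voice_price})  # second item, billed from reported usage
    return items


def should_push_voice_usage() -> bool:
    """With the env var unset, the Stripe sync is skipped entirely."""
    return voice_metered_price_id() is not None
```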
Next steps
- Voice Agents — full voice API reference with curl examples
- Voice Models Catalog — all 30+ voice models and their pricing units
- Billing & Quotas — general chat billing reference