Voice Billing

title: Voice Billing
description: How voice requests are priced, quota-enforced, and surfaced on invoices.

Voice requests use the same API key, the same budget pool, and the same monthly invoice as chat. What differs is the pricing unit (minutes and characters, not tokens) and the addition of a per-minute platform fee.

This page documents how Xantly bills voice, enforces quotas, and surfaces costs back to you.

Cost breakdown

A single voice turn is billed as the sum of up to four components:

Component	Pricing model	Where the cost comes from
Speech-to-Text (STT)	Per minute of input audio	Provider passthrough (Whisper, Deepgram Nova, Groq Whisper, etc.)
LLM inference	Per token (same as chat)	The model slug dispatched inside the voice pipeline
Text-to-Speech (TTS)	Per 1M characters	Provider passthrough (OpenAI TTS, ElevenLabs, Deepgram Aura, etc.)
Platform fee	Per minute of audio	Xantly's orchestration, caching, and routing layer

The first three are provider passthrough: by default, Xantly bills exactly what the underlying provider charged. The platform fee is where Xantly earns revenue and, unless a per-org markup override is configured (see Plan quotas), it is the only tunable margin lever.

Chat models called inside the voice pipeline (e.g. groq/llama-3.3-70b selected by BaRP for low-latency fast-lane turns) are priced per token like any other chat completion.
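The four-component sum can be sketched as follows. This is a minimal illustration, not Xantly's billing code: the provider rates are hypothetical placeholders, the `VoiceTurn` type and `turn_cost` function are invented for this example, and the platform fee is assumed to apply to the same audio minutes that STT bills.

```python
from dataclasses import dataclass


@dataclass
class VoiceTurn:
    audio_minutes: float      # input audio duration (billed by STT)
    prompt_tokens: int        # LLM input tokens
    completion_tokens: int    # LLM output tokens
    tts_characters: int       # characters sent to TTS


# Hypothetical rates for illustration only; real rates depend on the provider.
STT_USD_PER_MIN = 0.006
LLM_USD_PER_1M_INPUT_TOKENS = 0.60
LLM_USD_PER_1M_OUTPUT_TOKENS = 0.80
TTS_USD_PER_1M_CHARS = 15.00
PLATFORM_FEE_USD_PER_MIN = 0.02  # Pro plan fee from the Plan quotas table


def turn_cost(turn: VoiceTurn) -> dict:
    """Return the per-component cost of one voice turn plus the total."""
    stt = turn.audio_minutes * STT_USD_PER_MIN
    llm = (
        turn.prompt_tokens * LLM_USD_PER_1M_INPUT_TOKENS
        + turn.completion_tokens * LLM_USD_PER_1M_OUTPUT_TOKENS
    ) / 1_000_000
    tts = turn.tts_characters / 1_000_000 * TTS_USD_PER_1M_CHARS
    # Assumption: the platform fee is charged on the input audio minutes.
    fee = turn.audio_minutes * PLATFORM_FEE_USD_PER_MIN
    return {
        "stt": stt,
        "inference": llm,
        "tts": tts,
        "platform_fee": fee,
        "total": stt + llm + tts + fee,
    }
```

For a 30-second turn with a short reply, the platform fee is typically the largest single component under these placeholder rates, which is why the plan-level fee matters more than small differences in provider pricing.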

Plan quotas

Plan	Monthly minutes	Concurrent sessions	Voice RPM	Platform fee/min
Free	3 min lifetime (one-time demo)	1	3	$0.00
Pro	500 / month	5	60	$0.02
Scale	5,000 / month	25	500	$0.015
Pay-As-You-Go	Unlimited (credit-bounded)	5	30	$0.025

Per-org overrides are available via org_settings:

voice_monthly_minutes_limit — override the monthly cap
voice_concurrent_session_limit — override the concurrency limit
voice_rpm_limit — override the requests-per-minute limit
voice_platform_fee_per_min — override the platform fee (for custom enterprise deals)
voice_markup_pct_override — uniform markup override on provider pass-through costs
voice_component_overrides — per-component (stt / tts / inference) markup override

Free tier minutes are lifetime, not monthly. Once used, they do not reset. Upgrade to Pro to get a monthly allocation.
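One way to picture how plan defaults and org_settings overrides combine is the sketch below. The override keys and plan values come from this page; the `PLAN_DEFAULTS` layout and the `effective_limits` helper are hypothetical, and the rule that a missing or null override falls back to the plan default is an assumption. The Free plan is left out because its allowance is lifetime rather than monthly.

```python
from typing import Any, Dict, Optional

# Plan defaults transcribed from the Plan quotas table.
# None means "no monthly cap" (Pay-As-You-Go is bounded by credit instead).
PLAN_DEFAULTS: Dict[str, Dict[str, Optional[float]]] = {
    "pro": {
        "voice_monthly_minutes_limit": 500,
        "voice_concurrent_session_limit": 5,
        "voice_rpm_limit": 60,
        "voice_platform_fee_per_min": 0.02,
    },
    "scale": {
        "voice_monthly_minutes_limit": 5000,
        "voice_concurrent_session_limit": 25,
        "voice_rpm_limit": 500,
        "voice_platform_fee_per_min": 0.015,
    },
    "payg": {
        "voice_monthly_minutes_limit": None,
        "voice_concurrent_session_limit": 5,
        "voice_rpm_limit": 30,
        "voice_platform_fee_per_min": 0.025,
    },
}


def effective_limits(plan: str, org_settings: Dict[str, Any]) -> Dict[str, Any]:
    """Merge plan defaults with per-org overrides.

    Assumption: an override that is absent or None falls back to the plan default.
    """
    limits = dict(PLAN_DEFAULTS[plan])
    for key in limits:
        override = org_settings.get(key)
        if override is not None:
            limits[key] = override
    return limits
```

For example, `effective_limits("pro", {"voice_rpm_limit": 120})` keeps the Pro defaults for minutes, concurrency, and platform fee while raising the RPM limit to 120.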

Quota enforcement order

For every voice request, Xantly runs these checks in order. The first check that fails returns an error, and the rejected request does not count toward any other quota:

Monthly budget cap — the same budget:usage:{org_id}:general:{YYYY-MM} pool as chat. Returns 402 Payment Required when exceeded.
Monthly voice minutes quota — plan_voice_minutes_monthly or the org override. Returns 429 Too Many Requests ("Voice audio minutes quota exceeded").
Free tier lifetime minutes — only applies to the Free plan. Returns 429 ("Free voice demo limit reached").
PAYG credit floor — PAYG accounts must have at least $0.05 credit balance to start a voice request. Returns 402 Payment Required.
Concurrent session limit — plan_voice_concurrent_sessions or the org override. Returns 429 ("Voice concurrent session limit reached").
Voice RPM (sliding window) — enforced by the rate limit middleware.
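The ordering above can be sketched as a short-circuiting chain. This is an illustration only: the `OrgState` fields and the check labels are hypothetical, the 3-minute Free limit and $0.05 PAYG floor come from this page, and the RPM check is omitted because it lives in the rate limit middleware and its status code is not stated here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class OrgState:
    plan: str                           # "free", "pro", "scale", or "payg"
    budget_exceeded: bool               # monthly budget pool already exhausted
    minutes_used: float                 # voice minutes used this month
    minutes_limit: Optional[float]      # None = no monthly cap
    free_lifetime_minutes_used: float
    payg_credit_usd: float
    active_sessions: int
    session_limit: int


FREE_LIFETIME_MINUTES = 3.0
PAYG_CREDIT_FLOOR_USD = 0.05


def first_failing_check(org: OrgState) -> Optional[Tuple[int, str]]:
    """Return (HTTP status, check label) for the first failing check, or None."""
    if org.budget_exceeded:
        return 402, "monthly_budget_cap"
    if org.minutes_limit is not None and org.minutes_used >= org.minutes_limit:
        return 429, "monthly_voice_minutes"
    if org.plan == "free" and org.free_lifetime_minutes_used >= FREE_LIFETIME_MINUTES:
        return 429, "free_lifetime_minutes"
    if org.plan == "payg" and org.payg_credit_usd < PAYG_CREDIT_FLOOR_USD:
        return 402, "payg_credit_floor"
    if org.active_sessions >= org.session_limit:
        return 429, "concurrent_sessions"
    return None  # The RPM check would run afterwards in the middleware.
```

Because the chain returns on the first failure, an org that is over both its budget and its minutes quota sees only the 402, never the 429.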

When a voice request fails mid-pipeline (e.g. STT succeeds, TTS fails), Xantly bills for the stages that completed via the partial cost accumulator. You are never charged for work the provider never did, but you are also never refunded for work that was successfully billed upstream.
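The partial-billing behaviour can be pictured with a small accumulator. How Xantly implements its accumulator internally is not documented on this page, so the `run_pipeline` function and its stage interface are purely illustrative.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Each stage is a (name, callable) pair; the callable returns its cost in USD
# on success and raises an exception on failure.
Stage = Tuple[str, Callable[[], float]]


def run_pipeline(stages: List[Stage]) -> Tuple[Dict[str, float], Optional[str]]:
    """Run stages in order, recording the cost of each stage that completes.

    Returns the costs billed so far and an error message if a stage failed.
    """
    billed: Dict[str, float] = {}
    for name, stage in stages:
        try:
            billed[name] = stage()
        except Exception as exc:
            # Stop at the first failure; stages that already ran stay billed.
            return billed, f"{name} failed: {exc}"
    return billed, None
```

If STT and inference succeed and TTS raises, the result holds only the `stt` and `inference` costs, which matches the rule that completed stages are billed and unstarted work is not.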

Cost visibility headers

Every successful voice response includes detailed cost and routing metadata headers:

Header	Example value	Meaning
`X-Xantly-Cost-USD`	`0.00324`	Total customer charge for this request
`X-Xantly-STT-Provider`	`deepgram`	Which STT provider actually ran
`X-Xantly-STT-Model`	`deepgram/nova-2`	Which STT model was dispatched
`X-Xantly-TTS-Provider`	`elevenlabs`	Which TTS provider actually ran
`X-Xantly-TTS-Model`	`elevenlabs/eleven_flash_v2_5`	Which TTS model was dispatched
`X-Xantly-STT-Latency-Ms`	`82.4`	STT stage latency
`X-Xantly-Inference-Latency-Ms`	`147.2`	LLM inference latency
`X-Xantly-TTS-Latency-Ms`	`54.1`	TTS stage latency
`X-Xantly-Model-Used`	`groq/llama-3.3-70b`	The chat model that served inference inside the pipeline
`X-Xantly-Lane-Used`	`FastLane`	Whether BaRP routed through the fast lane or delegation lane
`X-Xantly-Cache-Hit`	`true`	`true` when the voice semantic cache served the response (inference cost = $0)
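A client can fold these headers into a compact per-request summary. The sketch below works on a plain mapping of header names to string values, so it is independent of any HTTP library; the `summarize_voice_headers` helper and its output keys are hypothetical, and summing the three stage latencies is only an approximation of end-to-end time.

```python
from typing import Any, Dict, Mapping


def summarize_voice_headers(headers: Mapping[str, str]) -> Dict[str, Any]:
    """Extract cost, routing, and latency details from X-Xantly-* headers."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {name.lower(): value for name, value in headers.items()}
    stage_latency_keys = (
        "x-xantly-stt-latency-ms",
        "x-xantly-inference-latency-ms",
        "x-xantly-tts-latency-ms",
    )
    return {
        "cost_usd": float(h.get("x-xantly-cost-usd", "0")),
        "stt_model": h.get("x-xantly-stt-model"),
        "tts_model": h.get("x-xantly-tts-model"),
        "chat_model": h.get("x-xantly-model-used"),
        "lane": h.get("x-xantly-lane-used"),
        "cache_hit": h.get("x-xantly-cache-hit", "false").lower() == "true",
        # Sum of the three stage latencies; missing headers count as 0.
        "stage_latency_ms": sum(float(h.get(k, "0")) for k in stage_latency_keys),
    }
```

Logging this summary per request makes it straightforward to reconcile client-side totals against the monthly invoice.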

Anomaly thresholds

Xantly runs automatic cost-anomaly detection on every voice request. If a request exceeds either of the thresholds below, a warning is logged to Mission Control for operator review. The thresholds are configurable at deploy time:

Environment variable	Default	Meaning
`VOICE_ANOMALY_COST_PER_MIN_THRESHOLD`	`0.66`	Maximum provider cost per minute of audio before firing an alert (3x the expected maximum for a premium stack)
`VOICE_ANOMALY_SINGLE_REQUEST_THRESHOLD`	`5.0`	Maximum provider cost for a single voice request before firing an alert

A third sanity check — "STT completed in <10ms for >5 seconds of audio" — is always enabled and not tunable. It catches broken duration tracking in STT provider responses.
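The three checks can be sketched as one function. The environment variable names and defaults come from the table above; the `anomaly_warnings` function, its parameters, and the warning labels are hypothetical, and "exceeds" is read as strictly greater than the threshold.

```python
import os
from typing import List, Mapping


def anomaly_warnings(
    provider_cost_usd: float,
    audio_seconds: float,
    stt_latency_ms: float,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the labels of every anomaly check this request trips."""
    per_min_threshold = float(env.get("VOICE_ANOMALY_COST_PER_MIN_THRESHOLD", "0.66"))
    single_threshold = float(env.get("VOICE_ANOMALY_SINGLE_REQUEST_THRESHOLD", "5.0"))

    warnings: List[str] = []
    minutes = audio_seconds / 60
    # Guard against division by zero for empty audio.
    if minutes > 0 and provider_cost_usd / minutes > per_min_threshold:
        warnings.append("cost_per_minute")
    if provider_cost_usd > single_threshold:
        warnings.append("single_request_cost")
    # Fixed sanity check: STT finishing in <10 ms for >5 s of audio.
    if stt_latency_ms < 10 and audio_seconds > 5:
        warnings.append("stt_duration_sanity")
    return warnings
```

Passing `env={}` forces the documented defaults, which is convenient when testing the thresholds in isolation.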

Stripe integration

Voice minutes are reported to Stripe Metered Billing on a 60-second interval loop from the xantly-api process. Each org with stripe_voice_sub_item_id set on org_settings gets its total voice minutes for the current calendar month reported with action=set (idempotent).

To wire up voice metered billing for production:

Create a Stripe Product called "Xantly Voice Minutes" in your Stripe Dashboard.
Create a Metered Price with unit = "minute" and the currency / usage aggregation of your choice.
Set STRIPE_VOICE_METERED_PRICE_ID=price_... in the xantly-api environment.
When a customer subscribes to Pro or Scale, Xantly automatically adds the voice metered price as a second subscription item and persists the resulting subscription_item_id on org_settings.stripe_voice_sub_item_id.
The 60-second sync loop picks up from there.

When STRIPE_VOICE_METERED_PRICE_ID is unset, all voice metering code paths are no-ops and voice usage is tracked only in gateway_requests (still visible on internal invoices, just not pushed to Stripe).
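The shape of one sync pass can be sketched without calling Stripe at all. The `build_usage_reports` function, the `voice_minutes_this_month` field, and the payload layout are hypothetical, and rounding minutes up to a whole number is an assumption. Only `stripe_voice_sub_item_id`, `STRIPE_VOICE_METERED_PRICE_ID`, and `action=set` come from this page.

```python
import math
from typing import Any, Dict, List, Optional


def build_usage_reports(
    orgs: List[Dict[str, Any]], metered_price_id: Optional[str]
) -> List[Dict[str, Any]]:
    """Build one usage payload per org that has a voice subscription item.

    Returns an empty list when the metered price ID is unset, mirroring the
    documented no-op behaviour.
    """
    if not metered_price_id:
        return []
    reports = []
    for org in orgs:
        sub_item_id = org.get("stripe_voice_sub_item_id")
        if not sub_item_id:
            continue  # Org has no voice metered subscription item.
        reports.append({
            "subscription_item": sub_item_id,
            # Assumption: report the month-to-date total, rounded up to whole minutes.
            "quantity": math.ceil(org["voice_minutes_this_month"]),
            # "set" overwrites the running total, so re-sending is harmless.
            "action": "set",
        })
    return reports
```

Because each payload carries the month-to-date total rather than a delta, a pass that is retried or runs twice produces the same reported quantity, which is what makes the 60-second loop safe to repeat.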

Next steps

Voice Agents — full voice API reference with curl examples
Voice Models Catalog — all 30+ voice models and their pricing units
Billing & Quotas — general chat billing reference
