Quickstart
Go from API key to working request in under 60 seconds.
Prerequisites
- An active Xantly API key (sk-...)
- curl, Python 3.8+, or Node.js 18+
Step 1 — Set your API key
Store your API key in an environment variable so it stays out of source code.
```shell
export XANTLY_API_KEY="sk-your-key-here"
```

Step 2 — Send your first request
A minimal chat completion. Use model: "auto" to let the gateway pick the best model for each request.
Python:

```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XANTLY_API_KEY"],
    base_url="https://api.xantly.com/v1",
)

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "What is a vector database? One sentence."}],
)
print(response.choices[0].message.content)
```

Node.js:

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.XANTLY_API_KEY,
  baseURL: "https://api.xantly.com/v1",
});

const response = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "What is a vector database? One sentence." }],
});
console.log(response.choices[0].message.content);
```

curl:

```shell
curl -sS https://api.xantly.com/v1/chat/completions \
  -H "Authorization: Bearer $XANTLY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "user", "content": "What is a vector database? One sentence."}
    ]
  }'
```

Step 3 — Read the response
The response is 100% OpenAI-compatible. Any code that parses OpenAI responses works unchanged.
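As a quick sketch of that compatibility, here is how the fields most consumers touch (the assistant message and token usage) come out of such a response body, using only the standard library on a trimmed example:

```python
import json

# A trimmed response body in the standard chat.completion shape.
raw = """
{
  "id": "chatcmpl-abc123",
  "model": "deepseek-chat",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "A vector database stores embeddings."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 18, "completion_tokens": 22, "total_tokens": 40}
}
"""

data = json.loads(raw)
answer = data["choices"][0]["message"]["content"]
used = data["usage"]["total_tokens"]  # total = prompt_tokens + completion_tokens
print(answer, used)
```

The SDKs expose the same fields as object attributes (response.choices[0].message.content, response.usage.total_tokens), so either access style carries over unchanged.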
```json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "deepseek-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A vector database stores high-dimensional embeddings and retrieves them via similarity search."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 22,
    "total_tokens": 40
  }
}
```

Step 4 — Inspect routing metadata
Xantly adds x-xantly-* headers for routing transparency. Use them to inspect how your request was handled.
```shell
curl -i -sS https://api.xantly.com/v1/chat/completions \
  -H "Authorization: Bearer $XANTLY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","messages":[{"role":"user","content":"Hello"}]}' \
  2>&1 | grep -i "x-xantly-"
```

Example output:

```
x-xantly-tier-used: T2
x-xantly-lane-used: smart
x-xantly-provider: deepseek
x-xantly-cache-hit: false
x-xantly-cost-usd: 0.00032
```

x-xantly-cost-usd is present when usage/cost can be computed.
Enable streaming
Add stream: true to receive Server-Sent Events (chat.completion.chunk).
Current behavior note:
- Non-voice streaming is SSE-compatible but may return a compact sequence (role chunk + content chunk + [DONE]) rather than token-by-token chunks. stream_options.include_usage is forwarded where supported, but a terminal usage chunk is not guaranteed.
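Given that, a defensive consumer should concatenate whatever deltas arrive and tolerate chunks with no choices at all (a terminal usage chunk, when present, carries an empty choices array). A sketch with stubbed chunks, using plain dicts in place of SDK objects:

```python
def collect_text(chunks) -> str:
    """Concatenate delta content from a stream of chat.completion.chunk payloads."""
    parts = []
    for chunk in chunks:
        # A usage-only terminal chunk may have no choices at all.
        if not chunk.get("choices"):
            continue
        delta = chunk["choices"][0].get("delta", {})
        if delta.get("content"):
            parts.append(delta["content"])
    return "".join(parts)

# Stubbed compact sequence: role chunk, one content chunk, then a usage-only chunk.
chunks = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "Ports in fog..."}}]},
    {"choices": [], "usage": {"total_tokens": 12}},
]
print(collect_text(chunks))  # Ports in fog...
```

The same accumulation logic works whether the gateway returns one content chunk or many.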
Python:

```python
stream = client.chat.completions.create(
    model="auto",
    stream=True,
    stream_options={"include_usage": True},
    messages=[{"role": "user", "content": "Write a haiku about APIs."}],
)
for chunk in stream:
    # The terminal usage chunk can arrive with an empty choices list.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
```

Node.js:

```javascript
const stream = await client.chat.completions.create({
  model: "auto",
  stream: true,
  stream_options: { include_usage: true },
  messages: [{ role: "user", content: "Write a haiku about APIs." }],
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
```

curl:

```shell
curl -N https://api.xantly.com/v1/chat/completions \
  -H "Authorization: Bearer $XANTLY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "stream": true,
    "messages": [{"role": "user", "content": "Write a haiku about APIs."}]
  }'
```

Migrating from OpenAI
If you already use the OpenAI SDK, migration is a two-line change:

```diff
 client = OpenAI(
-    api_key=os.environ["OPENAI_API_KEY"],
+    api_key=os.environ["XANTLY_API_KEY"],
+    base_url="https://api.xantly.com/v1",
 )
```

All standard parameters (model, messages, temperature, max_tokens, tools, response_format) work identically.
Voice quickstart
Voice endpoints use the same API key and show up on the same invoice as chat. There is no separate subscription — if you have an API key, you can call voice.
Transcribe audio (STT)
```shell
curl -X POST https://api.xantly.com/v1/voice/transcribe \
  -H "Authorization: Bearer $XANTLY_API_KEY" \
  -F "audio=@input.wav" \
  -F "language=en" \
  -F "stt_model=groq/whisper-large-v3-turbo"
```

Synthesize speech (TTS)
```shell
curl -X POST https://api.xantly.com/v1/voice/synthesize \
  -H "Authorization: Bearer $XANTLY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello from Xantly voice.",
    "voice": "alloy",
    "model": "openai/gpt-4o-mini-tts",
    "output_format": "pcm_16000"
  }' \
  --output response.pcm
```

Full voice pipeline (audio → audio)
```shell
curl -X POST https://api.xantly.com/v1/voice/chat \
  -H "Authorization: Bearer $XANTLY_API_KEY" \
  -F "audio=@input.wav" \
  -F "stt_model=groq/whisper-large-v3-turbo" \
  -F "tts_model=elevenlabs/eleven_flash_v2_5" \
  --output reply.pcm
```

Or the header shortcut (OpenAI SDK compatible)
```python
import os
import openai

client = openai.OpenAI(api_key=os.environ["XANTLY_API_KEY"], base_url="https://api.xantly.com/v1")
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": transcript_text}],  # transcript_text: output of a prior STT call
    extra_headers={"x-xantly-voice": "true"},
)
```

Free tier includes a one-time 3-minute voice demo. See the Voice Models Catalog for all 30+ models and the Voice Billing reference for pricing details.
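The pcm_16000 outputs above (response.pcm, reply.pcm) are raw PCM rather than a container format, so playback tools need the parameters up front. Assuming the conventional reading of the format name — 16 kHz, 16-bit, mono; confirm against the Voice Models Catalog — the clip duration falls straight out of the byte count:

```python
def pcm_duration_seconds(num_bytes: int, sample_rate: int = 16_000,
                         bytes_per_sample: int = 2, channels: int = 1) -> float:
    """Duration of raw PCM audio, assuming 16-bit mono at the given sample rate."""
    return num_bytes / (sample_rate * bytes_per_sample * channels)

# A 64,000-byte response.pcm at 16 kHz / 16-bit / mono is 2 seconds of audio.
print(pcm_duration_seconds(64_000))  # 2.0
```

To play the file, pass the same parameters explicitly, e.g. ffplay -f s16le -ar 16000 -ch_layout mono response.pcm.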
What's next?
- Chat Completions Reference — Full endpoint reference with all parameters
- Authentication — Secure your API key integration
- Cost-Optimized Routing — Tune cost, latency, and quality tradeoffs
- Multi-Agent Orchestration — Build agentic workflows