Xantly
API Reference

Audio

Transcribe audio to text, translate audio to English, and generate speech from text. All endpoints proxy to OpenAI's audio APIs with automatic BYOK key resolution.

  • POST /v1/audio/transcriptions — Speech-to-text (Whisper)
  • POST /v1/audio/translations — Translate audio to English (Whisper)
  • POST /v1/audio/speech — Text-to-speech
  • Auth: Authorization: Bearer <token>
  • Drop-in compatible with the OpenAI Audio API.

Transcriptions (Speech-to-Text)

Convert audio files to text using OpenAI's Whisper model.

Quick start

curl -sS https://api.xantly.com/v1/audio/transcriptions \
  -H "Authorization: Bearer $XANTLY_API_KEY" \
  -F file="@recording.mp3" \
  -F model="whisper-1"

Request body (multipart/form-data)

| Field | Type | Required | Description |
|---|---|---|---|
| file | binary | Yes | Audio file (mp3, mp4, mpeg, mpga, m4a, wav, webm). Max 25 MB. |
| model | string | Yes | Currently "whisper-1". |
| language | string | No | ISO 639-1 language code (e.g. "en", "es"). Improves accuracy. |
| prompt | string | No | Guide the model's style or continue a previous segment. |
| response_format | string | No | "json" (default), "text", "srt", "verbose_json", "vtt". |
| temperature | number | No | Sampling temperature (0.0 to 1.0). |
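
Checking a file locally before uploading can avoid a round trip that ends in a 400. The following is a minimal sketch based only on the format list and size limit in the table above. The helper name `check_audio_file` is hypothetical, and reading "25 MB" as 25 * 1024 * 1024 bytes is an assumption; the service may count the limit differently.

```python
import os

# Extensions taken from the request-body table above.
ALLOWED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
# Assumption: "25 MB" is read as 25 * 1024 * 1024 bytes.
MAX_BYTES = 25 * 1024 * 1024


def check_audio_file(path: str) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    # Compare the extension case-insensitively and without the leading dot.
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if not os.path.isfile(path):
        problems.append("file not found")
    elif os.path.getsize(path) > MAX_BYTES:
        problems.append("file exceeds 25 MB")
    return problems
```

This only inspects the file name and size. It does not verify that the contents are valid audio, so the server may still reject a file that passes.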

Response body

{
  "text": "Hello, this is a transcription of the audio file."
}
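
The response shape depends on `response_format`, so a client that lets callers pick the format needs to branch on it. The sketch below assumes OpenAI-compatible behavior that this page does not spell out: `verbose_json` also carries a top-level `text` field, and `text`, `srt`, and `vtt` come back as plain UTF-8 bodies. The helper name `extract_text` is hypothetical.

```python
import json

# Assumption: both JSON formats expose the transcript under a top-level "text" key.
JSON_FORMATS = {"json", "verbose_json"}
# Assumption: these formats are returned as plain UTF-8 text, not JSON.
TEXT_FORMATS = {"text", "srt", "vtt"}


def extract_text(body: bytes, response_format: str = "json") -> str:
    """Return the transcript (or subtitle text) from a raw response body."""
    if response_format in JSON_FORMATS:
        return json.loads(body)["text"]
    if response_format in TEXT_FORMATS:
        return body.decode("utf-8")
    raise ValueError(f"unknown response_format: {response_format}")
```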

Translations (Audio to English)

Translate audio from any supported language into English text using Whisper.

Quick start

curl -sS https://api.xantly.com/v1/audio/translations \
  -H "Authorization: Bearer $XANTLY_API_KEY" \
  -F file="@german_audio.mp3" \
  -F model="whisper-1"

Request body (multipart/form-data)

| Field | Type | Required | Description |
|---|---|---|---|
| file | binary | Yes | Audio file (mp3, mp4, mpeg, mpga, m4a, wav, webm). Max 25 MB. |
| model | string | Yes | Currently "whisper-1". |
| prompt | string | No | Guide the model's style or continue a previous segment. |
| response_format | string | No | "json" (default), "text", "srt", "verbose_json", "vtt". |
| temperature | number | No | Sampling temperature (0.0 to 1.0). |
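
For environments without an SDK, the multipart body can be assembled with the standard library alone. This is a sketch under assumptions, not the platform's reference client: the helper names `encode_multipart` and `build_translation_request` are hypothetical, and the file part is labeled `application/octet-stream` rather than a format-specific MIME type. Note that, per the table above, translations take no `language` field.

```python
import urllib.request
import uuid


def encode_multipart(fields: dict[str, str], filename: str, file_bytes: bytes) -> tuple[bytes, str]:
    """Encode text fields plus one part named "file"; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    # The file part carries a filename and raw bytes.
    parts.append(
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
            f'filename="{filename}"\r\nContent-Type: application/octet-stream\r\n\r\n'
        ).encode()
        + file_bytes
        + b"\r\n"
    )
    # The closing delimiter ends the body.
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_translation_request(api_key: str, filename: str, file_bytes: bytes, prompt: str = "") -> urllib.request.Request:
    """Prepare (but do not send) a POST to the translations endpoint."""
    fields = {"model": "whisper-1"}
    if prompt:
        fields["prompt"] = prompt
    body, content_type = encode_multipart(fields, filename, file_bytes)
    return urllib.request.Request(
        "https://api.xantly.com/v1/audio/translations",
        data=body,
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
        method="POST",
    )
```

The prepared request can then be sent with `urllib.request.urlopen(request)`. Building and sending are kept separate so the encoding step can be inspected without network access.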

Response body

{
  "text": "Hello, this is the translated text in English."
}

Note: Unlike transcriptions, translations always output English regardless of the source language. For same-language transcription, use /v1/audio/transcriptions instead.


Speech (Text-to-Speech)

Generate natural-sounding audio from text.

Quick start

curl -sS https://api.xantly.com/v1/audio/speech \
  -H "Authorization: Bearer $XANTLY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello world! This is a test of the text to speech API.",
    "voice": "alloy"
  }' \
  --output speech.mp3

Request body

| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | "tts-1" or "tts-1-hd". |
| input | string | Yes | Text to convert to speech. Max 4096 characters. |
| voice | string | Yes | Voice to use: "alloy", "echo", "fable", "onyx", "nova", "shimmer". |
| response_format | string | No | "mp3" (default), "opus", "aac", "flac", "wav", "pcm". |
| speed | number | No | Speed multiplier (0.25 to 4.0). Default 1.0. |
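
The constraints in the table above can be enforced client-side before a request is sent. The following is an illustrative sketch: the function name `build_speech_payload` is hypothetical, and the checks mirror only what the table states. The server remains the authority on what it accepts.

```python
# Allowed values copied from the request-body table above.
MODELS = {"tts-1", "tts-1-hd"}
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096


def build_speech_payload(
    text: str,
    voice: str = "alloy",
    model: str = "tts-1",
    response_format: str = "mp3",
    speed: float = 1.0,
) -> dict:
    """Validate the arguments and return a JSON-serializable request body."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    if response_format not in FORMATS:
        raise ValueError(f"unknown response_format: {response_format}")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input is {len(text)} characters; the limit is {MAX_INPUT_CHARS}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
```

The returned dictionary can be passed to `json.dumps` and sent as the body of the POST shown in the quick start.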

Response body

Returns raw audio bytes with the appropriate Content-Type header (e.g. audio/mpeg for mp3).


Code examples

Transcription

Python

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XANTLY_API_KEY"],
    base_url="https://api.xantly.com/v1",
)

with open("recording.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )
print(transcript.text)

Node.js

import OpenAI from "openai";
import fs from "fs";

const client = new OpenAI({
  apiKey: process.env.XANTLY_API_KEY,
  baseURL: "https://api.xantly.com/v1",
});

const transcript = await client.audio.transcriptions.create({
  model: "whisper-1",
  file: fs.createReadStream("recording.mp3"),
});
console.log(transcript.text);

Speech generation

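Python

# Reuses the client configured in the transcription example.
response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Welcome to the Xantly platform!",
)

# The response body is raw audio bytes; write them straight to disk.
with open("output.mp3", "wb") as f:
    f.write(response.content)

Node.js

// Reuses the client and fs imports from the transcription example.
const response = await client.audio.speech.create({
  model: "tts-1",
  voice: "alloy",
  input: "Welcome to the Xantly platform!",
});

// Convert the binary response to a Buffer before writing it to disk.
const buffer = Buffer.from(await response.arrayBuffer());
fs.writeFileSync("output.mp3", buffer);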

BYOK support

Audio endpoints automatically resolve your organization's BYOK OpenAI key. If no BYOK key is configured, the platform key is used. Both endpoints use a 120-second timeout for large audio files.
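
Because key resolution happens server-side, the client sends only its Xantly token. What a client may want to tune is its own timeout, so that it does not give up before the server does. The sketch below is an assumption-laden illustration: the 130-second value is an arbitrary margin above the documented 120 seconds, and `send_audio_request` and its `opener` parameter are hypothetical names. The opener is injectable so the helper can be exercised without network access.

```python
import urllib.request

# Assumption: a small margin above the documented 120-second server-side timeout.
CLIENT_TIMEOUT_SECONDS = 130


def send_audio_request(request, timeout=CLIENT_TIMEOUT_SECONDS, opener=urllib.request.urlopen) -> bytes:
    """Send a prepared request and return the raw response body."""
    # urlopen applies the timeout to blocking socket operations such as connect and read.
    with opener(request, timeout=timeout) as response:
        return response.read()
```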


Errors

| HTTP | error.type | Typical trigger |
|---|---|---|
| 400 | invalid_request_error | Invalid multipart data, unsupported audio format. |
| 401 | authentication_error | Missing or invalid Bearer token. |
| 500 | provider_error | No OpenAI API key configured, or upstream error. |
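
A client can turn these responses into readable messages. The sketch below assumes an OpenAI-style error envelope of the form `{"error": {"type": ..., "message": ...}}`. The `error.type` column suggests this shape, but the `message` field and the helper name `describe_error` are assumptions. When the body cannot be parsed, it falls back to the status-to-type mapping from the table.

```python
import json

# Status-to-type mapping copied from the errors table above.
KNOWN_ERRORS = {
    400: "invalid_request_error",
    401: "authentication_error",
    500: "provider_error",
}


def describe_error(status: int, body: bytes) -> str:
    """Build a one-line description from an HTTP status and a raw error body."""
    try:
        # Assumption: errors arrive as {"error": {"type": ..., "message": ...}}.
        error = json.loads(body).get("error", {})
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        error = {}
    if not isinstance(error, dict):
        error = {}
    error_type = error.get("type") or KNOWN_ERRORS.get(status, "unknown_error")
    message = error.get("message") or "no message provided"
    return f"HTTP {status} ({error_type}): {message}"
```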
