Xantly
API Reference

Responses API

The modern OpenAI Responses API endpoint. Clients that default to /v1/responses (including newer OpenAI SDK versions) work out of the box through the Xantly gateway.

  • POST /v1/responses
  • Auth: Authorization: Bearer <token>
  • Drop-in compatible with the OpenAI Responses API.

How it works: Xantly translates Responses API requests into chat completions internally, routes through the full gateway pipeline (smart routing, caching, verification), and converts the response back to the Responses format. You get all Xantly features with zero code changes.


Quick start

curl -sS https://api.xantly.com/v1/responses \
  -H "Authorization: Bearer $XANTLY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "input": "Explain quantum computing in one sentence."
  }'

Request body

  • model (string, required) – Model slug or "auto" for intelligent routing.
  • input (string | array | object, required) – Input content: a string for simple text, an array of conversation items for multi-turn exchanges, or an object for a single input item.
  • instructions (string, optional) – System instructions (mapped to a system message).
  • stream (boolean, optional) – Enable SSE streaming. Default: false.
  • temperature (number, optional) – Sampling temperature.
  • top_p (number, optional) – Nucleus sampling parameter.
  • max_output_tokens (integer, optional) – Maximum number of tokens to generate.
  • tools (array, optional) – Function tool definitions. Only type: "function" is currently supported.
  • tool_choice (string | object, optional) – Tool selection strategy.
  • parallel_tool_calls (boolean, optional) – Allow parallel function calls.
  • text (object, optional) – Text output configuration, including format for structured output.
  • text.format (object, optional) – Supports {"type": "json_object"} and {"type": "json_schema", ...}.
  • user (string, optional) – End-user identifier.
  • metadata (object, optional) – Free-form metadata map.
  • service_tier (string, optional) – Service tier hint.
  • reasoning (object, optional) – Reasoning configuration: {"effort": "low" | "medium" | "high"}.
  • store (boolean, optional) – Accepted for compatibility.
  • previous_response_id (string, optional) – Accepted for compatibility.
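
As a quick sketch of how these fields combine, the payload below sets instructions, sampling parameters, and a token cap. All field values are illustrative, not recommendations:

```python
import json

# Illustrative request body combining several of the optional fields above.
payload = {
    "model": "auto",
    "input": "Summarize the plot of Hamlet.",
    "instructions": "You are a concise literary assistant.",
    "temperature": 0.3,
    "top_p": 0.9,
    "max_output_tokens": 200,
    "metadata": {"request_source": "docs-example"},
}

# The wire format is plain JSON, so the dict serializes directly.
print(json.dumps(payload, indent=2))
```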

Input items

When input is an array, each item can be:

  • {"role": "system", "content": "..."} – System message.
  • {"role": "user", "content": "..."} – User message (content may be a string or an array of content parts).
  • {"role": "assistant", "content": [...]} – Assistant message.
  • {"type": "function_call", ...} – Prior function call for multi-turn tool use.
  • {"type": "function_call_output", ...} – Tool output for multi-turn tool use.
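
For example, a follow-up question after a completed tool exchange can be expressed as an input array like the sketch below. The call_id, tool name, and contents are illustrative:

```python
# Illustrative multi-turn `input` array mixing messages and a prior tool exchange.
input_items = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What's the weather in Tokyo?"},
    {
        "type": "function_call",
        "call_id": "call_123",
        "name": "get_weather",
        "arguments": '{"city": "Tokyo"}',
    },
    {
        "type": "function_call_output",
        "call_id": "call_123",  # must match the function_call it answers
        "output": '{"temp_c": 21, "conditions": "clear"}',
    },
    {"role": "user", "content": "And in Osaka?"},
]
```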

Content parts

User content arrays support:

  • {"type": "input_text", "text": "..."} — text input
  • {"type": "input_image", "image_url": "..."} — image input (URL)
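
Combined, a multimodal user message might look like the following sketch (the image URL is a placeholder):

```python
# Illustrative user message mixing text and image content parts.
user_message = {
    "role": "user",
    "content": [
        {"type": "input_text", "text": "What is shown in this photo?"},
        {"type": "input_image", "image_url": "https://example.com/photo.jpg"},
    ],
}
```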

Response body (non-streaming)

{
  "id": "chatcmpl-abc123",
  "created_at": 1741400000,
  "model": "deepseek-chat",
  "object": "response",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "id": "msg_abc123",
      "content": [
        {
          "type": "output_text",
          "text": "Quantum computing uses quantum bits...",
          "annotations": []
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 15,
    "output_tokens": 42,
    "input_tokens_details": { "cached_tokens": 0 },
    "output_tokens_details": { "reasoning_tokens": 0 }
  }
}

Output items

  • message – Text response with content[].type = "output_text".
  • function_call – Function call with name, arguments, and call_id.
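
Because output can mix both item types, clients typically walk the array and branch on type. A minimal sketch of that pattern, run against a hand-written output array shaped like the non-streaming response above:

```python
def split_output(output):
    """Separate message text and function calls from a response's output array."""
    texts, calls = [], []
    for item in output:
        if item["type"] == "message":
            texts.extend(
                part["text"]
                for part in item["content"]
                if part["type"] == "output_text"
            )
        elif item["type"] == "function_call":
            calls.append(item)
    return texts, calls

# Hand-written output array mirroring the example response body.
output = [
    {
        "type": "message",
        "role": "assistant",
        "id": "msg_abc123",
        "content": [
            {
                "type": "output_text",
                "text": "Quantum computing uses quantum bits...",
                "annotations": [],
            }
        ],
    }
]

texts, calls = split_output(output)
print(texts[0])  # Quantum computing uses quantum bits...
```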

Streaming

When stream: true, the endpoint returns Server-Sent Events in the Responses API format:

  • response.created – Response object created.
  • response.output_item.added – New output item started.
  • response.content_part.added – Content part started.
  • response.output_text.delta – Text chunk.
  • response.output_text.done – Text complete.
  • response.function_call_arguments.delta – Function call arguments chunk.
  • response.output_item.done – Output item complete.
  • response.completed – Full response with usage.
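
A client reconstructs the final text by concatenating the delta payloads. The sketch below assumes each response.output_text.delta event carries a delta field, and uses hand-written event dicts in place of a real SSE stream:

```python
def accumulate_text(events):
    """Fold a stream of Responses API events into the final output text."""
    text = []
    for event in events:
        if event["type"] == "response.output_text.delta":
            text.append(event["delta"])
        elif event["type"] == "response.completed":
            break
    return "".join(text)

# Hand-written stand-ins for events a real stream would deliver.
fake_stream = [
    {"type": "response.created"},
    {"type": "response.output_item.added"},
    {"type": "response.content_part.added"},
    {"type": "response.output_text.delta", "delta": "Hello, "},
    {"type": "response.output_text.delta", "delta": "world."},
    {"type": "response.output_text.done"},
    {"type": "response.output_item.done"},
    {"type": "response.completed"},
]

print(accumulate_text(fake_stream))  # Hello, world.
```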

Code examples

Python:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XANTLY_API_KEY"],
    base_url="https://api.xantly.com/v1",
)

response = client.responses.create(
    model="auto",
    input="What is the speed of light?",
)
print(response.output[0].content[0].text)

TypeScript:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.XANTLY_API_KEY,
  baseURL: "https://api.xantly.com/v1",
});

const response = await client.responses.create({
  model: "auto",
  input: "What is the speed of light?",
});
console.log(response.output[0].content[0].text);

With tools

response = client.responses.create(
    model="auto",
    input="What's the weather in Tokyo?",
    tools=[{
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"}
            },
            "required": ["city"]
        }
    }],
)
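
When the model replies with a function_call output item instead of text, the client executes the function and sends the result back as a function_call_output input item. Below is a minimal local sketch of that round trip, with a hypothetical get_weather handler and an illustrative call_id:

```python
import json

def run_tool(item, handlers):
    """Execute a function_call output item with a local handler and
    build the function_call_output item to send back."""
    args = json.loads(item["arguments"])
    result = handlers[item["name"]](**args)
    return {
        "type": "function_call_output",
        "call_id": item["call_id"],  # echo the id so the call can be matched
        "output": json.dumps(result),
    }

# Hypothetical local implementation of the get_weather tool.
def get_weather(city):
    return {"city": city, "temp_c": 21}

# A function_call item shaped like the one the gateway would return.
call_item = {
    "type": "function_call",
    "call_id": "call_123",
    "name": "get_weather",
    "arguments": '{"city": "Tokyo"}',
}

follow_up = run_tool(call_item, {"get_weather": get_weather})
print(follow_up["output"])
```

In a real exchange, you would append both the function_call item and this function_call_output item to input and call the endpoint again.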

Errors

  • 400 invalid_request_error – Unsupported tool type, empty input, or invalid content part.
  • 401 authentication_error – Missing or invalid Bearer token.
  • 429 rate_limit_error – Rate limit exceeded.
  • 500 internal_error – Provider error or internal failure.
