Responses API
The modern OpenAI Responses API endpoint. Clients that default to /v1/responses (including newer OpenAI SDK versions) work out of the box through the Xantly gateway.
- POST `/v1/responses`
- Auth: `Authorization: Bearer <token>`
- Drop-in compatible with the OpenAI Responses API.
How it works: Xantly translates Responses API requests into chat completions internally, routes through the full gateway pipeline (smart routing, caching, verification), and converts the response back to the Responses format. You get all Xantly features with zero code changes.
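The translation can be sketched roughly as follows. The field mapping (`input` to `messages`, `instructions` to a system message, `max_output_tokens` to `max_tokens`) follows the request fields documented on this page; the gateway's actual internals are not documented here, so treat this as an illustration of the shape of the mapping, not the real implementation.

```python
# Illustrative sketch of the Responses -> Chat Completions translation
# described above. Only the documented field mappings are shown.

def to_chat_completions(req: dict) -> dict:
    messages = []
    if "instructions" in req:
        # instructions is mapped to a system message
        messages.append({"role": "system", "content": req["instructions"]})
    inp = req["input"]
    if isinstance(inp, str):
        # simple string input becomes a single user message
        messages.append({"role": "user", "content": inp})
    elif isinstance(inp, list):
        for item in inp:
            if "role" in item:
                messages.append({"role": item["role"], "content": item["content"]})
    out = {"model": req["model"], "messages": messages}
    if "max_output_tokens" in req:
        out["max_tokens"] = req["max_output_tokens"]  # renamed on the chat side
    return out

payload = to_chat_completions({
    "model": "auto",
    "instructions": "Be concise.",
    "input": "Explain quantum computing in one sentence.",
    "max_output_tokens": 100,
})
```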
Quick start
```bash
curl -sS https://api.xantly.com/v1/responses \
  -H "Authorization: Bearer $XANTLY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "input": "Explain quantum computing in one sentence."
  }'
```

Request body
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model slug or `"auto"` for intelligent routing. |
| `input` | string \| array \| object | Yes | Input content: a string for simple text, an array of conversation items for multi-turn input, or an object for a single input item. |
| `instructions` | string | No | System instructions (mapped to a system message). |
| `stream` | boolean | No | Enable SSE streaming. Default `false`. |
| `temperature` | number | No | Sampling temperature. |
| `top_p` | number | No | Nucleus sampling parameter. |
| `max_output_tokens` | integer | No | Maximum tokens to generate. |
| `tools` | array | No | Function tool definitions. Only `type: "function"` is currently supported. |
| `tool_choice` | string \| object | No | Tool selection strategy. |
| `parallel_tool_calls` | boolean | No | Allow parallel function calls. |
| `text` | object | No | Text output configuration, including `format` for structured output. |
| `text.format` | object | No | Supports `{"type": "json_object"}` and `{"type": "json_schema", ...}`. |
| `user` | string | No | End-user identifier. |
| `metadata` | object | No | Free-form metadata map. |
| `service_tier` | string | No | Service tier hint. |
| `reasoning` | object | No | Reasoning configuration: `{"effort": "low" \| "medium" \| "high"}`. |
| `store` | boolean | No | Accepted for compatibility. |
| `previous_response_id` | string | No | Accepted for compatibility. |
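Combining several of the optional fields above, a fuller request body might look like the following. The `metadata` key name is made up for illustration; every other field comes from the table.

```python
import json

# A request exercising the optional fields documented above.
payload = {
    "model": "auto",
    "instructions": "You are a terse assistant.",
    "input": "Summarize the plot of Hamlet.",
    "temperature": 0.2,
    "max_output_tokens": 200,
    # structured output: force a JSON object response
    "text": {"format": {"type": "json_object"}},
    # free-form metadata; the key name here is arbitrary
    "metadata": {"request_source": "docs-example"},
}
body = json.dumps(payload)  # send as the POST body
```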
Input items
When input is an array, each item can be:
| Item type | Description |
|---|---|
| `{"role": "system", "content": "..."}` | System message. |
| `{"role": "user", "content": "..."}` | User message (string or array of content parts). |
| `{"role": "assistant", "content": [...]}` | Assistant message. |
| `{"type": "function_call", ...}` | Prior function call for multi-turn tool use. |
| `{"type": "function_call_output", ...}` | Tool output for multi-turn tool use. |
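Using the item types above, a second-turn request after the model issued a function call replays that call and supplies its output. The `call_id` value and the weather JSON here are placeholders for illustration.

```python
# Multi-turn tool use: replay the model's function call and attach its
# output so the next request can produce a final answer.
# "call_123" and the weather payload are made-up example values.
input_items = [
    {"role": "user", "content": "What's the weather in Tokyo?"},
    {
        "type": "function_call",
        "call_id": "call_123",
        "name": "get_weather",
        "arguments": '{"city": "Tokyo"}',
    },
    {
        "type": "function_call_output",
        "call_id": "call_123",
        "output": '{"temp_c": 21, "conditions": "clear"}',
    },
]
```

Pass this list as the `input` field of the follow-up request.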
Content parts
User content arrays support:
- `{"type": "input_text", "text": "..."}` - text input
- `{"type": "input_image", "image_url": "..."}` - image input (URL)
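A user message mixing both part types looks like this; the image URL is a placeholder.

```python
# A user message whose content is an array of parts: text plus an image.
# The URL is an example value, not a real resource.
message = {
    "role": "user",
    "content": [
        {"type": "input_text", "text": "What is shown in this picture?"},
        {"type": "input_image", "image_url": "https://example.com/photo.jpg"},
    ],
}
```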
Response body (non-streaming)
```json
{
  "id": "chatcmpl-abc123",
  "created_at": 1741400000,
  "model": "deepseek-chat",
  "object": "response",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "id": "msg_abc123",
      "content": [
        {
          "type": "output_text",
          "text": "Quantum computing uses quantum bits...",
          "annotations": []
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 15,
    "output_tokens": 42,
    "input_tokens_details": { "cached_tokens": 0 },
    "output_tokens_details": { "reasoning_tokens": 0 }
  }
}
```

Output items
| Type | Description |
|---|---|
| `message` | Text response with `content[].type = "output_text"`. |
| `function_call` | Function call with `name`, `arguments`, and `call_id`. |
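Since `output` can mix both item types, client code should walk the array rather than assume `output[0]` is a message. A minimal sketch, applied to a hand-written sample response in the shape shown above:

```python
# Separate text output from function calls in a Responses API result.
def collect_output(response: dict):
    texts, calls = [], []
    for item in response.get("output", []):
        if item["type"] == "message":
            for part in item.get("content", []):
                if part["type"] == "output_text":
                    texts.append(part["text"])
        elif item["type"] == "function_call":
            calls.append((item["name"], item["arguments"], item["call_id"]))
    return texts, calls

# Sample response dict mirroring the documented output item shapes.
sample = {
    "output": [
        {"type": "message", "role": "assistant",
         "content": [{"type": "output_text", "text": "Hello", "annotations": []}]},
        {"type": "function_call", "name": "get_weather",
         "arguments": '{"city": "Tokyo"}', "call_id": "call_123"},
    ]
}
texts, calls = collect_output(sample)
```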
Streaming
When stream: true, the endpoint returns Server-Sent Events in the Responses API format:
| Event type | Description |
|---|---|
| `response.created` | Response object created. |
| `response.output_item.added` | New output item started. |
| `response.content_part.added` | Content part started. |
| `response.output_text.delta` | Text chunk. |
| `response.output_text.done` | Text complete. |
| `response.function_call_arguments.delta` | Function call arguments chunk. |
| `response.output_item.done` | Output item complete. |
| `response.completed` | Full response with usage. |
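A consumer typically accumulates `response.output_text.delta` events into the final text. The sketch below parses hand-written sample SSE lines in the event shapes listed above; a real client would read these lines from the HTTP response body, and the `delta` payload field is assumed to match OpenAI's Responses streaming format.

```python
import json

# Accumulate streamed text from a captured slice of an SSE stream.
# The events below are hand-written samples; the "delta" field name is
# an assumption based on OpenAI's Responses streaming format.
raw_events = [
    'data: {"type": "response.output_text.delta", "delta": "Quantum "}',
    'data: {"type": "response.output_text.delta", "delta": "bits..."}',
    'data: {"type": "response.output_text.done"}',
]

text = ""
for line in raw_events:
    if not line.startswith("data: "):
        continue  # skip comments / blank keep-alive lines
    event = json.loads(line[len("data: "):])
    if event["type"] == "response.output_text.delta":
        text += event["delta"]
```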
Code examples
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XANTLY_API_KEY"],
    base_url="https://api.xantly.com/v1",
)

response = client.responses.create(
    model="auto",
    input="What is the speed of light?",
)
print(response.output[0].content[0].text)
```

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.XANTLY_API_KEY,
  baseURL: "https://api.xantly.com/v1",
});

const response = await client.responses.create({
  model: "auto",
  input: "What is the speed of light?",
});
console.log(response.output[0].content[0].text);
```

With tools
```python
response = client.responses.create(
    model="auto",
    input="What's the weather in Tokyo?",
    tools=[{
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"}
            },
            "required": ["city"]
        }
    }],
)
```

Errors
| HTTP | error.type | Typical trigger |
|---|---|---|
| 400 | `invalid_request_error` | Unsupported tool type, empty input, invalid content part. |
| 401 | `authentication_error` | Missing or invalid Bearer token. |
| 429 | `rate_limit_error` | Rate limit exceeded. |
| 500 | `internal_error` | Provider error or internal failure. |
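One practical use of the `error.type` values above is deciding which failures to retry. The policy below (retry rate limits and internal errors, not client errors) is a common convention, not something this API mandates.

```python
# Hypothetical retry policy keyed on error.type values from the table above.
# 429 and 500 class errors are typically transient; 400/401 are not.
RETRYABLE = {"rate_limit_error", "internal_error"}

def should_retry(error_body: dict) -> bool:
    return error_body.get("error", {}).get("type") in RETRYABLE
```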
Next steps
- Chat Completions — Alternative endpoint with full Xantly orchestration controls
- Streaming Responses — SSE format guide
- Models — List available models