Responses API
The modern OpenAI Responses API endpoint. Clients that default to /v1/responses (including newer OpenAI SDK versions) work out of the box through the Xantly gateway.
- POST `/v1/responses`
- Auth: `Authorization: Bearer <token>`
- Drop-in compatible with the OpenAI Responses API.
How it works: Xantly translates Responses API requests into chat completions internally, routes through the full gateway pipeline (smart routing, caching, verification), and converts the response back to the Responses format. You get all Xantly features with zero code changes.
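The translation can be sketched roughly as follows. The field mapping (`input` to `messages`, `instructions` to a system message, `max_output_tokens` to `max_tokens`) follows the request fields documented on this page; the gateway's actual internals are not documented here, so treat this as an illustration of the shape of the mapping, not the real implementation.

```python
# Illustrative sketch of the Responses -> Chat Completions translation
# described above. Only the documented field mappings are shown.

def to_chat_completions(req: dict) -> dict:
    messages = []
    if "instructions" in req:
        # instructions is mapped to a system message
        messages.append({"role": "system", "content": req["instructions"]})
    inp = req["input"]
    if isinstance(inp, str):
        # simple string input becomes a single user message
        messages.append({"role": "user", "content": inp})
    elif isinstance(inp, list):
        for item in inp:
            if "role" in item:
                messages.append({"role": item["role"], "content": item["content"]})
    out = {"model": req["model"], "messages": messages}
    if "max_output_tokens" in req:
        out["max_tokens"] = req["max_output_tokens"]  # renamed on the chat side
    return out

payload = to_chat_completions({
    "model": "auto",
    "instructions": "Be concise.",
    "input": "Explain quantum computing in one sentence.",
    "max_output_tokens": 100,
})
```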
Quick start
```bash
curl -sS https://api.xantly.com/v1/responses \
  -H "Authorization: Bearer $XANTLY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "input": "Explain quantum computing in one sentence."
  }'
```

Request body
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model slug or `"auto"` for intelligent routing. |
| `input` | string \| array \| object | Yes | Input content: a string for simple text, an array of conversation items for multi-turn input, or an object for a single input item. |
| `instructions` | string | No | System instructions (mapped to a system message). |
| `stream` | boolean | No | Enable SSE streaming. Default `false`. |
| `temperature` | number | No | Sampling temperature. |
| `top_p` | number | No | Nucleus sampling parameter. |
| `max_output_tokens` | integer | No | Maximum tokens to generate. |
| `tools` | array | No | Function tool definitions. Only `type: "function"` is currently supported. |
| `tool_choice` | string \| object | No | Tool selection strategy. |
| `parallel_tool_calls` | boolean | No | Allow parallel function calls. |
| `text` | object | No | Text output configuration, including `format` for structured output. |
| `text.format` | object | No | Supports `{"type": "json_object"}` and `{"type": "json_schema", ...}`. |
| `user` | string | No | End-user identifier. |
| `metadata` | object | No | Free-form metadata map. |
| `service_tier` | string | No | Service tier hint. |
| `reasoning` | object | No | Reasoning configuration: `{"effort": "low" \| "medium" \| "high"}`. |
| `store` | boolean | No | Accepted for compatibility. |
| `previous_response_id` | string | No | Accepted for compatibility. |
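Combining several of the optional fields above, a fuller request body might look like the following. The `metadata` key name is made up for illustration; every other field comes from the table.

```python
import json

# A request exercising the optional fields documented above.
payload = {
    "model": "auto",
    "instructions": "You are a terse assistant.",
    "input": "Summarize the plot of Hamlet.",
    "temperature": 0.2,
    "max_output_tokens": 200,
    # structured output: force a JSON object response
    "text": {"format": {"type": "json_object"}},
    # free-form metadata; the key name here is arbitrary
    "metadata": {"request_source": "docs-example"},
}
body = json.dumps(payload)  # send as the POST body
```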
Input items
When input is an array, each item can be:
| Item type | Description |
|---|---|
| `{"role": "system", "content": "..."}` | System message. |
| `{"role": "user", "content": "..."}` | User message (string or array of content parts). |
| `{"role": "assistant", "content": [...]}` | Assistant message. |
| `{"type": "function_call", ...}` | Prior function call for multi-turn tool use. |
| `{"type": "function_call_output", ...}` | Tool output for multi-turn tool use. |
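Using the item types above, a second-turn request after the model issued a function call replays that call and supplies its output. The `call_id` value and the weather JSON here are placeholders for illustration.

```python
# Multi-turn tool use: replay the model's function call and attach its
# output so the next request can produce a final answer.
# "call_123" and the weather payload are made-up example values.
input_items = [
    {"role": "user", "content": "What's the weather in Tokyo?"},
    {
        "type": "function_call",
        "call_id": "call_123",
        "name": "get_weather",
        "arguments": '{"city": "Tokyo"}',
    },
    {
        "type": "function_call_output",
        "call_id": "call_123",
        "output": '{"temp_c": 21, "conditions": "clear"}',
    },
]
```

Pass this list as the `input` field of the follow-up request.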
Content parts
User content arrays support:
- `{"type": "input_text", "text": "..."}` - text input
- `{"type": "input_image", "image_url": "..."}` - image input (URL)
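A user message mixing both part types looks like this; the image URL is a placeholder.

```python
# A user message whose content is an array of parts: text plus an image.
# The URL is an example value, not a real resource.
message = {
    "role": "user",
    "content": [
        {"type": "input_text", "text": "What is shown in this picture?"},
        {"type": "input_image", "image_url": "https://example.com/photo.jpg"},
    ],
}
```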
Response body (non-streaming)
```json
{
  "id": "chatcmpl-abc123",
  "created_at": 1741400000,
  "model": "deepseek-chat",
  "object": "response",
  "output": [
    {
      "type": "message",
      "role": "assistant",
      "id": "msg_abc123",
      "content": [
        {
          "type": "output_text",
          "text": "Quantum computing uses quantum bits...",
          "annotations": []
        }
      ]
    }
  ],
  "usage": {
    "input_tokens": 15,
    "output_tokens": 42,
    "input_tokens_details": { "cached_tokens": 0 },
    "output_tokens_details": { "reasoning_tokens": 0 }
  }
}
```

Output items
| Type | Description |
|---|---|
| `message` | Text response with `content[].type = "output_text"`. |
| `function_call` | Function call with `name`, `arguments`, and `call_id`. |
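Since `output` can mix both item types, client code should walk the array rather than assume `output[0]` is a message. A minimal sketch, applied to a hand-written sample response in the shape shown above:

```python
# Separate text output from function calls in a Responses API result.
def collect_output(response: dict):
    texts, calls = [], []
    for item in response.get("output", []):
        if item["type"] == "message":
            for part in item.get("content", []):
                if part["type"] == "output_text":
                    texts.append(part["text"])
        elif item["type"] == "function_call":
            calls.append((item["name"], item["arguments"], item["call_id"]))
    return texts, calls

# Sample response dict mirroring the documented output item shapes.
sample = {
    "output": [
        {"type": "message", "role": "assistant",
         "content": [{"type": "output_text", "text": "Hello", "annotations": []}]},
        {"type": "function_call", "name": "get_weather",
         "arguments": '{"city": "Tokyo"}', "call_id": "call_123"},
    ]
}
texts, calls = collect_output(sample)
```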
Streaming
When stream: true, the endpoint returns Server-Sent Events in the Responses API format:
| Event type | Description |
|---|---|
| `response.created` | Response object created. |
| `response.output_item.added` | New output item started. |
| `response.content_part.added` | Content part started. |
| `response.output_text.delta` | Text chunk. |
| `response.output_text.done` | Text complete. |
| `response.function_call_arguments.delta` | Function call arguments chunk. |
| `response.output_item.done` | Output item complete. |
| `response.completed` | Full response with usage. |
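A consumer typically accumulates `response.output_text.delta` events into the final text. The sketch below parses hand-written sample SSE lines in the event shapes listed above; a real client would read these lines from the HTTP response body, and the `delta` payload field is assumed to match OpenAI's Responses streaming format.

```python
import json

# Accumulate streamed text from a captured slice of an SSE stream.
# The events below are hand-written samples; the "delta" field name is
# an assumption based on OpenAI's Responses streaming format.
raw_events = [
    'data: {"type": "response.output_text.delta", "delta": "Quantum "}',
    'data: {"type": "response.output_text.delta", "delta": "bits..."}',
    'data: {"type": "response.output_text.done"}',
]

text = ""
for line in raw_events:
    if not line.startswith("data: "):
        continue  # skip comments / blank keep-alive lines
    event = json.loads(line[len("data: "):])
    if event["type"] == "response.output_text.delta":
        text += event["delta"]
```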
Code examples
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XANTLY_API_KEY"],
    base_url="https://api.xantly.com/v1",
)

response = client.responses.create(
    model="auto",
    input="What is the speed of light?",
)
print(response.output[0].content[0].text)
```

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.XANTLY_API_KEY,
  baseURL: "https://api.xantly.com/v1",
});

const response = await client.responses.create({
  model: "auto",
  input: "What is the speed of light?",
});
console.log(response.output[0].content[0].text);
```

With tools
```python
response = client.responses.create(
    model="auto",
    input="What's the weather in Tokyo?",
    tools=[{
        "type": "function",
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"}
            },
            "required": ["city"]
        }
    }],
)
```

Errors
| HTTP | error.type | Typical trigger |
|---|---|---|
| 400 | `invalid_request_error` | Unsupported tool type, empty input, invalid content part. |
| 401 | `authentication_error` | Missing or invalid Bearer token. |
| 429 | `rate_limit_error` | Rate limit exceeded. |
| 500 | `internal_error` | Provider error or internal failure. |
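One practical use of the `error.type` values above is deciding which failures to retry. The policy below (retry rate limits and internal errors, not client errors) is a common convention, not something this API mandates.

```python
# Hypothetical retry policy keyed on error.type values from the table above.
# 429 and 500 class errors are typically transient; 400/401 are not.
RETRYABLE = {"rate_limit_error", "internal_error"}

def should_retry(error_body: dict) -> bool:
    return error_body.get("error", {}).get("type") in RETRYABLE
```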
Next steps
- Chat Completions — Alternative endpoint with full Xantly orchestration controls
- Streaming Responses — SSE format guide
- Models — List available models