Completions (Legacy)
Create text completions using the legacy prompt-based API. This endpoint translates requests into the chat completions pipeline internally, giving you access to the full Xantly routing engine while maintaining backward compatibility.
- Method: POST /v1/completions
- Auth: Authorization: Bearer <token>
- Drop-in compatible with the OpenAI legacy Completions API.
Note: This is a legacy endpoint. For new integrations, use Chat Completions instead — it supports the same models with a richer feature set (tool calling, multimodal input, structured output).
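Because requests to this endpoint are translated into the chat completions pipeline, migrating a legacy call is largely a matter of wrapping the prompt in a single user message. The sketch below illustrates that mapping conceptually; the helper name and the exact translation rules are illustrative assumptions, not the gateway's actual internals.

```python
def legacy_to_chat(params: dict) -> dict:
    """Illustrative mapping from a legacy completion request to a chat request.

    NOTE: This is a conceptual sketch. The gateway's real translation rules
    are internal and may differ.
    """
    prompt = params.get("prompt", "")
    if isinstance(prompt, list):
        # The legacy API joins multiple prompt strings with newlines.
        prompt = "\n".join(prompt)
    # Drop legacy-only fields; carry the rest over unchanged.
    legacy_only = {"prompt", "echo", "best_of", "suffix", "logprobs"}
    chat = {k: v for k, v in params.items() if k not in legacy_only}
    chat["messages"] = [{"role": "user", "content": prompt}]
    return chat

legacy = {"model": "auto", "prompt": "The capital of France is", "max_tokens": 16}
chat = legacy_to_chat(legacy)
```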
Quick start
```shell
curl -sS https://api.xantly.com/v1/completions \
  -H "Authorization: Bearer $XANTLY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "prompt": "The capital of France is"
  }'
```
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model slug or "auto" for intelligent routing. |
| prompt | string or array<string> | No | The prompt text. Multiple strings are joined with newlines. Defaults to an empty string. |
| max_tokens | integer | No | Maximum number of tokens to generate. |
| temperature | number | No | Sampling temperature (0.0–2.0). |
| top_p | number | No | Nucleus sampling parameter. |
| n | integer | No | Number of completions to generate. |
| stream | boolean | No | Not supported; returns a validation error. Use /v1/chat/completions for streaming. |
| logprobs | integer | No | Include log probabilities for the most likely tokens. |
| echo | boolean | No | Echo the prompt back in addition to the completion. |
| stop | string or array<string> | No | Stop sequences. |
| presence_penalty | number | No | Penalize new tokens based on their presence in the text so far (-2.0–2.0). |
| frequency_penalty | number | No | Penalize new tokens based on their frequency in the text so far (-2.0–2.0). |
| best_of | integer | No | Accepted for compatibility. |
| suffix | string | No | Accepted for compatibility. |
| user | string | No | End-user identifier. |
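The constraints in the table above can be enforced client-side before a request leaves your process. Below is a minimal sketch of a request builder that rejects the unsupported stream flag and unknown fields; the helper name and the client-side checks are illustrative (the server performs its own validation regardless).

```python
# Optional fields from the table above (stream handled separately).
SUPPORTED = {"max_tokens", "temperature", "top_p", "n", "logprobs",
             "echo", "stop", "presence_penalty", "frequency_penalty",
             "best_of", "suffix", "user"}

def build_completion_request(model: str, prompt="", **options) -> dict:
    """Build a /v1/completions request body per the field table (illustrative)."""
    if options.get("stream"):
        # stream is not supported on this endpoint; fail fast client-side.
        raise ValueError("stream is not supported; use /v1/chat/completions")
    unknown = set(options) - SUPPORTED - {"stream"}
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    body = {"model": model, "prompt": prompt}
    body.update({k: v for k, v in options.items() if k != "stream"})
    return body

body = build_completion_request("auto", "Once upon a time,",
                                max_tokens=50, stop=["\n"])
```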
Response body
```json
{
  "id": "chatcmpl-abc123",
  "object": "text_completion",
  "created": 1741400000,
  "model": "deepseek-chat",
  "choices": [
    {
      "text": " Paris, which is also the largest city in France.",
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 7,
    "completion_tokens": 12,
    "total_tokens": 19
  }
}
```
| Field | Type | Description |
|---|---|---|
| id | string | Unique completion identifier. |
| object | string | Always "text_completion". |
| created | integer | Unix epoch timestamp. |
| model | string | Model that generated the completion. |
| choices | array | One entry per generated completion. |
| choices[].text | string | Generated text. Includes the prompt when echo is true. |
| choices[].index | integer | Choice index (0-based). |
| choices[].logprobs | object or null | Log probabilities, if requested. |
| choices[].finish_reason | string | "stop", "length", etc. |
| usage | object | Token counts used for billing. |
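Pulling the generated text and the billed token count out of a response is straightforward with the schema above. A small sketch, using the sample response from this page (helper names are illustrative):

```python
# Sample response body, copied from the example above.
sample = {
    "id": "chatcmpl-abc123",
    "object": "text_completion",
    "created": 1741400000,
    "model": "deepseek-chat",
    "choices": [
        {"text": " Paris, which is also the largest city in France.",
         "index": 0, "logprobs": None, "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 7, "completion_tokens": 12, "total_tokens": 19},
}

def first_text(resp: dict) -> str:
    # Choices carry an explicit index; select the lowest defensively
    # rather than assuming list order.
    return min(resp["choices"], key=lambda c: c["index"])["text"]

def billed_tokens(resp: dict) -> int:
    # total_tokens is what usage-based billing counts.
    return resp["usage"]["total_tokens"]
```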
Code examples
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XANTLY_API_KEY"],
    base_url="https://api.xantly.com/v1",
)

response = client.completions.create(
    model="auto",
    prompt="Once upon a time in a land far away,",
    max_tokens=50,
)
print(response.choices[0].text)
```
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.XANTLY_API_KEY,
  baseURL: "https://api.xantly.com/v1",
});

const response = await client.completions.create({
  model: "auto",
  prompt: "Once upon a time in a land far away,",
  max_tokens: 50,
});
console.log(response.choices[0].text);
```
Errors
| HTTP | error.type | Typical trigger |
|---|---|---|
| 400 | invalid_request_error | stream: true (not supported), missing model. |
| 401 | authentication_error | Missing or invalid Bearer token. |
| 402 | billing_error | Token quota or budget exceeded. |
| 429 | rate_limit_error | Rate limit exceeded. |
| 500 | internal_error | Provider error or internal failure. |
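The table above implies a retry policy: rate limits and internal failures are typically transient, while the 4xx client errors need a code, key, or billing fix rather than a retry. A minimal sketch of that policy, assuming exponential backoff is an acceptable strategy for your client (the schedule and function names are illustrative, not a gateway requirement):

```python
# Transient failures worth retrying, per the error table above.
RETRYABLE = {429, 500}

def should_retry(status: int) -> bool:
    """Retry rate limits and 5xx; 400/401/402 indicate a request, auth,
    or billing problem that retrying will not fix."""
    return status in RETRYABLE or status > 500

def backoff_delays(attempts: int, base: float = 0.5) -> list:
    # Exponential backoff schedule in seconds: base, 2*base, 4*base, ...
    return [base * (2 ** i) for i in range(attempts)]
```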
Next steps
- Chat Completions — Recommended endpoint for new integrations
- Models — List available models