Completions (Legacy)
Create text completions using the legacy prompt-based API. This endpoint translates requests into the chat completions pipeline internally, giving you access to the full Xantly routing engine while maintaining backward compatibility.
- Method: POST /v1/completions
- Auth: Authorization: Bearer <token>
- Drop-in compatible with the OpenAI legacy Completions API.
Note: This is a legacy endpoint. For new integrations, use Chat Completions instead — it supports the same models with a richer feature set (tool calling, multimodal input, structured output).
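Because requests to this endpoint are translated into the chat completions pipeline, migrating a legacy call is largely a matter of wrapping the prompt in a single user message. The sketch below illustrates that mapping conceptually; the helper name and the exact translation rules are illustrative assumptions, not the gateway's actual internals.

```python
def legacy_to_chat(params: dict) -> dict:
    """Illustrative mapping from a legacy completion request to a chat request.

    NOTE: This is a conceptual sketch. The gateway's real translation rules
    are internal and may differ.
    """
    prompt = params.get("prompt", "")
    if isinstance(prompt, list):
        # The legacy API joins multiple prompt strings with newlines.
        prompt = "\n".join(prompt)
    # Drop legacy-only fields; carry the rest over unchanged.
    legacy_only = {"prompt", "echo", "best_of", "suffix", "logprobs"}
    chat = {k: v for k, v in params.items() if k not in legacy_only}
    chat["messages"] = [{"role": "user", "content": prompt}]
    return chat

legacy = {"model": "auto", "prompt": "The capital of France is", "max_tokens": 16}
chat = legacy_to_chat(legacy)
```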
Quick start
```shell
curl -sS https://api.xantly.com/v1/completions \
  -H "Authorization: Bearer $XANTLY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "prompt": "The capital of France is"
  }'
```
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model slug or "auto" for intelligent routing. |
| prompt | string or array<string> | No | The prompt text. Multiple strings are joined with newlines. Defaults to an empty string. |
| max_tokens | integer | No | Maximum number of tokens to generate. |
| temperature | number | No | Sampling temperature (0.0–2.0). |
| top_p | number | No | Nucleus sampling parameter. |
| n | integer | No | Number of completions to generate. |
| stream | boolean | No | Not supported; returns a validation error. Use /v1/chat/completions for streaming. |
| logprobs | integer | No | Include log probabilities for the most likely tokens. |
| echo | boolean | No | Echo the prompt back in addition to the completion. |
| stop | string or array<string> | No | Stop sequences. |
| presence_penalty | number | No | Penalize new tokens based on their presence in the text so far (-2.0–2.0). |
| frequency_penalty | number | No | Penalize new tokens based on their frequency in the text so far (-2.0–2.0). |
| best_of | integer | No | Accepted for compatibility. |
| suffix | string | No | Accepted for compatibility. |
| user | string | No | End-user identifier. |
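The constraints in the table above can be enforced client-side before a request leaves your process. Below is a minimal sketch of a request builder that rejects the unsupported stream flag and unknown fields; the helper name and the client-side checks are illustrative (the server performs its own validation regardless).

```python
# Optional fields from the table above (stream handled separately).
SUPPORTED = {"max_tokens", "temperature", "top_p", "n", "logprobs",
             "echo", "stop", "presence_penalty", "frequency_penalty",
             "best_of", "suffix", "user"}

def build_completion_request(model: str, prompt="", **options) -> dict:
    """Build a /v1/completions request body per the field table (illustrative)."""
    if options.get("stream"):
        # stream is not supported on this endpoint; fail fast client-side.
        raise ValueError("stream is not supported; use /v1/chat/completions")
    unknown = set(options) - SUPPORTED - {"stream"}
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    body = {"model": model, "prompt": prompt}
    body.update({k: v for k, v in options.items() if k != "stream"})
    return body

body = build_completion_request("auto", "Once upon a time,",
                                max_tokens=50, stop=["\n"])
```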
Response body
```json
{
  "id": "chatcmpl-abc123",
  "object": "text_completion",
  "created": 1741400000,
  "model": "deepseek-chat",
  "choices": [
    {
      "text": " Paris, which is also the largest city in France.",
      "index": 0,
      "logprobs": null,
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 7,
    "completion_tokens": 12,
    "total_tokens": 19
  }
}
```
| Field | Type | Description |
|---|---|---|
| id | string | Unique completion identifier. |
| object | string | Always "text_completion". |
| created | integer | Unix epoch timestamp. |
| model | string | Model that generated the completion. |
| choices | array | One entry per generated completion. |
| choices[].text | string | Generated text. Includes the prompt when echo is true. |
| choices[].index | integer | Choice index (0-based). |
| choices[].logprobs | object or null | Log probabilities, if requested. |
| choices[].finish_reason | string | "stop", "length", etc. |
| usage | object | Token counts used for billing. |
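Pulling the generated text and the billed token count out of a response is straightforward with the schema above. A small sketch, using the sample response from this page (helper names are illustrative):

```python
# Sample response body, copied from the example above.
sample = {
    "id": "chatcmpl-abc123",
    "object": "text_completion",
    "created": 1741400000,
    "model": "deepseek-chat",
    "choices": [
        {"text": " Paris, which is also the largest city in France.",
         "index": 0, "logprobs": None, "finish_reason": "stop"}
    ],
    "usage": {"prompt_tokens": 7, "completion_tokens": 12, "total_tokens": 19},
}

def first_text(resp: dict) -> str:
    # Choices carry an explicit index; select the lowest defensively
    # rather than assuming list order.
    return min(resp["choices"], key=lambda c: c["index"])["text"]

def billed_tokens(resp: dict) -> int:
    # total_tokens is what usage-based billing counts.
    return resp["usage"]["total_tokens"]
```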
Code examples
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XANTLY_API_KEY"],
    base_url="https://api.xantly.com/v1",
)

response = client.completions.create(
    model="auto",
    prompt="Once upon a time in a land far away,",
    max_tokens=50,
)
print(response.choices[0].text)
```
```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.XANTLY_API_KEY,
  baseURL: "https://api.xantly.com/v1",
});

const response = await client.completions.create({
  model: "auto",
  prompt: "Once upon a time in a land far away,",
  max_tokens: 50,
});
console.log(response.choices[0].text);
```
Errors
| HTTP | error.type | Typical trigger |
|---|---|---|
| 400 | invalid_request_error | stream: true (not supported), missing model. |
| 401 | authentication_error | Missing or invalid Bearer token. |
| 402 | billing_error | Token quota or budget exceeded. |
| 429 | rate_limit_error | Rate limit exceeded. |
| 500 | internal_error | Provider error or internal failure. |
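The table above implies a retry policy: rate limits and internal failures are typically transient, while the 4xx client errors need a code, key, or billing fix rather than a retry. A minimal sketch of that policy, assuming exponential backoff is an acceptable strategy for your client (the schedule and function names are illustrative, not a gateway requirement):

```python
# Transient failures worth retrying, per the error table above.
RETRYABLE = {429, 500}

def should_retry(status: int) -> bool:
    """Retry rate limits and 5xx; 400/401/402 indicate a request, auth,
    or billing problem that retrying will not fix."""
    return status in RETRYABLE or status > 500

def backoff_delays(attempts: int, base: float = 0.5) -> list:
    # Exponential backoff schedule in seconds: base, 2*base, 4*base, ...
    return [base * (2 ** i) for i in range(attempts)]
```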
Next steps
- Chat Completions — Recommended endpoint for new integrations
- Models — List available models