# Embeddings
Create vector embeddings for text. Embeddings are dense numerical representations of text useful for semantic search, clustering, classification, and retrieval-augmented generation (RAG).
- Endpoint: `POST /v1/embeddings`
- Auth: `Authorization: Bearer <token>`
- Drop-in compatible with the OpenAI Embeddings API (same request/response shape).

## Quick start
```bash
curl -sS https://api.xantly.com/v1/embeddings \
  -H "Authorization: Bearer $XANTLY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog."
  }'
```

## Request body
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Embedding model slug (e.g. `text-embedding-3-small`, `text-embedding-ada-002`). Use `GET /v1/models` to list available embedding models. |
| `input` | string \| array&lt;string&gt; | Yes | Text to embed. Pass a string for a single input, or an array of strings for batch embedding. |
## Response body

```json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023064255, -0.009327292, 0.015797347, "..."]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 9
  }
}
```

| Field | Type | Description |
|---|---|---|
| `object` | string | Always `"list"`. |
| `data` | array | One embedding object per input string. |
| `data[].object` | string | Always `"embedding"`. |
| `data[].index` | integer | Position in the input array (0-indexed). |
| `data[].embedding` | array&lt;float&gt; | Dense vector representation. Dimensionality depends on the model (e.g. 1536 for `text-embedding-3-small`). |
| `model` | string | Model that generated the embeddings. |
| `usage.prompt_tokens` | integer | Tokens consumed by the input. |
| `usage.total_tokens` | integer | Same as `prompt_tokens` for embeddings. |
## Code examples

### Single input

```python
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XANTLY_API_KEY"],
    base_url="https://api.xantly.com/v1",
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Semantic search is powerful for RAG applications.",
)

vector = response.data[0].embedding
print(f"Dimensions: {len(vector)}")
```

```javascript
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.XANTLY_API_KEY,
  baseURL: "https://api.xantly.com/v1",
});

const response = await client.embeddings.create({
  model: "text-embedding-3-small",
  input: "Semantic search is powerful for RAG applications.",
});

const vector = response.data[0].embedding;
console.log(`Dimensions: ${vector.length}`);
```

### Batch input
```python
texts = [
    "What is a vector database?",
    "How does semantic search work?",
    "Explain cosine similarity.",
]

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts,
)

for item in response.data:
    print(f"[{item.index}] {len(item.embedding)}-dim vector")
```

```javascript
const texts = [
  "What is a vector database?",
  "How does semantic search work?",
  "Explain cosine similarity.",
];

const response = await client.embeddings.create({
  model: "text-embedding-3-small",
  input: texts,
});

response.data.forEach(({ index, embedding }) => {
  console.log(`[${index}] ${embedding.length}-dim vector`);
});
```

## Automatic caching
Xantly automatically caches embedding responses using an exact-match in-process cache. Identical requests (same model + same input text) return instantly from cache with zero provider cost.
- Cache capacity: 5,000 entries
- TTL: 5 minutes
- Key: BLAKE3 hash of `model:input`
- Scope: per-instance (not shared across pods)

This is transparent — no configuration needed. Repeated calls to the same text within 5 minutes are free and sub-millisecond.
## Errors

| HTTP | error.type | error.code | Typical trigger |
|---|---|---|---|
| 400 | `invalid_request_error` | `validation_error` | Missing `model` or `input`, or empty `input`. |
| 401 | `authentication_error` | `invalid_api_key` | Missing or invalid Bearer token. |
| 402 | `billing_error` | `budget_exceeded` | Monthly token quota or budget exceeded — see Billing & Quotas. |
| 429 | `rate_limit_error` | `rate_limit_exceeded` | Rate limit exceeded — see Rate Limits. |
| 500 | `internal_error` | `internal_error` | No embedding provider available, or an upstream provider error. |
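Transient statuses (429 and 500) are worth retrying with backoff, while the 4xx errors need a request or account fix first. A minimal client-side sketch — the helper names here are our own, not part of the API:

```python
import time

# Statuses worth retrying: rate limits and server/provider errors.
RETRYABLE_STATUSES = {429, 500}


def should_retry(status: int) -> bool:
    # 400/401/402 indicate a problem a retry cannot fix.
    return status in RETRYABLE_STATUSES


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    # Exponential backoff: 0.5s, 1s, 2s, ... capped at 8s.
    return min(cap, base * (2 ** attempt))


# Example: a 429 response on the first attempt.
status = 429
if should_retry(status):
    time.sleep(backoff_delay(0))  # wait, then re-send the request
```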
## Next steps
- Models — List all available models including embedding models
- Rate Limits — Understand limits that apply to embedding calls
- Chat Completions — Main gateway endpoint for LLM inference