Xantly
API Reference

Embeddings

Create vector embeddings for text. Embeddings are dense numerical representations of text useful for semantic search, clustering, classification, and retrieval-augmented generation (RAG).

  • POST /v1/embeddings
  • Auth: Authorization: Bearer <token>
  • Drop-in compatible with the OpenAI Embeddings API — same request/response shape.

Quick start

curl -sS https://api.xantly.com/v1/embeddings \
  -H "Authorization: Bearer $XANTLY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog."
  }'

Request body

  • model (string, required): Embedding model slug (e.g. text-embedding-3-small, text-embedding-ada-002). Use GET /v1/models to list available embedding models.
  • input (string or array<string>, required): Text to embed. Pass a string for a single input, or an array of strings for batch embedding.

Response body

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023064255, -0.009327292, 0.015797347, "..."]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 9,
    "total_tokens": 9
  }
}

  • object (string): Always "list".
  • data (array): One embedding object per input string.
  • data[].object (string): Always "embedding".
  • data[].index (integer): Position in the input array (0-indexed).
  • data[].embedding (array<float>): Dense vector representation. Dimensionality depends on the model (e.g. 1536 for text-embedding-3-small).
  • model (string): Model that generated the embeddings.
  • usage.prompt_tokens (integer): Tokens consumed by the input.
  • usage.total_tokens (integer): Same as prompt_tokens for embeddings.
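
Embedding vectors from the same model are typically compared with cosine similarity for semantic search and retrieval. A minimal pure-Python sketch (the toy vectors are illustrative; real embeddings carry the model's full dimensionality, e.g. 1536):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """dot(a, b) / (|a| * |b|); ranges over [-1, 1], higher means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors; substitute response.data[i].embedding in practice.
query = [0.12, 0.30, -0.45]
doc = [0.10, 0.31, -0.44]
print(f"similarity: {cosine_similarity(query, doc):.4f}")
```

In production you would usually delegate this to a vector database or NumPy rather than pure Python, but the metric is the same.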

Code examples

Single input

Python:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XANTLY_API_KEY"],
    base_url="https://api.xantly.com/v1",
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Semantic search is powerful for RAG applications.",
)
vector = response.data[0].embedding
print(f"Dimensions: {len(vector)}")

Node.js:

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.XANTLY_API_KEY,
  baseURL: "https://api.xantly.com/v1",
});

const response = await client.embeddings.create({
  model: "text-embedding-3-small",
  input: "Semantic search is powerful for RAG applications.",
});
const vector = response.data[0].embedding;
console.log(`Dimensions: ${vector.length}`);

Batch input

Python:

texts = [
    "What is a vector database?",
    "How does semantic search work?",
    "Explain cosine similarity.",
]

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=texts,
)

for item in response.data:
    print(f"[{item.index}] {len(item.embedding)}-dim vector")

Node.js:

const texts = [
  "What is a vector database?",
  "How does semantic search work?",
  "Explain cosine similarity.",
];

const response = await client.embeddings.create({
  model: "text-embedding-3-small",
  input: texts,
});

response.data.forEach(({ index, embedding }) => {
  console.log(`[${index}] ${embedding.length}-dim vector`);
});

Automatic caching

Xantly automatically caches embedding responses using an exact-match in-process cache. Identical requests (same model + same input text) return instantly from cache with zero provider cost.

  • Cache capacity: 5,000 entries
  • TTL: 5 minutes
  • Key: BLAKE3 hash of model:input
  • Scope: Per-instance (not shared across pods)

This is transparent — no configuration needed. Repeated calls to the same text within 5 minutes are free and sub-millisecond.
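
The behavior above can be sketched as a small keyed store with per-entry TTL. The class below is illustrative, not Xantly's implementation: it assumes least-recently-used eviction at capacity (the eviction policy is not specified above) and substitutes the standard library's blake2b for BLAKE3, which needs a third-party package:

```python
import hashlib
import time
from collections import OrderedDict

class EmbeddingCache:
    """Exact-match, in-process embedding cache with TTL expiry and LRU eviction."""

    def __init__(self, capacity: int = 5000, ttl: float = 300.0):
        self.capacity = capacity          # max entries before eviction
        self.ttl = ttl                    # seconds an entry stays valid
        self._entries = OrderedDict()     # key -> (stored_at, value)

    @staticmethod
    def key(model: str, text: str) -> str:
        # Xantly keys on a BLAKE3 hash of "model:input"; blake2b is a stdlib stand-in.
        return hashlib.blake2b(f"{model}:{text}".encode()).hexdigest()

    def get(self, model: str, text: str):
        k = self.key(model, text)
        entry = self._entries.get(k)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._entries[k]          # expired: behave like a miss
            return None
        self._entries.move_to_end(k)      # refresh recency on a hit
        return value

    def put(self, model: str, text: str, value) -> None:
        k = self.key(model, text)
        self._entries[k] = (time.monotonic(), value)
        self._entries.move_to_end(k)
        while len(self._entries) > self.capacity:
            self._entries.popitem(last=False)   # drop least recently used
```

Because the real cache is per-instance, two pods can each miss on the same text; exact-match caching also means any whitespace or casing difference produces a fresh provider call.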


Errors

Each entry lists the HTTP status, error.type, and error.code, with the typical trigger:

  • 400 (invalid_request_error / validation_error): Missing model or input, or empty input.
  • 401 (authentication_error / invalid_api_key): Missing or invalid Bearer token.
  • 402 (billing_error / budget_exceeded): Monthly token quota or budget exceeded (see Billing & Quotas).
  • 429 (rate_limit_error / rate_limit_exceeded): Rate limit exceeded (see Rate Limits).
  • 500 (internal_error / internal_error): No embedding provider available or provider error.
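
Clients can branch on the status code and the error.code field when deciding whether to retry. The policy below is an illustrative sketch: the error body shape follows the columns above, and which codes count as transient is an application choice, not part of the API:

```python
import json

# Hypothetical 429 body, matching the error.type / error.code fields above.
raw = '{"error": {"type": "rate_limit_error", "code": "rate_limit_exceeded", "message": "Rate limit exceeded"}}'

# Treating these as transient is a judgment call for the caller.
RETRYABLE_CODES = {"rate_limit_exceeded", "internal_error"}

def should_retry(status: int, body: str) -> bool:
    """Retry transient failures (5xx, rate limits); fail fast on client errors."""
    if status >= 500:
        return True
    if status == 429:
        code = json.loads(body).get("error", {}).get("code")
        return code in RETRYABLE_CODES
    return False

print(should_retry(429, raw))   # rate-limited: worth retrying after a backoff
print(should_retry(400, raw))   # validation errors never succeed on retry
```

Pair a True result with exponential backoff rather than an immediate retry, so a 429 does not turn into a retry storm.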

Next steps

  • Models — List all available models including embedding models
  • Rate Limits — Understand limits that apply to embedding calls
  • Chat Completions — Main gateway endpoint for LLM inference
