Xantly
Getting Started

Quickstart

Go from API key to working request in under 60 seconds.


Prerequisites

  • A Xantly API key.
  • curl, or Python/Node.js with the openai SDK installed.

Step 1 — Set your API key

Store your API key in an environment variable so it stays out of source code.

export XANTLY_API_KEY="sk-your-key-here"

Step 2 — Send your first request

A minimal chat completion. Use model: "auto" and let the gateway pick the best model for you.

Python

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["XANTLY_API_KEY"],
    base_url="https://api.xantly.com/v1",
)

response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "What is a vector database? One sentence."}],
)
print(response.choices[0].message.content)

Node.js

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: process.env.XANTLY_API_KEY,
  baseURL: "https://api.xantly.com/v1",
});

const response = await client.chat.completions.create({
  model: "auto",
  messages: [{ role: "user", content: "What is a vector database? One sentence." }],
});
console.log(response.choices[0].message.content);

cURL

curl -sS https://api.xantly.com/v1/chat/completions \
  -H "Authorization: Bearer $XANTLY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "messages": [
      {"role": "user", "content": "What is a vector database? One sentence."}
    ]
  }'

Step 3 — Read the response

The response is 100% OpenAI-compatible. Any code that parses OpenAI responses works unchanged.

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "model": "deepseek-chat",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A vector database stores high-dimensional embeddings and retrieves them via similarity search."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 22,
    "total_tokens": 40
  }
}
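If you call the REST endpoint without an SDK, the same fields come back as plain JSON. A minimal sketch of pulling out the message and token counts (the literal below mirrors the example response above, abbreviated to the fields being read):

```python
import json

# A response body like the example above, trimmed to the fields we read.
raw = """
{
  "model": "deepseek-chat",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant",
                 "content": "A vector database stores high-dimensional embeddings and retrieves them via similarity search."},
     "finish_reason": "stop"}
  ],
  "usage": {"prompt_tokens": 18, "completion_tokens": 22, "total_tokens": 40}
}
"""

resp = json.loads(raw)
answer = resp["choices"][0]["message"]["content"]
total_tokens = resp["usage"]["total_tokens"]
print(answer, total_tokens)
```

Because the shape is OpenAI-compatible, the same paths (`choices[0].message.content`, `usage.total_tokens`) work against any response from the gateway.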

Step 4 — Inspect routing metadata

Xantly adds x-xantly-* headers for routing transparency. Use them to inspect how your request was handled.

curl -i -sS https://api.xantly.com/v1/chat/completions \
  -H "Authorization: Bearer $XANTLY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"auto","messages":[{"role":"user","content":"Hello"}]}' \
  2>&1 | grep -i "x-xantly-"

Example output:

x-xantly-tier-used: T2
x-xantly-lane-used: smart
x-xantly-provider: deepseek
x-xantly-cache-hit: false
x-xantly-cost-usd: 0.00032

x-xantly-cost-usd is present when usage/cost can be computed.
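To read these values programmatically rather than with grep, filter the response headers for the x-xantly- prefix. A sketch of the filtering step (with the openai Python SDK, raw headers are typically available via `client.chat.completions.with_raw_response.create(...)`; the sample dict below uses the header names from the example output above):

```python
def routing_metadata(headers: dict) -> dict:
    """Collect Xantly routing headers, matching the x-xantly- prefix case-insensitively."""
    return {k.lower(): v for k, v in headers.items() if k.lower().startswith("x-xantly-")}

# Sample headers as shown in the example output above.
headers = {
    "Content-Type": "application/json",
    "x-xantly-tier-used": "T2",
    "x-xantly-provider": "deepseek",
    "x-xantly-cache-hit": "false",
}
print(routing_metadata(headers))
```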


Enable streaming

Add stream: true to receive Server-Sent Events (chat.completion.chunk).

Current behavior note:

  • Non-voice streaming is SSE-compatible but may return a compact sequence (role chunk + content chunk + [DONE]) rather than token-by-token chunks.
  • stream_options.include_usage is forwarded where supported, but a terminal usage chunk is not guaranteed.

Python

stream = client.chat.completions.create(
    model="auto",
    stream=True,
    stream_options={"include_usage": True},
    messages=[{"role": "user", "content": "Write a haiku about APIs."}],
)
for chunk in stream:
    # With include_usage, the final usage chunk has an empty choices list.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Node.js

const stream = await client.chat.completions.create({
  model: "auto",
  stream: true,
  stream_options: { include_usage: true },
  messages: [{ role: "user", content: "Write a haiku about APIs." }],
});
for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}

cURL

curl -N https://api.xantly.com/v1/chat/completions \
  -H "Authorization: Bearer $XANTLY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "auto",
    "stream": true,
    "messages": [{"role": "user", "content": "Write a haiku about APIs."}]
  }'
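The raw curl output is a stream of `data:` lines ending in `data: [DONE]`. A minimal parser for that framing (a sketch; chunk fields follow the OpenAI chat.completion.chunk shape, and per the note above the stream may arrive as only a few chunks):

```python
import json

def sse_deltas(lines):
    """Yield content deltas from the 'data:' lines of a chat completion SSE stream."""
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines and comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # terminal sentinel, not JSON
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []  # a usage-only chunk may have no choices
        if choices:
            delta = choices[0].get("delta", {})
            if delta.get("content"):
                yield delta["content"]

sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    'data: {"choices":[{"delta":{"content":"Hello"}}]}',
    'data: [DONE]',
]
print("".join(sse_deltas(sample)))  # → Hello
```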

Migrating from OpenAI

If you already use the OpenAI SDK, migration is a two-line change to the client constructor:

  client = OpenAI(
-     api_key=os.environ["OPENAI_API_KEY"],
+     api_key=os.environ["XANTLY_API_KEY"],
+     base_url="https://api.xantly.com/v1",
  )

All standard parameters (model, messages, temperature, max_tokens, tools, response_format) work identically.
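For example, a request body you already send to OpenAI is valid as-is; the only Xantly-specific value below is the "auto" model alias (the temperature and max_tokens values are illustrative):

```python
import json

# An OpenAI-style request body; it is sent to Xantly unchanged.
payload = {
    "model": "auto",
    "messages": [{"role": "user", "content": "Summarize this in one line."}],
    "temperature": 0.2,
    "max_tokens": 128,
    "response_format": {"type": "text"},
}
body = json.dumps(payload)  # POST this to https://api.xantly.com/v1/chat/completions
```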


Voice quickstart

Voice endpoints use the same API key and show up on the same invoice as chat. There is no separate subscription — if you have an API key, you can call voice.

Transcribe audio (STT)

curl -X POST https://api.xantly.com/v1/voice/transcribe \
  -H "Authorization: Bearer $XANTLY_API_KEY" \
  -F "[email protected]" \
  -F "language=en" \
  -F "stt_model=groq/whisper-large-v3-turbo"

Synthesize speech (TTS)

curl -X POST https://api.xantly.com/v1/voice/synthesize \
  -H "Authorization: Bearer $XANTLY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello from Xantly voice.",
    "voice": "alloy",
    "model": "openai/gpt-4o-mini-tts",
    "output_format": "pcm_16000"
  }' \
  --output response.pcm
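
Because output_format: pcm_16000 is raw PCM, most players won't open response.pcm directly. A sketch that wraps it in a WAV container with the stdlib wave module (this assumes 16-bit mono samples at 16 kHz, which is how pcm_16000 is read here):

```python
import wave

def pcm_to_wav(pcm_path: str, wav_path: str, rate: int = 16000) -> None:
    """Wrap raw 16-bit mono PCM in a WAV header so standard players can open it."""
    with open(pcm_path, "rb") as f:
        pcm = f.read()
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(1)     # assumed: mono
        wav.setsampwidth(2)     # assumed: 16-bit samples
        wav.setframerate(rate)  # 16 kHz, per pcm_16000
        wav.writeframes(pcm)

# Usage: pcm_to_wav("response.pcm", "response.wav")
```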

Full voice pipeline (audio → audio)

curl -X POST https://api.xantly.com/v1/voice/chat \
  -H "Authorization: Bearer $XANTLY_API_KEY" \
  -F "[email protected]" \
  -F "stt_model=groq/whisper-large-v3-turbo" \
  -F "tts_model=elevenlabs/eleven_flash_v2_5" \
  --output reply.pcm

Or use the header shortcut (OpenAI SDK compatible)

import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["XANTLY_API_KEY"], base_url="https://api.xantly.com/v1")

# transcript_text is text you already have, e.g. from /v1/voice/transcribe
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": transcript_text}],
    extra_headers={"x-xantly-voice": "true"},
)

Free tier includes a one-time 3-minute voice demo. See the Voice Models Catalog for all 30+ models and the Voice Billing reference for pricing details.

