
Voice Billing




Voice requests use the same API key, the same budget pool, and the same monthly invoice as chat. What differs is the pricing unit (minutes and characters, not tokens) and the addition of a per-minute platform fee.

This page documents how Xantly bills voice, enforces quotas, and surfaces costs back to you.


Cost breakdown

A single voice turn is billed as the sum of up to four components:

| Component | Pricing model | Where the cost comes from |
| --- | --- | --- |
| Speech-to-Text (STT) | Per minute of input audio | Provider passthrough (Whisper, Deepgram Nova, Groq Whisper, etc.) |
| LLM inference | Per token (same as chat) | The model slug dispatched inside the voice pipeline |
| Text-to-Speech (TTS) | Per 1M characters | Provider passthrough (OpenAI TTS, ElevenLabs, Deepgram Aura, etc.) |
| Platform fee | Per minute of audio | Xantly's orchestration, caching, and routing layer |

The first three are pure provider passthrough — Xantly bills exactly what the underlying provider charged. The platform fee is where Xantly earns revenue and is the only tunable margin lever.

Chat models called inside the voice pipeline (e.g. groq/llama-3.3-70b selected by BaRP for low-latency fast-lane turns) are priced per token like any other chat completion.
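As a rough illustration of the four-component sum, the sketch below adds up the cost of one voice turn. The function, the field names, and every provider rate are hypothetical placeholders rather than real prices; only the $0.02 platform fee is taken from the Pro plan on this page. Because the page does not say whether the platform fee is metered on the same minutes as STT, the two durations are kept as separate inputs.

```python
from dataclasses import dataclass


@dataclass
class VoiceTurnUsage:
    """Measured usage for one voice turn. Field names are illustrative."""
    stt_input_minutes: float   # minutes of input audio sent to STT
    llm_tokens: int            # tokens used by the chat model in the pipeline
    tts_characters: int        # characters synthesized by TTS
    platform_minutes: float    # minutes of audio the platform fee applies to


def voice_turn_cost(
    usage: VoiceTurnUsage,
    stt_usd_per_min: float,
    llm_usd_per_1m_tokens: float,
    tts_usd_per_1m_chars: float,
    platform_fee_usd_per_min: float,
) -> dict:
    """Return each component's cost and their total, in USD."""
    parts = {
        "stt": usage.stt_input_minutes * stt_usd_per_min,
        "inference": usage.llm_tokens * llm_usd_per_1m_tokens / 1_000_000,
        "tts": usage.tts_characters * tts_usd_per_1m_chars / 1_000_000,
        "platform_fee": usage.platform_minutes * platform_fee_usd_per_min,
    }
    parts["total"] = sum(parts.values())
    return parts


# Placeholder rates, NOT real provider prices.
example = voice_turn_cost(
    VoiceTurnUsage(stt_input_minutes=0.5, llm_tokens=400,
                   tts_characters=300, platform_minutes=0.5),
    stt_usd_per_min=0.006,
    llm_usd_per_1m_tokens=0.60,
    tts_usd_per_1m_chars=15.0,
    platform_fee_usd_per_min=0.02,  # Pro plan platform fee
)
```

With these placeholder numbers the platform fee (0.5 min x $0.02) is the largest single component of the turn.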


Plan quotas

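| Plan | Monthly minutes | Concurrent sessions | Voice RPM | Platform fee/min |
| --- | --- | --- | --- | --- |
| Free | 3 min lifetime (one-time demo) | 1 | 3 | $0.00 |
| Pro | 500 / month | 5 | 60 | $0.02 |
| Scale | 5,000 / month | 25 | 500 | $0.015 |
| Pay-As-You-Go | Unlimited (credit-bounded) | 5 | 30 | $0.025 |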

Per-org overrides are available via org_settings:

  • voice_monthly_minutes_limit — override the monthly cap
  • voice_concurrent_session_limit — override the concurrency limit
  • voice_rpm_limit — override the requests-per-minute limit
  • voice_platform_fee_per_min — override the platform fee (for custom enterprise deals)
  • voice_markup_pct_override — uniform markup override on provider pass-through costs
  • voice_component_overrides — per-component (stt / tts / inference) markup override
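One way to picture how the first four overrides interact with the plan table is a simple overlay: start from the plan defaults and replace any value the org has set. The sketch below is an assumption about the resolution logic, not Xantly's implementation; the function name, the plan keys, and the rule that a missing or None override falls back to the plan default are all hypothetical.

```python
# Paid-plan defaults as read from the plan table above. The Free plan is left
# out because its minutes are a lifetime allowance, not a monthly limit.
# None means "unlimited (credit-bounded)".
PLAN_DEFAULTS = {
    "pro": {
        "voice_monthly_minutes_limit": 500,
        "voice_concurrent_session_limit": 5,
        "voice_rpm_limit": 60,
        "voice_platform_fee_per_min": 0.02,
    },
    "scale": {
        "voice_monthly_minutes_limit": 5000,
        "voice_concurrent_session_limit": 25,
        "voice_rpm_limit": 500,
        "voice_platform_fee_per_min": 0.015,
    },
    "payg": {
        "voice_monthly_minutes_limit": None,
        "voice_concurrent_session_limit": 5,
        "voice_rpm_limit": 30,
        "voice_platform_fee_per_min": 0.025,
    },
}


def effective_voice_limits(plan: str, org_settings: dict) -> dict:
    """Overlay org_settings overrides on the plan defaults.

    Assumption: an override that is missing or None means
    "use the plan default".
    """
    defaults = PLAN_DEFAULTS[plan]
    return {
        key: org_settings[key] if org_settings.get(key) is not None else default
        for key, default in defaults.items()
    }


# A Pro org with a custom RPM limit keeps every other Pro default.
limits = effective_voice_limits("pro", {"voice_rpm_limit": 120})
```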

Free tier minutes are lifetime, not monthly. Once used, they do not reset. Upgrade to Pro to get a monthly allocation.


Quota enforcement order

For every voice request, Xantly runs these checks in order. The first one that fails returns an error and the request does not count toward any other quota:

  1. Monthly budget cap — the same budget:usage:{org_id}:general:{YYYY-MM} pool as chat. Returns 402 Payment Required when exceeded.
  2. Monthly voice minutes quota: plan_voice_minutes_monthly or the org override. Returns 429 Too Many Requests ("Voice audio minutes quota exceeded").
  3. Free tier lifetime minutes — only applies to the Free plan. Returns 429 ("Free voice demo limit reached").
  4. PAYG credit floor — PAYG accounts must have at least $0.05 credit balance to start a voice request. Returns 402 Payment Required.
  5. Concurrent session limit: plan_voice_concurrent_sessions or the org override. Returns 429 ("Voice concurrent session limit reached").
  6. Voice RPM (sliding window) — enforced by the rate limit middleware.
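The ordering above can be sketched as a chain of early returns. This is a minimal illustration, not the gateway's code: the state object, its field names, and the use of `>=` for "exceeded" are assumptions, and the status and message for the RPM check are guessed because the list does not state them.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class VoiceQuotaState:
    """Snapshot of the counters each check needs. All names are hypothetical."""
    plan: str                              # "free", "pro", "scale" or "payg"
    budget_used_usd: float
    budget_cap_usd: float
    voice_minutes_used: float
    voice_minutes_limit: Optional[float]   # None = unlimited
    free_lifetime_minutes_used: float
    payg_credit_usd: float
    active_sessions: int
    concurrent_limit: int
    requests_in_window: int
    rpm_limit: int


FREE_LIFETIME_MINUTES = 3.0
PAYG_CREDIT_FLOOR_USD = 0.05


def first_failed_check(s: VoiceQuotaState) -> Optional[Tuple[int, str]]:
    """Return (status, message) for the first failing check, or None if all pass."""
    if s.budget_used_usd >= s.budget_cap_usd:
        return 402, "Payment Required"
    if s.voice_minutes_limit is not None and s.voice_minutes_used >= s.voice_minutes_limit:
        return 429, "Voice audio minutes quota exceeded"
    if s.plan == "free" and s.free_lifetime_minutes_used >= FREE_LIFETIME_MINUTES:
        return 429, "Free voice demo limit reached"
    if s.plan == "payg" and s.payg_credit_usd < PAYG_CREDIT_FLOOR_USD:
        return 402, "Payment Required"
    if s.active_sessions >= s.concurrent_limit:
        return 429, "Voice concurrent session limit reached"
    if s.requests_in_window >= s.rpm_limit:
        # Status and message for the RPM check are assumed, not documented.
        return 429, "Voice RPM limit reached"
    return None
```

Because the function returns on the first failure, a PAYG account that is both under the credit floor and at its concurrency limit sees the 402, not the 429.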

When a voice request fails mid-pipeline (e.g. STT succeeds, TTS fails), Xantly bills for the stages that completed via the partial cost accumulator. You are never charged for work the provider never did, but you are also never refunded for work that was successfully billed upstream.
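To make the partial-billing behavior concrete, here is a small sketch of an accumulator that records each stage's cost only after that stage completes. The class and function names are hypothetical, the stage costs are placeholders, and the real accumulator may work differently.

```python
from typing import Callable, Dict, List, Tuple


class PartialCostAccumulator:
    """Collects the cost of each pipeline stage as soon as it completes."""

    def __init__(self) -> None:
        self.stage_costs: Dict[str, float] = {}

    def record(self, stage: str, cost_usd: float) -> None:
        self.stage_costs[stage] = cost_usd

    def total(self) -> float:
        return sum(self.stage_costs.values())


def run_voice_pipeline(
    stages: List[Tuple[str, Callable[[], float]]],
    acc: PartialCostAccumulator,
) -> bool:
    """Run stages in order; each callable returns its cost or raises on failure.

    Returns True if every stage completed. On failure, `acc` still holds the
    cost of the stages that finished, and nothing for the one that failed.
    """
    for name, run_stage in stages:
        try:
            cost = run_stage()
        except Exception:
            return False
        acc.record(name, cost)
    return True


def failing_tts() -> float:
    raise RuntimeError("TTS provider error")


# STT and inference succeed, TTS fails: only the first two are billed.
acc = PartialCostAccumulator()
completed = run_voice_pipeline(
    [("stt", lambda: 0.003), ("inference", lambda: 0.0002), ("tts", failing_tts)],
    acc,
)
```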


Cost visibility headers

Every successful voice response includes headers with detailed cost and routing metadata:

| Header | Example value | Meaning |
| --- | --- | --- |
| X-Xantly-Cost-USD | 0.00324 | Total customer charge for this request |
| X-Xantly-STT-Provider | deepgram | Which STT provider actually ran |
| X-Xantly-STT-Model | deepgram/nova-2 | Which STT model was dispatched |
| X-Xantly-TTS-Provider | elevenlabs | Which TTS provider actually ran |
| X-Xantly-TTS-Model | elevenlabs/eleven_flash_v2_5 | Which TTS model was dispatched |
| X-Xantly-STT-Latency-Ms | 82.4 | STT stage latency |
| X-Xantly-Inference-Latency-Ms | 147.2 | LLM inference latency |
| X-Xantly-TTS-Latency-Ms | 54.1 | TTS stage latency |
| X-Xantly-Model-Used | groq/llama-3.3-70b | The chat model that served inference inside the pipeline |
| X-Xantly-Lane-Used | FastLane | Whether BaRP routed through the fast lane or delegation lane |
| X-Xantly-Cache-Hit | true | true when the voice semantic cache served the response (inference cost = $0) |
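A client can turn these headers into typed values for logging or cost dashboards. The helper below is a sketch that works on any mapping of header names to strings; the function name and output keys are hypothetical, and treating any value other than "true" (including a missing header) as a cache miss is an assumption.

```python
from typing import Mapping, Optional


def parse_voice_headers(headers: Mapping[str, str]) -> dict:
    """Convert X-Xantly-* voice headers into typed Python values."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in headers.items()}

    def get(name: str) -> Optional[str]:
        return lowered.get(name.lower())

    def get_float(name: str) -> Optional[float]:
        raw = get(name)
        return float(raw) if raw is not None else None

    return {
        "cost_usd": get_float("X-Xantly-Cost-USD"),
        "stt_provider": get("X-Xantly-STT-Provider"),
        "stt_model": get("X-Xantly-STT-Model"),
        "tts_provider": get("X-Xantly-TTS-Provider"),
        "tts_model": get("X-Xantly-TTS-Model"),
        "stt_latency_ms": get_float("X-Xantly-STT-Latency-Ms"),
        "inference_latency_ms": get_float("X-Xantly-Inference-Latency-Ms"),
        "tts_latency_ms": get_float("X-Xantly-TTS-Latency-Ms"),
        "model_used": get("X-Xantly-Model-Used"),
        "lane_used": get("X-Xantly-Lane-Used"),
        # Assumption: anything other than the literal "true" is a cache miss.
        "cache_hit": get("X-Xantly-Cache-Hit") == "true",
    }


# Example values copied from the header table.
parsed = parse_voice_headers({
    "x-xantly-cost-usd": "0.00324",
    "X-Xantly-STT-Provider": "deepgram",
    "X-Xantly-STT-Latency-Ms": "82.4",
    "X-Xantly-Lane-Used": "FastLane",
    "X-Xantly-Cache-Hit": "true",
})
```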

Anomaly thresholds

Xantly runs automatic cost-anomaly detection on every voice request. If a request exceeds either of the thresholds below, a warning is logged to Mission Control for operator review. The thresholds are configurable at deploy time:

| Environment variable | Default | Meaning |
| --- | --- | --- |
| VOICE_ANOMALY_COST_PER_MIN_THRESHOLD | 0.66 | Maximum provider cost per minute of audio before firing an alert (3x premium stack expected max) |
| VOICE_ANOMALY_SINGLE_REQUEST_THRESHOLD | 5.0 | Maximum provider cost for a single voice request before firing an alert |

A third sanity check — "STT completed in <10ms for >5 seconds of audio" — is always enabled and not tunable. It catches broken duration tracking in STT provider responses.
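The three checks can be sketched as a single function that returns the names of any warnings a request would trigger. The function name, its arguments, and the warning labels are hypothetical; the environment variable names and defaults come from the table above, and "exceeds" is read as a strict greater-than.

```python
import os
from typing import List, Mapping


def voice_anomaly_warnings(
    provider_cost_usd: float,
    audio_seconds: float,
    stt_latency_ms: float,
    env: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return the anomaly checks that fire for one voice request."""
    per_min_threshold = float(env.get("VOICE_ANOMALY_COST_PER_MIN_THRESHOLD", "0.66"))
    single_threshold = float(env.get("VOICE_ANOMALY_SINGLE_REQUEST_THRESHOLD", "5.0"))

    warnings: List[str] = []

    # Check 1: provider cost per minute of audio.
    if audio_seconds > 0:
        cost_per_min = provider_cost_usd / (audio_seconds / 60.0)
        if cost_per_min > per_min_threshold:
            warnings.append("cost_per_minute")

    # Check 2: provider cost of the whole request.
    if provider_cost_usd > single_threshold:
        warnings.append("single_request_cost")

    # Check 3 (not tunable): STT finished in <10 ms for >5 s of audio.
    if stt_latency_ms < 10 and audio_seconds > 5:
        warnings.append("stt_duration_sanity")

    return warnings
```

For example, a 60-second request costing $1.00 at the provider trips only the per-minute check under the default thresholds, since 1.00 is above 0.66 but below 5.0.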


Stripe integration

Voice minutes are reported to Stripe Metered Billing on a 60-second interval loop from the xantly-api process. Each org with stripe_voice_sub_item_id set on org_settings gets its total voice minutes for the current calendar month reported with action=set (idempotent).

To wire up voice metered billing for production:

  1. Create a Stripe Product called "Xantly Voice Minutes" in your Stripe Dashboard.
  2. Create a Metered Price with unit = "minute" and the currency / usage aggregation of your choice.
  3. Set STRIPE_VOICE_METERED_PRICE_ID=price_... in the xantly-api environment.
  4. When a customer subscribes to Pro or Scale, Xantly automatically adds the voice metered price as a second subscription item and persists the resulting subscription_item_id on org_settings.stripe_voice_sub_item_id.
  5. The 60-second sync loop picks up from there.

When the env var is unset, all voice metering code paths are no-ops and voice usage is tracked only in gateway_requests (still visible on internal invoices, just not pushed to Stripe).
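One pass of the sync loop can be pictured as building a list of usage reports, one per org that has a voice subscription item. The sketch below deliberately stops short of calling Stripe and only constructs the payloads; the function name and input shapes are hypothetical, and rounding fractional minutes up to a whole number is an assumption the page does not confirm.

```python
import math
from typing import Dict, List, Mapping, Optional


def build_voice_usage_reports(
    org_settings: Mapping[str, Mapping[str, Optional[str]]],
    month_to_date_minutes: Mapping[str, float],
    env: Mapping[str, str],
) -> List[Dict[str, object]]:
    """Build one usage report per org for a single pass of the sync loop."""
    if not env.get("STRIPE_VOICE_METERED_PRICE_ID"):
        return []  # metering disabled: nothing is pushed to Stripe

    reports: List[Dict[str, object]] = []
    for org_id, settings in org_settings.items():
        sub_item_id = settings.get("stripe_voice_sub_item_id")
        if not sub_item_id:
            continue  # org has no voice subscription item
        reports.append({
            "subscription_item": sub_item_id,
            # Rounding up to whole minutes is an assumption of this sketch.
            "quantity": math.ceil(month_to_date_minutes.get(org_id, 0.0)),
            # "set" overwrites the running total, so re-sending is idempotent.
            "action": "set",
        })
    return reports


# Hypothetical data: org_b has no subscription item and is skipped.
settings = {
    "org_a": {"stripe_voice_sub_item_id": "si_123"},
    "org_b": {"stripe_voice_sub_item_id": None},
}
minutes = {"org_a": 12.4, "org_b": 3.0}
reports = build_voice_usage_reports(
    settings, minutes, {"STRIPE_VOICE_METERED_PRICE_ID": "price_example"}
)
```

Reporting the month-to-date total with action=set, instead of sending increments, means a retried or duplicated pass cannot double-count minutes.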

