Guide: Multi-Agent Orchestration

Build multi-step workflows with tool use, memory, and reliability controls on top of /v1/chat/completions.

Everything in this guide is optional. If you omit orchestration fields, requests run in default auto mode.

1) How orchestration is activated

The gateway classifies each request from your payload (messages, tools, response format, metadata). You can let this be automatic or provide explicit hints.

Auto mode (recommended first)

{
  "model": "auto",
  "messages": [{"role": "user", "content": "Summarize this report."}]
}

Explicit workflow hint (`xantly.workflow_type`)

Accepted values:

single_turn
execution_task
multi_step_conversational
long_horizon_autonomous
voice_simple
voice_complex
creative

"xantly": {
  "workflow_type": "long_horizon_autonomous"
}

If an unrecognized value is supplied, the gateway falls back to automatic classification.
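Because an unrecognized workflow_type falls back silently to automatic classification, a typo in the hint is easy to miss. One way to catch it is to validate on the client before sending. The following is a minimal sketch, not part of any official SDK: build_request is a hypothetical helper name, and the set of values is copied from the list above.

```python
import json

# Values listed as accepted in this guide.
WORKFLOW_TYPES = {
    "single_turn",
    "execution_task",
    "multi_step_conversational",
    "long_horizon_autonomous",
    "voice_simple",
    "voice_complex",
    "creative",
}


def build_request(prompt, workflow_type=None):
    """Build a request body; omit the hint entirely to stay in auto mode."""
    body = {
        "model": "auto",
        "messages": [{"role": "user", "content": prompt}],
    }
    if workflow_type is not None:
        # Fail loudly here, since the gateway would fall back silently.
        if workflow_type not in WORKFLOW_TYPES:
            raise ValueError(f"unknown workflow_type: {workflow_type!r}")
        body["xantly"] = {"workflow_type": workflow_type}
    return body


print(json.dumps(build_request("Summarize this report.", "single_turn"), indent=2))
```

Calling build_request without a workflow_type produces the same body as the auto mode example.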

2) Chain controls

Continue a chain

Use xantly.chain_id to signal continuation. Send a UUID: values that are not UUID-formatted are ignored for chain classification (see section 8).

"xantly": {
  "chain_id": "7f2c8d45-3d7f-4b4b-8d0f-a3e84fd8d6b2"
}
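
As an illustration, a client could generate one chain_id when a chain starts and reuse it on every later step. This sketch uses Python's standard uuid module; new_chain_id and chain_step are hypothetical helper names, and generating the ID on the client is an assumption about how you might manage it.

```python
import uuid


def new_chain_id() -> str:
    """Return a random (version 4) UUID string to identify a new chain."""
    return str(uuid.uuid4())


def chain_step(chain_id: str, content: str) -> dict:
    """Build one request body that continues the chain identified by chain_id."""
    return {
        "model": "auto",
        "messages": [{"role": "user", "content": content}],
        "xantly": {"chain_id": chain_id},
    }


chain_id = new_chain_id()
first = chain_step(chain_id, "Draft the plan.")
second = chain_step(chain_id, "Now refine step 2.")
# Both steps carry the same chain_id, which is what signals continuation.
print(first["xantly"]["chain_id"] == second["xantly"]["chain_id"])
```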

Limit chain depth and runtime

"xantly": {
  "max_chain_steps": 12,
  "chain_timeout_secs": 180
}

Important: these limits are applied only when the resolved workflow is long_horizon_autonomous.
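
Since the limits depend on the resolved workflow, one option is to always send them together with an explicit workflow hint. A minimal sketch follows; long_horizon_options is a hypothetical helper, and the requirement that both limits be positive integers is an assumption made for this example, not a documented gateway rule.

```python
def long_horizon_options(max_chain_steps: int, chain_timeout_secs: int) -> dict:
    """Return an xantly block that pairs chain limits with the long-horizon hint."""
    for name, value in (
        ("max_chain_steps", max_chain_steps),
        ("chain_timeout_secs", chain_timeout_secs),
    ):
        # Assumption: limits are positive integers (bool is rejected explicitly,
        # because bool is a subclass of int in Python).
        if isinstance(value, bool) or not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer, got {value!r}")
    return {
        "workflow_type": "long_horizon_autonomous",
        "max_chain_steps": max_chain_steps,
        "chain_timeout_secs": chain_timeout_secs,
    }


print(long_horizon_options(12, 180))
```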

Sticky vs mixed chain routing

Set this under routing_hints:

"routing_hints": {
  "chain_routing": "sticky"
}

sticky: preserve continuation consistency across the steps of a chain.
mixed: allow each step to be re-routed independently (this disables sticky continuation behavior).

3) Planning mode

You can steer planning style with xantly.planning_mode:

"xantly": {
  "planning_mode": "planact"
}

Accepted values:

preact
planact

If omitted, tenant defaults and heuristics are used.

4) Memory, cache, and conversation continuity

Persistent conversation scope

"xantly": {
  "conversation_id": "acct-42-onboarding",
  "enable_memory": true,
  "enable_cache": true
}

conversation_id scopes continuity: requests that share the same value belong to the same conversation.
enable_memory controls L1/L2 persistence for this request path.
enable_cache controls cache eligibility for this request.

Defaults:

enable_memory: true
enable_cache: true
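
Because both flags default to true, a request only needs to mention them to opt out. The sketch below shows how the defaults resolve on the client side; effective_flags is a hypothetical helper that mirrors the defaults listed above and does not query the gateway.

```python
def effective_flags(xantly: dict) -> dict:
    """Resolve enable_memory and enable_cache using the documented defaults."""
    return {
        "enable_memory": xantly.get("enable_memory", True),
        "enable_cache": xantly.get("enable_cache", True),
    }


# Omitting both flags leaves memory and cache enabled.
print(effective_flags({"conversation_id": "acct-42-onboarding"}))

# A one-off request can opt out of caching while keeping memory.
print(effective_flags({"conversation_id": "acct-42-onboarding", "enable_cache": False}))
```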

5) Reliability and output verification

Reliability level

"xantly": {
  "reliability_level": "high"
}

Supported values:

standard
high
critical

Verification strategy override

"xantly": {
  "output_verification": "schema"
}

Supported values:

none
native
schema
cross_model

If omitted, strategy is auto-selected from request and tenant settings.
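
One way to keep these two settings consistent across a codebase is a small typed container that rejects unsupported values and leaves unset fields out of the payload, so auto-selection still applies. This is a sketch only: ReliabilityOptions and to_xantly are hypothetical names, and the allowed values are copied from the lists above.

```python
from dataclasses import dataclass
from typing import Optional

RELIABILITY_LEVELS = ("standard", "high", "critical")
VERIFICATION_STRATEGIES = ("none", "native", "schema", "cross_model")


@dataclass(frozen=True)
class ReliabilityOptions:
    reliability_level: Optional[str] = None
    output_verification: Optional[str] = None

    def __post_init__(self):
        # Reject values that are not in the supported lists.
        if (self.reliability_level is not None
                and self.reliability_level not in RELIABILITY_LEVELS):
            raise ValueError(f"unsupported reliability_level: {self.reliability_level!r}")
        if (self.output_verification is not None
                and self.output_verification not in VERIFICATION_STRATEGIES):
            raise ValueError(f"unsupported output_verification: {self.output_verification!r}")

    def to_xantly(self) -> dict:
        """Return only the fields that were set, so omitted ones stay automatic."""
        fields = {
            "reliability_level": self.reliability_level,
            "output_verification": self.output_verification,
        }
        return {key: value for key, value in fields.items() if value is not None}


print(ReliabilityOptions(reliability_level="high").to_xantly())
```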

6) Voice workflows

Signal voice handling with xantly.voice_mode. The value is commonly the string "true", which selects the voice path:

"xantly": {
  "voice_mode": "true"
}

For streaming voice requests, responses are returned as server-sent events (SSE), each carrying a chat.completion.chunk object.
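
To show how a client might consume such a stream, here is a minimal parsing sketch. It rests on assumptions this guide does not confirm: that each event arrives as a "data: <json>" line, that the stream ends with a "data: [DONE]" sentinel, and that text is carried in choices[].delta.content, as in other chat-completions-style streaming APIs. iter_chunks and collect_text are hypothetical helper names, and the sample lines are made up for illustration.

```python
import json
from typing import Iterable, Iterator


def iter_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed chunk objects from SSE lines (assumed 'data: <json>' framing)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":  # assumed end-of-stream sentinel
            break
        yield json.loads(data)


def collect_text(lines: Iterable[str]) -> str:
    """Concatenate the text deltas from all chunks in the stream."""
    parts = []
    for chunk in iter_chunks(lines):
        for choice in chunk.get("choices", []):
            piece = choice.get("delta", {}).get("content")
            if piece:
                parts.append(piece)
    return "".join(parts)


# Made-up sample stream, for illustration only.
sample = [
    'data: {"object": "chat.completion.chunk", "choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"object": "chat.completion.chunk", "choices": [{"delta": {"content": "lo"}}]}',
    "",
    "data: [DONE]",
]
print(collect_text(sample))  # prints: Hello
```

In a real client, the lines would come from the HTTP response body read incrementally rather than from a list.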

7) End-to-end examples

Autonomous multi-step workflow with guardrails

{
  "model": "auto",
  "messages": [
    {"role": "system", "content": "You are an ops agent. Execute safely and report results."},
    {"role": "user", "content": "Create a remediation plan for recurring API timeouts and include milestones."}
  ],
  "routing_hints": {
    "mode": "balanced",
    "chain_routing": "mixed",
    "task_complexity": "complex"
  },
  "xantly": {
    "workflow_type": "long_horizon_autonomous",
    "max_chain_steps": 8,
    "chain_timeout_secs": 120,
    "reliability_level": "high",
    "output_verification": "cross_model",
    "conversation_id": "incident-2026-03-09",
    "enable_memory": true,
    "enable_cache": true
  }
}

Structured tool workflow (execution task)

{
  "model": "auto",
  "messages": [
    {"role": "user", "content": "Extract the top 5 action items from this incident transcript."}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "fetch_incident_transcript",
        "parameters": {
          "type": "object",
          "properties": {"incident_id": {"type": "string"}},
          "required": ["incident_id"]
        }
      }
    }
  ],
  "response_format": {"type": "json_object"},
  "xantly": {
    "workflow_type": "execution_task",
    "output_verification": "schema"
  }
}

8) Edge cases to plan for

xantly.max_chain_steps and xantly.chain_timeout_secs are conditional (long-horizon only).
xantly.chain_id should be UUID-formatted; invalid values are ignored for chain classification.
Some orchestration fields are currently accepted but reserved (cache_ttl_secs, compress_context, enable_tool_reranking, xantly.chain_routing).
For chain routing behavior today, prefer routing_hints.chain_routing over xantly.chain_routing.
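
The edge cases above can be checked on the client before a request is sent. The following is a sketch under stated assumptions: lint_request is a hypothetical helper, the UUID check uses Python's uuid.UUID parser (which may be more lenient than the gateway's own check), and the reserved fields are assumed to live under the xantly object.

```python
import uuid
from typing import List

# Assumption: reserved fields are looked up inside the "xantly" object.
RESERVED_FIELDS = {
    "cache_ttl_secs",
    "compress_context",
    "enable_tool_reranking",
    "chain_routing",
}
LIMIT_FIELDS = ("max_chain_steps", "chain_timeout_secs")


def lint_request(body: dict) -> List[str]:
    """Return human-readable warnings for the edge cases listed in this section."""
    warnings = []
    xantly = body.get("xantly", {})

    chain_id = xantly.get("chain_id")
    if chain_id is not None:
        try:
            uuid.UUID(str(chain_id))
        except ValueError:
            warnings.append("chain_id is not UUID-formatted and may be ignored")

    workflow = xantly.get("workflow_type")
    has_limits = any(name in xantly for name in LIMIT_FIELDS)
    # Only flag an explicit non-long-horizon hint; in auto mode the resolved
    # workflow is not known on the client.
    if has_limits and workflow is not None and workflow != "long_horizon_autonomous":
        warnings.append("chain limits apply only to long_horizon_autonomous")

    for name in sorted(RESERVED_FIELDS.intersection(xantly)):
        warnings.append(f"xantly.{name} is accepted but reserved")

    return warnings


print(lint_request({
    "xantly": {
        "chain_id": "not-a-uuid",
        "workflow_type": "creative",
        "max_chain_steps": 5,
        "chain_routing": "sticky",
    }
}))
```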