Most documentation lives outside the product and treats every reader the same. Autonomy lets you change that. By running documentation agents inside your app, you can give users answers that reflect who they are, what they are trying to do, and where they are in the product. Agents built with Autonomy can power text and voice conversations rooted in knowledge from docs stored in Mintlify, GitBook, or other documentation systems. They can also trigger actions in a product by invoking APIs available to them as tools defined in Python.
From docs to in-product agents
We use this exact pattern on our own website. The chat and voice experience at https://autonomy.computer#learn runs on Autonomy and connects directly to our docs stored in Mintlify. This lets us tailor the experience for each visitor: agents receive custom instructions that adapt responses for different audiences, such as developers, analysts, investors, or first-time visitors.

Make your docs come alive!

In the next few minutes you can launch an app and APIs for agents that:
  • Speak answers drawn from your docs.
  • Search your docs using vector embeddings.
  • Respond fast using a two-agent delegation pattern.
  • Reload docs periodically to stay current.
  • Work through voice and text.
  • Trigger your product’s APIs to take actions.
1. Sign up and install the autonomy command

Complete the steps to get started with Autonomy.
2. Get the example code

curl -sL https://github.com/build-trust/autonomy/archive/refs/heads/main.tar.gz | \
  tar -xz --strip-components=3 autonomy-main/examples/voice/docs
cd docs
This will create a new directory with the complete example:
File Structure:
docs/
|-- autonomy.yaml
|-- images/
    |-- main/
        |-- Dockerfile
        |-- main.py         # Application entry point
        |-- index.html      # Voice and text interface
3. Point to your docs

Open images/main/main.py and update the INDEX_URL to point to your documentation:
images/main/main.py
INDEX_URL = "https://your-docs-site.com/llms.txt"  # Change this
The example expects an llms.txt file containing markdown links to your documentation pages. This format is commonly provided by documentation platforms such as Mintlify and GitBook.
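For reference, here is a minimal sketch of what such an index might look like; the page titles and URLs below are placeholders, not real pages:
llms.txt
# Your Product Docs

- [Getting started](https://your-docs-site.com/getting-started.md)
- [Installation](https://your-docs-site.com/installation.md)
- [API reference](https://your-docs-site.com/api-reference.md)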
4. Deploy

autonomy
Once deployed, open your zone URL in a browser to access the voice and text interface.
When a user speaks to the agent, a voice agent receives audio over a websocket and transcribes it. It speaks a brief acknowledgment (“Great question!”) and delegates the question to a primary agent, which searches a knowledge base for relevant documentation and returns a concise answer. This two-agent pattern ensures low latency while maintaining accuracy through retrieval-augmented generation.

Customize the agents

Update the instructions

The agent instructions define how your agent responds. Customize these for your product:
images/main/main.py
INSTRUCTIONS = """
You are a developer advocate for [YOUR PRODUCT].
[YOUR PRODUCT] is a platform that [WHAT IT DOES].

You can access a knowledge base containing the complete [YOUR PRODUCT] docs.
ALWAYS use the search_[your_product]_docs tool to find accurate information
before answering.

IMPORTANT: Keep your responses concise - ideally 2-4 sentences. You are primarily
used through a voice interface, so brevity is essential.

- Always search the knowledge base first.
- If you can't find it, say so. Don't make stuff up.
- Use active voice, strong verbs, and short sentences.
"""
Also update the voice agent instructions and the knowledge tool name to match your product.

Tune the Knowledge Base

Adjust these parameters based on your documentation size and structure:
images/main/main.py
def create_knowledge():
    return Knowledge(
        name="autonomy_docs",
        searchable=True,
        model=Model("embed-english-v3"),
        max_results=10,
        max_distance=0.4,
        max_knowledge_size=8192,
        chunker=NaiveChunker(max_characters=800, overlap=100),
    )
  • max_results: Number of relevant chunks to retrieve per query. Increase for broader context.
  • max_distance: Similarity threshold (0.0 = exact match, 1.0 = very different). Lower values are stricter.
  • max_knowledge_size: Maximum total size of retrieved context in characters.
  • chunker: Strategy for splitting documents. Smaller chunks improve precision; larger chunks preserve context.
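As a rough illustration, a larger documentation set might trade some precision for broader recall. The values below are illustrative assumptions, not recommendations from the example, and the import assumes these classes come from the autonomy package as in main.py:
/dev/null/example.py
from autonomy import Knowledge, Model, NaiveChunker

# Illustrative tuning for a large docs set: more, larger chunks and a looser
# similarity threshold. Validate against your own retrieval quality.
def create_knowledge_for_large_docs():
    return Knowledge(
        name="autonomy_docs",
        searchable=True,
        model=Model("embed-english-v3"),
        max_results=15,
        max_distance=0.5,
        max_knowledge_size=16384,
        chunker=NaiveChunker(max_characters=1200, overlap=150),
    )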
Voice Configuration Options
  • voice: TTS voice (alloy, echo, nova, or shimmer). Default: echo.
  • realtime_model: Model used by the voice agent. Default: gpt-4o-realtime-preview.
  • vad_threshold: Voice detection sensitivity (0.0-1.0). Higher = less sensitive. Default: 0.5.
  • vad_silence_duration_ms: Milliseconds of silence before end-of-speech is detected. Default: 500.
  • delegation_instructions: Instructions passed when delegating to the primary agent. Default: None.
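Put together, a voice configuration that overrides these defaults might look like the sketch below; the values are examples only, and the delegation instructions are just a short string passed along to the primary agent:
/dev/null/example.py
# Illustrative `voice` configuration for Agent.start (values are examples only).
voice = {
    "voice": "nova",
    "realtime_model": "gpt-4o-realtime-preview",
    "vad_threshold": 0.6,                 # slightly less sensitive than the 0.5 default
    "vad_silence_duration_ms": 600,       # wait a bit longer before ending speech
    "delegation_instructions": "Answer briefly; the user hears the reply as speech.",
}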
Multilingual Support: The voice agent can detect and transcribe speech in multiple languages. If a user speaks in German, for example, the system may transcribe and respond in German automatically. To control this behavior, add language instructions to your agent configuration — either enforcing English responses by default or enabling intentional multilingual support.
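For example, one way to pin the default language is to append a language rule to the voice agent's instructions; this snippet is a suggestion, not part of the example code:
/dev/null/example.py
# Appended to VOICE_INSTRUCTIONS (defined in main.py) to control response language.
LANGUAGE_RULES = """
# Language
- Respond in English by default.
- If the user clearly speaks another language, respond in that language.
"""

VOICE_INSTRUCTIONS = VOICE_INSTRUCTIONS + LANGUAGE_RULES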

Adding Filesystem Tools (Optional)

For large documentation sets where semantic search alone may not provide complete context, you can add filesystem tools as a fallback. This gives the agent direct access to read complete documentation files.
images/main/main.py
from autonomy import FilesystemTools, Tool

DOCS_DIR = "/tmp/docs"

# Download docs to filesystem (in addition to knowledge base)
async def download_docs():
  # ... download and save files to DOCS_DIR ...
  ...

# Add filesystem tools
fs_tools = FilesystemTools(visibility="all", base_dir=DOCS_DIR)

await Agent.start(
  node=node,
  name="docs",
  tools=[
    knowledge_tool,
    Tool(fs_tools.read_file),
    Tool(fs_tools.list_directory),
    Tool(fs_tools.search_in_files),
  ],
  # ... rest of config
)
Update the instructions to guide the agent on when to use each tool:
images/main/main.py
INSTRUCTIONS = """
...

You have access to two types of tools:

1. **search_docs** - Semantic search (use first)
2. **Filesystem tools** - Direct file access (use when semantic search is insufficient)
   - read_file(path) - Read complete files
   - search_in_files(pattern, path) - Find exact text patterns

ALWAYS start with semantic search. Only use filesystem tools when you need
complete context, exact code examples, or specific configuration values.
"""
See Filesystem access for more on filesystem tools and visibility options.

Learn how it works

Loading Documentation

The example downloads documentation from URLs and loads them into the knowledge base:
images/main/main.py
async def load_docs(knowledge: Knowledge):
    print(f"[DOCS] Starting download from {INDEX_URL}")

    try:
        async with httpx.AsyncClient(timeout=30.0) as client:
            response = await client.get(INDEX_URL)
            llms_txt = response.text
        print(f"[DOCS] Fetched index ({len(llms_txt)} chars)")
    except Exception as e:
        print(f"[DOCS] ERROR fetching index: {e}")
        raise

    links = re.findall(r"\[([^\]]+)\]\((https://[^\)]+\.md)\)", llms_txt)

    count = 0
    print(f"[DOCS] Found {len(links)} doc links to download")

    for title, url in links:
        try:
            await knowledge.add_document(
                document_name=title,
                document_url=url,
                content_type="text/markdown",
            )
            count += 1
        except Exception as e:
            print(f"[DOCS] ERROR loading '{title}': {e}")

    print(f"[DOCS] Successfully loaded {count} documents into knowledge base")
    return count
The add_document method fetches the content from the URL and indexes it for semantic search.

Voice Agent Delegation

The voice agent uses a two-step pattern for low-latency responses:
images/main/main.py
VOICE_INSTRUCTIONS = """
You are a developer advocate for Autonomy.
Autonomy is a platform that developers use to ship autonomous products.

# Critical Rules

- Before delegating, speak a SHORT lead-in (1-4 words max) that acknowledges the user.
  - Good examples: "Great question!", "Good question!", "Glad you asked ...",
    "Right, great question. So ...", "Here's the core idea ...",
    "Sure, here's the core idea ...", "Right, so ...", "Okay, so ..."
  - NEVER say: "Let me...",  "Let me get/explain/break that down"
  - NEVER use phrases that imply fetching, thinking, or processing
- Delegate immediately after the brief acknowledgment.
- NEVER answer questions about Autonomy from your own knowledge - always delegate.

# After Receiving Response
Read the primary agent's response VERBATIM and IN FULL.
Do NOT truncate, summarize, or modify it in any way.

# Personality
- Be friendly but minimal - get to the point fast
- Be direct and confident
- Short acknowledgments, then let the content speak
"""
This ensures users hear immediate feedback while the primary agent retrieves accurate information.

Starting the Agent

The agent is configured with voice capabilities and the knowledge tool:
images/main/main.py
async def main(node: Node):
    global knowledge_tool

    knowledge = create_knowledge()
    knowledge_tool = KnowledgeTool(knowledge=knowledge, name="search_autonomy_docs")

    await Agent.start(
        node=node,
        name="docs",
        instructions=INSTRUCTIONS,
        model=Model("claude-sonnet-4-5", max_tokens=256),
        tools=[knowledge_tool],
        context_summary={
            "floor": 20,
            "ceiling": 30,
            "model": Model("claude-sonnet-4-5"),
        },
        voice={
            "voice": "shimmer",
            "instructions": VOICE_INSTRUCTIONS,
            "vad_threshold": 0.7,
            "vad_silence_duration_ms": 700,
            "delegation_instructions": DELEGATION_INSTRUCTIONS,
        },
    )

    await load_docs(knowledge)
    asyncio.create_task(refresh_periodically())


Node.start(main, http_server=HttpServer(app=app))

Auto-Refresh

The example periodically reloads documentation to stay current:
images/main/main.py
REFRESH_INTERVAL_SECONDS = 1800  # 30 minutes

async def refresh_periodically():
    while True:
        await asyncio.sleep(REFRESH_INTERVAL_SECONDS)
        try:
            await refresh_knowledge()
        except Exception:
            pass
You can also trigger a manual refresh via the HTTP endpoint:
curl -X POST https://your-zone.cluster.autonomy.computer/refresh
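If you prefer to trigger the refresh from code, a small client using httpx (which the example already uses) could look like this; the zone URL is a placeholder:
/dev/null/example.py
import asyncio
import httpx

ZONE_URL = "https://your-zone.cluster.autonomy.computer"  # replace with your zone URL

async def trigger_refresh():
    # POST to the example's /refresh endpoint and print the result.
    async with httpx.AsyncClient(timeout=120.0) as client:
        response = await client.post(f"{ZONE_URL}/refresh")
        print(response.status_code, response.text)

asyncio.run(trigger_refresh())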

Add it to your product

Autonomy automatically provides APIs and streaming infrastructure for every agent you create. This makes it simple to integrate the agents you created above into your product.

HTTP API

Every agent gets HTTP endpoints out of the box:
# Single response
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"message": "How do I get started?"}' \
  "https://your-zone.cluster.autonomy.computer/agents/docs"

# Streaming response
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"message": "What features are available?"}' \
  "https://your-zone.cluster.autonomy.computer/agents/docs?stream=true"

WebSocket API

Voice agents get WebSocket endpoints for real-time audio streaming:
/dev/null/example.js
const protocol = window.location.protocol === "https:" ? "wss:" : "ws:";
const wsUrl = `${protocol}//${window.location.host}/agents/docs/voice`;

const ws = new WebSocket(wsUrl);
ws.onopen = () => ws.send(JSON.stringify({ type: "config" }));
ws.onmessage = (event) => handleAudioOrTranscript(JSON.parse(event.data));
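Outside the browser, any WebSocket client can connect to the same endpoint. The rough Python sketch below uses the third-party websockets package (an assumption; the example only ships the browser client) and follows the same message shapes as index.html:
/dev/null/example.py
import asyncio
import base64
import json
import websockets  # third-party package, not part of the example

WS_URL = "wss://your-zone.cluster.autonomy.computer/agents/docs/voice"

async def talk(pcm16_audio: bytes):
    async with websockets.connect(WS_URL) as ws:
        # Same handshake and message types as the browser client in index.html.
        await ws.send(json.dumps({"type": "config"}))
        await ws.send(json.dumps({
            "type": "audio",
            "audio": base64.b64encode(pcm16_audio).decode(),
        }))
        async for raw in ws:
            event = json.loads(raw)
            if event.get("type") == "transcript":
                print(event.get("role"), event.get("text"))
            elif event.get("type") == "response_complete":
                break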

Multitenancy

You can isolate conversations per user with scope and conversation parameters. Each combination gets its own Context and Memory:
/dev/null/example.js
// WebSocket with isolation
const wsUrl = `/agents/docs/voice?scope=${userId}&conversation=${sessionId}`;

// HTTP API with isolation
fetch(`/agents/docs?stream=true`, {
  method: "POST",
  body: JSON.stringify({
    message: userMessage,
    scope: userId,
    conversation: sessionId
  })
});
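The same isolation works from server-side code; here is a brief Python example against the HTTP API (the identifiers are placeholders):
/dev/null/example.py
import httpx

# Per-user and per-session isolation over the HTTP API (non-streaming).
response = httpx.post(
    "https://your-zone.cluster.autonomy.computer/agents/docs",
    json={
        "message": "How do I get started?",
        "scope": "user-123",            # isolates memory per user
        "conversation": "session-456",  # isolates memory per session
    },
    timeout=60.0,
)
print(response.text)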
The example includes a complete voice and text UI in index.html that you can adapt for your product. Here are the key parts:

WebSocket Connection

Connect to the voice agent with multi-tenant isolation:
images/main/index.html
const id = () => Math.random().toString(36).slice(2);
const visitorId = id();
let conversationId = id();

async function connect() {
  const protocol = window.location.protocol === "https:" ? "wss:" : "ws:";
  const wsUrl = `${protocol}//${window.location.host}/agents/docs/voice?scope=${visitorId}&conversation=${conversationId}`;

  ws = new WebSocket(wsUrl);

  ws.onopen = () => {
    isConnected = true;
    ws.send(JSON.stringify({ type: "config" }));
  };

  ws.onmessage = async (event) => {
    const data = JSON.parse(event.data);
    await handleServerMessage(data);
  };
}
  • scope - Isolates memory per user. Each visitor gets their own conversation history.
  • conversation - Isolates memory per session. A user can have multiple separate conversations.

Audio Capture

Capture microphone input and send it as PCM16:
images/main/index.html
mediaStream = await navigator.mediaDevices.getUserMedia({
  audio: {
    channelCount: 1,
    sampleRate: 24000,
    echoCancellation: true,
    noiseSuppression: true,
    autoGainControl: true,
  },
});

audioContext = new (window.AudioContext || window.webkitAudioContext)({
  sampleRate: 24000,
});

// AudioWorklet processes audio and sends to server
workletNode.port.onmessage = (e) => {
  const { pcm16, maxLevel } = e.data;

  // Client-side interruption detection with echo cancellation grace period
  const timeSinceAudioStart = Date.now() - lastAudioPlayTime;
  if (isSpeaking && scheduledSources.length > 0 && maxLevel > 0.25 && timeSinceAudioStart > 1000) {
    clearAudioQueue();
    isSpeaking = false;
  }

  const audioBase64 = btoa(String.fromCharCode(...new Uint8Array(pcm16)));
  ws.send(JSON.stringify({ type: "audio", audio: audioBase64 }));
};

Audio Playback

Play streamed audio responses with proper scheduling:
images/main/index.html
async function playAudioChunk(base64Audio) {
  if (!playbackAudioContext) {
    playbackAudioContext = new (window.AudioContext || window.webkitAudioContext)({
      sampleRate: 24000,
    });
    nextPlayTime = playbackAudioContext.currentTime + 0.05;
  }

  // Track when audio starts for echo cancellation
  if (scheduledSources.length === 0) {
    lastAudioPlayTime = Date.now();
  }

  const audioBytes = Uint8Array.from(atob(base64Audio), (c) => c.charCodeAt(0));
  const pcm16 = new Int16Array(audioBytes.buffer);
  const float32 = new Float32Array(pcm16.length);

  for (let i = 0; i < pcm16.length; i++) {
    float32[i] = pcm16[i] / 32768.0;
  }

  const audioBuffer = playbackAudioContext.createBuffer(1, float32.length, 24000);
  audioBuffer.getChannelData(0).set(float32);

  const source = playbackAudioContext.createBufferSource();
  source.buffer = audioBuffer;
  source.connect(playbackAudioContext.destination);

  source.onended = () => {
    scheduledSources = scheduledSources.filter((s) => s !== source);
  };
  scheduledSources.push(source);

  // Catch up if fallen behind
  const currentTime = playbackAudioContext.currentTime;
  if (nextPlayTime < currentTime) {
    nextPlayTime = currentTime + 0.03;
  }

  source.start(nextPlayTime);
  nextPlayTime += audioBuffer.duration;
}

Handle Server Events

Process the different message types sent by the voice agent:
images/main/index.html
async function handleServerMessage(data) {
  switch (data.type) {
    case "audio":
      isSpeaking = true;
      await playAudioChunk(data.audio);
      setCircleState("speaking");
      break;
    case "transcript":
      addTranscript(data.role, data.text);
      break;
    case "speech_started":
      if (scheduledSources.length > 0) clearAudioQueue();
      isSpeaking = false;
      setCircleState("listening");
      break;
    case "speech_stopped":
      setCircleState("processing");
      break;
    case "response_complete":
      isSpeaking = false;
      setCircleState("listening");
      break;
  }
}

Text Chat with Streaming

The example also includes text chat using the streaming HTTP API:
images/main/index.html
const response = await fetch(`/agents/docs?stream=true`, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    message: message,
    scope: visitorId,
    conversation: conversationId,
  }),
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = "";

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split("\n");
  buffer = lines.pop() || "";

  for (const line of lines) {
    if (!line.trim()) continue;
    try {
      const data = JSON.parse(line);
      const text = getAssistantText(data);
      if (text) pendingText += text;
    } catch (e) {}
  }
}

Troubleshoot
If the agent can't find relevant documentation:
  • Check that max_distance isn't too strict (try 0.4 or higher).
  • Verify documents loaded successfully by checking the /refresh endpoint response.
  • Ensure the embedding model matches your content language.

If voice input doesn't work:
  • Ensure your browser has microphone permissions.
  • Use Chrome or Edge for best WebSocket and Web Audio API support.
  • Check the browser console for WebSocket connection errors.

If responses are too generic or miss details from your docs:
  • Adjust the instructions to emphasize using the search tool.
  • Increase max_results to provide more context.
  • Lower max_distance so only the most relevant chunks are retrieved.
  • Consider adding filesystem tools for complete document access.

If responses are slow:
  • Reduce max_tokens in the model configuration.
  • Use a faster model for the primary agent.
  • Ensure your knowledge base isn't too large.
  • Consider using size: big in your pod configuration for better performance.