Large language models give agents the ability to make autonomous decisions. The Autonomy Computer provides a Model Gateway that gives your apps access to a wide range of models from different providers, each optimized for different use cases and cost profiles. You specify which model an agent should use by passing the model argument to Agent.start():
images/main/main.py
from autonomy import Agent, Model, Node


async def main(node):
  await Agent.start(
    node=node,
    name="henry",
    instructions="You are Henry, an expert legal assistant",
    model=Model("claude-sonnet-4")
  )


Node.start(main)
You can switch models by changing the model name passed to the Model() constructor.
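For example, running the same agent on GPT-4o only requires changing the constructor argument:

model=Model("gpt-4o")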

Supported Models

Autonomy supports a curated set of models from leading providers. All models are accessed through the Model Gateway, which handles routing, load balancing, and failover automatically.

Chat Models

Models for conversational AI and text generation, sorted by input price from highest to lowest.
| Model | Provider | Input (per 1M tokens) | Output (per 1M tokens) |
| --- | --- | --- | --- |
| claude-opus-4, claude-opus-4-v1 | Anthropic | $15.00 | $75.00 |
| o1 | OpenAI | $15.00 | $60.00 |
| claude-opus-4-5, claude-opus-4-5-v1 | Anthropic | $5.00 | $25.00 |
| claude-sonnet-4-5 | Anthropic | $3.00 | $15.00 |
| claude-sonnet-4, claude-sonnet-4-v1 | Anthropic | $3.00 | $15.00 |
| nova-premier, nova-premier-v1 | Amazon | $2.50 | $12.50 |
| gpt-4o | OpenAI | $2.50 | $10.00 |
| gpt-4.1 | OpenAI | $2.00 | $8.00 |
| o3 | OpenAI | $2.00 | $8.00 |
| gpt-5.1 | OpenAI | $1.25 | $10.00 |
| gpt-5 | OpenAI | $1.25 | $10.00 |
| o3-mini | OpenAI | $1.10 | $4.40 |
| o4-mini | OpenAI | $1.10 | $4.40 |
| claude-haiku-4-5 | Anthropic | $1.00 | $5.00 |
| nova-pro, nova-pro-v1 | Amazon | $0.80 | $3.20 |
| gpt-4.1-mini | OpenAI | $0.40 | $1.60 |
| gpt-5-mini | OpenAI | $0.25 | $2.00 |
| gpt-4o-mini | OpenAI | $0.15 | $0.60 |
| gpt-4.1-nano | OpenAI | $0.10 | $0.40 |
| nova-lite, nova-lite-v1 | Amazon | $0.06 | $0.24 |
| gpt-5-nano | OpenAI | $0.05 | $0.40 |
| nova-micro, nova-micro-v1 | Amazon | $0.035 | $0.14 |
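Because prices are quoted per million tokens, the cost of a single request is easy to estimate: (input tokens ÷ 1,000,000) × input price plus (output tokens ÷ 1,000,000) × output price. A quick back-of-the-envelope sketch using the claude-sonnet-4 rates above:

# Estimated cost of one claude-sonnet-4 request:
# $3.00 per 1M input tokens, $15.00 per 1M output tokens
input_tokens, output_tokens = 2_000, 500
cost = (input_tokens / 1_000_000) * 3.00 + (output_tokens / 1_000_000) * 15.00
print(f"${cost:.4f}")  # $0.0135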

Embedding Models

For text embeddings and semantic search.
| Model | Provider | Input (per 1M tokens) |
| --- | --- | --- |
| titan-embed | Amazon | $0.20 |
| text-embedding-3-large | OpenAI | $0.13 |
| embed-english, embed-english-v3 | Cohere | $0.10 |
| embed-multilingual, embed-multilingual-v3 | Cohere | $0.10 |
| text-embedding-3-small | OpenAI | $0.02 |

Realtime Models

For real-time voice conversations.
| Model | Provider | Text Input (per 1M tokens) | Text Output (per 1M tokens) | Audio Input (per 1M tokens) | Audio Output (per 1M tokens) |
| --- | --- | --- | --- | --- | --- |
| gpt-4o-realtime-preview | OpenAI | $5.00 | $20.00 | $40.00 | $80.00 |
| gpt-realtime | OpenAI | $4.00 | $16.00 | $32.00 | $64.00 |
| gpt-realtime-mini | OpenAI | $0.60 | $2.40 | $10.00 | $20.00 |

Audio Models

For text-to-speech and speech-to-text.
| Model | Provider | Type | Price |
| --- | --- | --- | --- |
| tts-1-hd | OpenAI | Text-to-Speech | $30.00 / 1M characters |
| tts-1 | OpenAI | Text-to-Speech | $15.00 / 1M characters |
| whisper-1 | OpenAI | Speech-to-Text | $0.006 / minute |

Parameters

The Model() constructor accepts additional parameters that control the model’s behavior.
  • temperature: The sampling temperature to use, between 0 and 2. Higher values like 0.8 produce more random outputs, while lower values like 0.2 make outputs more focused and deterministic.
  • top_p: An alternative to sampling with temperature, also known as nucleus sampling. The model considers only the tokens that make up the top_p probability mass. For example, 0.1 means only the tokens comprising the top 10% of probability mass are considered.
images/main/main.py
from autonomy import Agent, Model, Node


async def main(node):
  # Use a lower temperature for more focused responses
  await Agent.start(
    node=node,
    name="analyst",
    instructions="You are a financial analyst",
    model=Model("claude-sonnet-4", temperature=0.2)
  )


Node.start(main)
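The same applies to top_p. For example, to restrict sampling to the top 10% of probability mass:

model=Model("claude-sonnet-4", top_p=0.1)

Providers generally recommend adjusting either temperature or top_p, not both at once.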

Invoke Models Directly

For simple use cases, you can also invoke models directly. This is useful when you need to make one-off completions without the full features of an agent.
images/main/main.py
from autonomy import Model, Node, SystemMessage, UserMessage


async def main(node):
  model = Model("claude-sonnet-4")

  response = await model.complete_chat([
    SystemMessage("You are a helpful assistant."),
    UserMessage("Explain gravity in simple terms")
  ])
  
  print(response)


Node.start(main)

Streaming Responses

You can also stream responses from models by setting stream=True:
images/main/main.py
from autonomy import Model, Node, SystemMessage, UserMessage


async def main(node):
  model = Model("claude-sonnet-4")

  streaming_response = model.complete_chat([
      SystemMessage("You are a helpful assistant."),
      UserMessage("Explain gravity")
  ], stream=True)

  async for chunk in streaming_response:
    if hasattr(chunk, "choices") and chunk.choices and chunk.choices[0].delta.content:
      content = chunk.choices[0].delta.content
      print(content, end="")


Node.start(main)
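A common pattern is to accumulate the chunks while printing them, so the full response text is available once the stream ends. A minimal sketch reusing the loop above:

  full_text = ""
  async for chunk in streaming_response:
    if hasattr(chunk, "choices") and chunk.choices and chunk.choices[0].delta.content:
      full_text += chunk.choices[0].delta.content
      print(chunk.choices[0].delta.content, end="")
  print()  # newline once the stream completes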

Embeddings

Generate embeddings for semantic search and similarity:
images/main/main.py
from autonomy import Model, Node


async def main(node):
  model = Model("embed-english")

  embeddings = await model.embeddings([
    "Hello world",
    "How are you?"
  ])
  
  print(f"Embedding dimension: {len(embeddings[0])}")


Node.start(main)
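Once you have the vectors, similarity is typically scored with cosine similarity. A minimal sketch in pure Python (no extra dependencies), continuing from the embeddings above:

import math


def cosine_similarity(a, b):
  # Dot product divided by the product of the vector norms
  dot = sum(x * y for x, y in zip(a, b))
  norm_a = math.sqrt(sum(x * x for x in a))
  norm_b = math.sqrt(sum(y * y for y in b))
  return dot / (norm_a * norm_b)


# Higher values mean the two texts are semantically closer
print(cosine_similarity(embeddings[0], embeddings[1]))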