
Day 0 Support: Claude Opus 4.6

Sameer Kankute
SWE @ LiteLLM (LLM Translation)
Ishaan Jaff
CTO, LiteLLM
Krrish Dholakia
CEO, LiteLLM

LiteLLM now supports Claude Opus 4.6 on Day 0. Use it across Anthropic, Azure, Vertex AI, and Bedrock through the LiteLLM AI Gateway.

Docker Image​

docker pull ghcr.io/berriai/litellm:litellm_stable_release_branch-v1.80.0-stable.opus-4-6

Usage - Anthropic​

1. Setup config.yaml

model_list:
  - model_name: claude-opus-4-6
    litellm_params:
      model: anthropic/claude-opus-4-6
      api_key: os.environ/ANTHROPIC_API_KEY

2. Start the proxy

docker run -d \
-p 4000:4000 \
-e ANTHROPIC_API_KEY=$ANTHROPIC_API_KEY \
-v $(pwd)/config.yaml:/app/config.yaml \
ghcr.io/berriai/litellm:litellm_stable_release_branch-v1.80.0-stable.opus-4-6 \
--config /app/config.yaml

3. Test it!

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer $LITELLM_KEY' \
--data '{
"model": "claude-opus-4-6",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
]
}'

Usage - Azure​

1. Setup config.yaml

model_list:
  - model_name: claude-opus-4-6
    litellm_params:
      model: azure_ai/claude-opus-4-6
      api_key: os.environ/AZURE_AI_API_KEY
      api_base: os.environ/AZURE_AI_API_BASE # https://<resource>.services.ai.azure.com

2. Start the proxy

docker run -d \
-p 4000:4000 \
-e AZURE_AI_API_KEY=$AZURE_AI_API_KEY \
-e AZURE_AI_API_BASE=$AZURE_AI_API_BASE \
-v $(pwd)/config.yaml:/app/config.yaml \
ghcr.io/berriai/litellm:litellm_stable_release_branch-v1.80.0-stable.opus-4-6 \
--config /app/config.yaml

3. Test it!

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer $LITELLM_KEY' \
--data '{
"model": "claude-opus-4-6",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
]
}'

Usage - Vertex AI​

1. Setup config.yaml

model_list:
  - model_name: claude-opus-4-6
    litellm_params:
      model: vertex_ai/claude-opus-4-6
      vertex_project: os.environ/VERTEX_PROJECT
      vertex_location: us-east5

2. Start the proxy

docker run -d \
-p 4000:4000 \
-e VERTEX_PROJECT=$VERTEX_PROJECT \
-e GOOGLE_APPLICATION_CREDENTIALS=/app/credentials.json \
-v $(pwd)/config.yaml:/app/config.yaml \
-v $(pwd)/credentials.json:/app/credentials.json \
ghcr.io/berriai/litellm:litellm_stable_release_branch-v1.80.0-stable.opus-4-6 \
--config /app/config.yaml

3. Test it!

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer $LITELLM_KEY' \
--data '{
"model": "claude-opus-4-6",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
]
}'

Usage - Bedrock​

1. Setup config.yaml

model_list:
  - model_name: claude-opus-4-6
    litellm_params:
      model: bedrock/anthropic.claude-opus-4-6-v1:0
      aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID
      aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY
      aws_region_name: us-east-1

2. Start the proxy

docker run -d \
-p 4000:4000 \
-e AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
-e AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \
-v $(pwd)/config.yaml:/app/config.yaml \
ghcr.io/berriai/litellm:litellm_stable_release_branch-v1.80.0-stable.opus-4-6 \
--config /app/config.yaml

3. Test it!

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer $LITELLM_KEY' \
--data '{
"model": "claude-opus-4-6",
"messages": [
{
"role": "user",
"content": "what llm are you"
}
]
}'

Advanced Features​

Compaction​

LiteLLM supports compaction for the new claude-opus-4-6.

Enabling Compaction

To enable compaction, add the context_management parameter with the compact_20260112 edit type:

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer $LITELLM_KEY' \
--data '{
"model": "claude-opus-4-6",
"messages": [
{
"role": "user",
"content": "What is the weather in San Francisco?"
}
],
"context_management": {
"edits": [
{
"type": "compact_20260112"
}
]
},
"max_tokens": 100
}'

All context_management parameters supported by Anthropic can be passed through directly. LiteLLM automatically adds the compact-2026-01-12 beta header to the request.
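
If you're calling the proxy with the OpenAI Python SDK instead of curl, the same request can be expressed by passing context_management through extra_body. A minimal sketch (the API key and base URL are placeholders for your own proxy):

import openai

# Point the OpenAI SDK at the LiteLLM proxy (placeholder key / URL)
client = openai.OpenAI(api_key="your-litellm-key", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="claude-opus-4-6",
    messages=[{"role": "user", "content": "What is the weather in San Francisco?"}],
    max_tokens=100,
    # Forwarded as-is; enables the compact_20260112 context edit
    extra_body={"context_management": {"edits": [{"type": "compact_20260112"}]}},
)

# Compaction summary, if any, is returned under provider_specific_fields
print(getattr(response.choices[0].message, "provider_specific_fields", None))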

Response with Compaction Block

The response will include the compaction summary in provider_specific_fields.compaction_blocks:

{
  "id": "chatcmpl-a6c105a3-4b25-419e-9551-c800633b6cb2",
  "created": 1770357619,
  "model": "claude-opus-4-6",
  "object": "chat.completion",
  "choices": [
    {
      "finish_reason": "length",
      "index": 0,
      "message": {
        "content": "I don't have access to real-time data, so I can't provide the current weather in San Francisco. To get up-to-date weather information, I'd recommend checking:\n\n- **Weather websites** like weather.com, accuweather.com, or wunderground.com\n- **Search engines** – just Google \"San Francisco weather\"\n- **Weather apps** on your phone (e.g., Apple Weather, Google Weather)\n- **National",
        "role": "assistant",
        "provider_specific_fields": {
          "compaction_blocks": [
            {
              "type": "compaction",
              "content": "Summary of the conversation: The user requested help building a web scraper..."
            }
          ]
        }
      }
    }
  ],
  "usage": {
    "completion_tokens": 100,
    "prompt_tokens": 86,
    "total_tokens": 186
  }
}

Using Compaction Blocks in Follow-up Requests

To continue the conversation with compaction, include the compaction block in the assistant message's provider_specific_fields:

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer $LITELLM_KEY' \
--data '{
"model": "claude-opus-4-6",
"messages": [
{
"role": "user",
"content": "How can I build a web scraper?"
},
{
"role": "assistant",
"content": [
{
"type": "text",
"text": "Certainly! To build a basic web scraper, you'll typically use a programming language like Python along with libraries such as `requests` (for fetching web pages) and `BeautifulSoup` (for parsing HTML). Here's a basic example:\n\n```python\nimport requests\nfrom bs4 import BeautifulSoup\n\nurl = 'https://example.com'\nresponse = requests.get(url)\nsoup = BeautifulSoup(response.text, 'html.parser')\n\n# Extract and print all text\ntext = soup.get_text()\nprint(text)\n```\n\nLet me know what you're interested in scraping or if you need help with a specific website!"
}
],
"provider_specific_fields": {
"compaction_blocks": [
{
"type": "compaction",
"content": "Summary of the conversation: The user asked how to build a web scraper, and the assistant gave an overview using Python with requests and BeautifulSoup."
}
]
}
},
{
"role": "user",
"content": "How do I use it to scrape product prices?"
}
],
"context_management": {
"edits": [
{
"type": "compact_20260112"
}
]
},
"max_tokens": 100
}'
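
The same carry-forward pattern can be scripted. Below is a minimal sketch using the OpenAI SDK against the proxy (key and URL are placeholders): it takes the compaction block returned on one turn and re-attaches it to the assistant message on the next.

import openai

client = openai.OpenAI(api_key="your-litellm-key", base_url="http://0.0.0.0:4000")
compaction = {"context_management": {"edits": [{"type": "compact_20260112"}]}}

# Turn 1
messages = [{"role": "user", "content": "How can I build a web scraper?"}]
first = client.chat.completions.create(
    model="claude-opus-4-6", messages=messages, max_tokens=100, extra_body=compaction
)
assistant = first.choices[0].message

# Re-attach any returned compaction blocks to the assistant turn
assistant_turn = {"role": "assistant", "content": assistant.content}
fields = getattr(assistant, "provider_specific_fields", None)
if fields and fields.get("compaction_blocks"):
    assistant_turn["provider_specific_fields"] = {"compaction_blocks": fields["compaction_blocks"]}

# Turn 2 continues the conversation with the compacted context
messages += [assistant_turn, {"role": "user", "content": "How do I use it to scrape product prices?"}]
second = client.chat.completions.create(
    model="claude-opus-4-6", messages=messages, max_tokens=100, extra_body=compaction
)
print(second.choices[0].message.content)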

Streaming Support

Compaction blocks are also supported in streaming mode. You'll receive:

  • compaction_start event when a compaction block begins
  • compaction_delta events with the compaction content
  • The accumulated compaction_blocks in provider_specific_fields
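
A minimal streaming sketch with the OpenAI SDK (key and URL are placeholders; it assumes the proxy surfaces compaction content on the streamed deltas under provider_specific_fields, per the events above):

import openai

client = openai.OpenAI(api_key="your-litellm-key", base_url="http://0.0.0.0:4000")

stream = client.chat.completions.create(
    model="claude-opus-4-6",
    messages=[{"role": "user", "content": "What is the weather in San Francisco?"}],
    max_tokens=100,
    stream=True,
    extra_body={"context_management": {"edits": [{"type": "compact_20260112"}]}},
)

for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)
    # Compaction content, when present, arrives under provider_specific_fields
    fields = getattr(delta, "provider_specific_fields", None)
    if fields:
        print("\n[compaction]", fields)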

Adaptive Thinking​

LiteLLM supports adaptive thinking through the reasoning_effort parameter:

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer $LITELLM_KEY' \
--data '{
"model": "claude-opus-4-6",
"messages": [
{
"role": "user",
"content": "Solve this complex problem: What is the optimal strategy for..."
}
],
"reasoning_effort": "high"
}'

Effort Levels​

Four effort levels are available: low, medium, high (default), and max. Pass the effort level directly via the output_config parameter:

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer $LITELLM_KEY' \
--data '{
"model": "claude-opus-4-6",
"messages": [
{
"role": "user",
"content": "Explain quantum computing"
}
],
"output_config": {
"effort": "medium"
}
}'

You can combine reasoning_effort with output_config for finer-grained control over the model.
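
For example, from the OpenAI SDK against the proxy (key and URL are placeholders; this sketch assumes your openai package version accepts the reasoning_effort argument, otherwise it can also be passed via extra_body):

import openai

client = openai.OpenAI(api_key="your-litellm-key", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="claude-opus-4-6",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    reasoning_effort="high",                             # adaptive thinking effort
    extra_body={"output_config": {"effort": "medium"}},  # output effort level
)
print(response.choices[0].message.content)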

1M Token Context (Beta)​

Opus 4.6 supports a 1M token context window. Premium pricing applies to prompts exceeding 200k tokens ($10/$37.50 per million input/output tokens), and LiteLLM supports cost calculation for 1M token contexts.

To use the 1M token context window, you need to forward the anthropic-beta header from your client to the LLM provider.

Step 1: Enable header forwarding in your config

general_settings:
  forward_client_headers_to_llm_api: true

Step 2: Send requests with the beta header

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer $LITELLM_KEY' \
--header 'anthropic-beta: context-1m-2025-08-07' \
--data '{
"model": "claude-opus-4-6",
"messages": [
{
"role": "user",
"content": "Analyze this large document..."
}
]
}'
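
From the OpenAI SDK, the same beta header can be attached per request with extra_headers, which the proxy forwards once forward_client_headers_to_llm_api is enabled. A minimal sketch (key and URL are placeholders):

import openai

client = openai.OpenAI(api_key="your-litellm-key", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="claude-opus-4-6",
    messages=[{"role": "user", "content": "Analyze this large document..."}],
    # Forwarded to the provider to enable the 1M token context window
    extra_headers={"anthropic-beta": "context-1m-2025-08-07"},
)
print(response.choices[0].message.content)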

US-Only Inference​

US-only inference is available at 1.1× standard token pricing, and LiteLLM automatically tracks its cost.

Use the inference_geo parameter to specify US-only inference:

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer $LITELLM_KEY' \
--data '{
"model": "claude-opus-4-6",
"messages": [
{
"role": "user",
"content": "What is the capital of France?"
}
],
"inference_geo": "us"
}'

LiteLLM will automatically apply the 1.1× pricing multiplier for US-only inference in cost tracking.
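
The equivalent from the OpenAI SDK, passing inference_geo through extra_body (key and URL are placeholders):

import openai

client = openai.OpenAI(api_key="your-litellm-key", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="claude-opus-4-6",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    extra_body={"inference_geo": "us"},  # request US-only inference
)
print(response.choices[0].message.content)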

Fast Mode​

info

Fast mode is only supported on the Anthropic provider (anthropic/claude-opus-4-6). It is not available on Azure AI, Vertex AI, or Bedrock.

Pricing:

  • Standard: $5 input / $25 output per MTok
  • Fast: $30 input / $150 output per MTok (6× premium)

Request fast mode by passing the speed parameter:
curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer $LITELLM_KEY' \
--data '{
"model": "claude-opus-4-6",
"messages": [
{
"role": "user",
"content": "Refactor this module..."
}
],
"max_tokens": 4096,
"speed": "fast"
}'

Using OpenAI SDK:

import openai

client = openai.OpenAI(
    api_key="your-litellm-key",
    base_url="http://0.0.0.0:4000"
)

response = client.chat.completions.create(
    model="claude-opus-4-6",
    messages=[{"role": "user", "content": "Refactor this module..."}],
    max_tokens=4096,
    extra_body={"speed": "fast"}
)

Using LiteLLM SDK:

from litellm import completion

response = completion(
    model="anthropic/claude-opus-4-6",
    messages=[{"role": "user", "content": "Refactor this module..."}],
    max_tokens=4096,
    speed="fast"
)

LiteLLM automatically tracks the higher costs for fast mode in usage and cost calculations.