v1.82.6 - gpt-5.4-mini, gpt-5.4-nano, Volcengine Doubao Seed 2.0, Multi-Proxy Control Plane

Deploy this version

Docker

docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-1.82.6.rc.1

Pip

pip install litellm==1.82.6.rc.1

Key Highlights

gpt-5.4-mini and gpt-5.4-nano — day 0 — Full pricing and routing support for gpt-5.4-mini (272K context, $0.75/$4.50) and gpt-5.4-nano (1.05M context, $0.20/$1.25) on OpenAI and Azure - PR #23958
Volcengine Doubao Seed 2.0 — doubao-seed-2-0-pro, doubao-seed-2-0-lite, doubao-seed-2-0-mini, and doubao-seed-2-0-code-preview added with tiered pricing support
Multi-proxy worker control plane — New control plane for coordinating multiple proxy worker processes — centralized config, routing, and health management across workers - PR #24217
Security: privilege escalation fix — Fixed privilege escalation on /key/block, /key/unblock, and /key/update max_budget — non-admin users could previously modify keys they didn't own - PR #23781
Anthropic reasoning summary opt-out — New anthropic_reasoning_summary flag to disable automatic injection of the default reasoning summary in Anthropic API responses - PR #22904
Prompt management for Responses API — Prompt templates and versioning now work with the OpenAI Responses API - PR #23999
Per-model-group deployment affinity — Router now supports sticky deployment routing per model group, reducing cold-start variance in production - PR #24110
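Deployment affinity usually depends on a deterministic mapping from a request's affinity key to one deployment in its model group. The sketch below is not LiteLLM's router code. It illustrates the general technique with rendezvous (highest-random-weight) hashing, and the function name `pick_deployment`, the `affinity_key` argument and the deployment names are all hypothetical.

```python
import hashlib
from typing import Sequence


def _score(affinity_key: str, deployment: str) -> int:
    # Stable 64-bit score per (key, deployment) pair; hashlib is used because
    # the built-in hash() of a str is randomized per process.
    digest = hashlib.sha256(f"{affinity_key}|{deployment}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def pick_deployment(model_group: str, affinity_key: str, deployments: Sequence[str]) -> str:
    """Pin (model_group, affinity_key) to one deployment (hypothetical helper)."""
    if not deployments:
        raise ValueError(f"model group {model_group!r} has no deployments")
    key = f"{model_group}:{affinity_key}"
    # Highest score wins, so the same key maps to the same deployment on every call.
    return max(deployments, key=lambda deployment: _score(key, deployment))


deployments = ["azure-eastus", "azure-westus", "openai-direct"]
print(pick_deployment("gpt-5.4-mini", "session-42", deployments))
```

Rendezvous hashing is used here instead of `hash(key) % len(deployments)` because removing a deployment only moves the keys that were pinned to it. Every other key keeps its deployment.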

New Models / Updated Models

New Model Support (12 new models)

| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
| --- | --- | --- | --- | --- | --- |
| OpenAI | `gpt-5.4-mini` | 272K | $0.75 | $4.50 | chat, vision, tools, reasoning, prompt caching |
| OpenAI | `gpt-5.4-nano` | 1.05M | $0.20 | $1.25 | chat, vision, tools, reasoning, prompt caching |
| Azure OpenAI | `azure/gpt-5.4-mini` | 272K | $0.75 | $4.50 | chat, vision, tools, reasoning |
| Azure OpenAI | `azure/gpt-5.4-nano` | 1.05M | $0.20 | $1.25 | chat, vision, tools, reasoning |
| OpenAI | `gpt-4-0314` | 8K | $30.00 | $60.00 | chat (restored; deprecation 2026-03-26) |
| xAI | `xai/grok-4.20-beta-0309-reasoning` | 2M | $2.00 | $6.00 | chat, vision, tools, web search, reasoning |
| xAI | `xai/grok-4.20-beta-0309-non-reasoning` | 2M | $2.00 | $6.00 | chat, vision, tools, web search |
| xAI | `xai/grok-4.20-multi-agent-beta-0309` | 2M | - | - | chat, vision, tools, web search |
| Volcengine | `volcengine/doubao-seed-2-0-pro-260215` | 256K | tiered | tiered | chat, vision, reasoning |
| Volcengine | `volcengine/doubao-seed-2-0-lite-260215` | 256K | tiered | tiered | chat, vision, reasoning |
| Volcengine | `volcengine/doubao-seed-2-0-mini-260215` | 256K | tiered | tiered | chat, vision, reasoning |
| Volcengine | `volcengine/doubao-seed-2-0-code-preview-260215` | 256K | tiered | tiered | chat, vision, reasoning |
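As a worked example of the per-1M-token rates in the table, the sketch below prices a single request. It assumes flat base rates only, so it ignores prompt-caching discounts and the Volcengine tiered pricing. `request_cost` and `PRICES_PER_1M` are illustrative names, not LiteLLM's cost-tracking API.

```python
from decimal import Decimal

# USD per 1M tokens (input, output), copied from the table above; base rates only.
PRICES_PER_1M = {
    "gpt-5.4-mini": (Decimal("0.75"), Decimal("4.50")),
    "gpt-5.4-nano": (Decimal("0.20"), Decimal("1.25")),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Cost in USD of one request at flat per-token rates (no caching discounts)."""
    input_price, output_price = PRICES_PER_1M[model]
    # Decimal keeps the arithmetic exact, which avoids float rounding in money math.
    return (input_tokens * input_price + output_tokens * output_price) / Decimal(1_000_000)


for model in PRICES_PER_1M:
    print(model, request_cost(model, input_tokens=100_000, output_tokens=20_000))
```

For 100,000 input and 20,000 output tokens this comes to $0.075 + $0.090 = $0.165 on gpt-5.4-mini and $0.020 + $0.025 = $0.045 on gpt-5.4-nano.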

Updated Models

OpenAI
- Add supports_minimal_reasoning_effort to entire gpt-5.x model series (gpt-5.1 through gpt-5.4, including codex, pro, nano, and mini variants) and azure/gpt-5.1-2025-11-13
xAI
- Add supports_minimal_reasoning_effort to xai/grok-beta
Azure AI
- Add Cohere Rerank 4.0 models (azure_ai/cohere-rerank-v4, azure_ai/cohere-rerank-v4-multilingual) to model cost map
- Add DeepSeek V3.2 models (azure_ai/deepseek-v3.2, azure_ai/deepseek-v3.2-speciale) to model cost map
Google Vertex AI
- Correct supported_regions for Vertex AI DeepSeek models - PR #23864

Features

OpenAI
- Day 0 support for gpt-5.4-mini and gpt-5.4-nano on OpenAI and Azure - PR #23958
- Auto-route gpt-5.4+ calls using both tools and reasoning to the Responses API on Azure - PR #23926
Anthropic
- Opt-out flag for default reasoning summary injection (anthropic_reasoning_summary: false) - PR #22904
- Support ANTHROPIC_AUTH_TOKEN and ANTHROPIC_BASE_URL environment variables as alternative to ANTHROPIC_API_KEY - PR #24140
Google Gemini
- Context circulation support for server-side tool combination (Gemini native feature) - PR #24073
AWS Bedrock
- Support cache_control_injection_points for tool_config location in Bedrock requests - PR #24076
- Support batch cancel via Vertex AI / Bedrock batch API - PR #23957

Bugs

Anthropic
- Align translate_thinking_for_model with default reasoning summary injection — fixes cases where summary was injected inconsistently - PR #22909
- Preserve cache directive on file-type content blocks — cache headers were dropped on file messages - PR #23906
- Fix cache_control directive dropped on document/file message blocks - PR #23911
- Filter beta header after transformation (not before) to prevent invalid header injection - PR #23715
- Add additionalProperties: false for OpenAI strict mode in Anthropic adapter - PR #24072
- Fix thinking blocks dropped when thinking field is null - PR #24070
Google Vertex AI
- Fix streaming finish_reason='stop' instead of 'tool_calls' for gemini-3.1-flash-lite-preview - PR #23895
- Respect vertex_count_tokens_location for Claude count_tokens calls on Vertex - PR #23907
- Pass model to context caching URL builder for custom api_base - PR #23928
- Fix Vertex AI Batch output file download failing with 500 - PR #23718
Azure AI
- Preserve annotations in Bing Search grounding responses from Azure AI Agents - PR #23939
Mistral
- Preserve diarization segments in transcription response — segments field was being dropped - PR #23925
Fireworks AI
- Skip #transform=inline for base64 data URLs — avoids double-encoding of inline image data - PR #23818
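The Fireworks fix works because the `#transform=inline` suffix is only meaningful for URLs the provider fetches. A base64 `data:` URL already carries its bytes. Here is a minimal sketch of that guard. The helper name `add_inline_transform` is hypothetical, and LiteLLM's actual transformation code may differ.

```python
INLINE_SUFFIX = "#transform=inline"


def add_inline_transform(url: str) -> str:
    """Hypothetical helper: append the inline-transform suffix to fetchable URLs only."""
    if url.startswith("data:"):
        # base64 data URLs already embed the bytes inline, so they are returned unchanged.
        return url
    if url.endswith(INLINE_SUFFIX):
        # Already transformed; avoid appending the suffix twice.
        return url
    return url + INLINE_SUFFIX


print(add_inline_transform("https://example.com/report.pdf"))
print(add_inline_transform("data:image/png;base64,iVBORw0KGgo="))
```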
AWS Bedrock
- Respect api_base and aws_bedrock_runtime_endpoint in the count_tokens endpoint - PR #24199
Responses API
- Map Chat Completion file type to Responses API input_file correctly - PR #23618
- Surface Anthropic code execution results as code_interpreter_call in Responses API output - PR #23784
- Capture incomplete terminal errors in background streaming — previously only response.completed triggered flush - PR #23881
- Align emulated file search Responses behavior with native output format - PR #23969
General
- Map Anthropic refusal finish reason to content_filter for OpenAI compatibility - PR #23899
- Preserve custom attributes on the final stream chunk; they were being dropped on the last SSE event - PR #23530
- Ensure alternating roles in message arrays - PR #24015
- Short-circuit web search interception for github_copilot provider - PR #24143
- Fix proxy-only failure call type not being set correctly - PR #24050
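You can think of the refusal mapping as a lookup from Anthropic's `stop_reason` values to OpenAI's `finish_reason` values. Only the `refusal` to `content_filter` entry comes from this release. The rest of the table is an illustrative sketch, not LiteLLM's mapping code, and the fallback to `"stop"` for unlisted values is an assumption.

```python
from typing import Optional

# Anthropic stop_reason -> OpenAI-compatible finish_reason (illustrative table).
STOP_REASON_TO_FINISH_REASON = {
    "end_turn": "stop",
    "stop_sequence": "stop",
    "max_tokens": "length",
    "tool_use": "tool_calls",
    "refusal": "content_filter",  # the mapping added in this release
}


def to_openai_finish_reason(stop_reason: Optional[str]) -> str:
    # Unlisted or missing values fall back to "stop" (an assumption of this sketch).
    return STOP_REASON_TO_FINISH_REASON.get(stop_reason or "", "stop")


print(to_openai_finish_reason("refusal"))
```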

LLM API Endpoints

Features

Video Generation API
- Add create character endpoints and new video generation endpoints - PR #23737
Responses API
- Prompt management support for Responses API — use prompt templates and versioning with /v1/responses - PR #23999
Azure
- Use AZURE_DEFAULT_API_VERSION env var as default for proxy --api_version flag - PR #24120
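The `AZURE_DEFAULT_API_VERSION` change follows a common pattern: read a flag's default from the environment so that an explicit `--api_version` still takes precedence. The sketch below uses `argparse`. LiteLLM's own CLI is not reproduced here, and `FALLBACK_API_VERSION` is a hypothetical placeholder for whatever version applies when neither the flag nor the variable is set.

```python
import argparse
import os

# Placeholder used only when neither the flag nor the env var is set (not LiteLLM's default).
FALLBACK_API_VERSION = "2024-10-21"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="proxy-sketch")
    parser.add_argument(
        "--api_version",
        # The environment is read when the parser is built; an explicit flag still wins.
        default=os.environ.get("AZURE_DEFAULT_API_VERSION", FALLBACK_API_VERSION),
        help="Azure API version (default: $AZURE_DEFAULT_API_VERSION)",
    )
    return parser


print(build_parser().parse_args([]).api_version)
```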

Bugs

General
- Fix logging for incomplete streaming responses and custom pricing on /v1/messages and /v1/responses - PR #24080

Management Endpoints / UI

Features

Multi-Proxy Control Plane
- New control plane for managing multiple proxy worker processes — centralized routing, config sync, and health tracking across workers - PR #24217
Audit Logs
- Export audit logs to external callback systems (S3, custom callbacks) - PR #23167
Teams
- /v2/team/list — new endpoint with org admin access control, members_count, and DB indexes for performance - PR #23938
- Modernize Teams Table in UI — antd-based redesign with table refresh, infinite scroll dropdown, and leftnav migration - PR #24189, PR #24342
Virtual Keys
- Disable custom virtual key values via UI setting — prevent users from specifying their own key strings - PR #23812
Setup Wizard
- Interactive litellm --setup wizard for configuring providers, API keys, and proxy settings from the CLI - PR #23644

Bugs

Fix empty filter results showing stale data in UI Logs view - PR #23792
Fix internal users being able to create invalid keys - PR #23795
Fix key alias re-validation on update blocking legacy aliases - PR #23798
Fix per-entity breakdown missing from aggregated daily activity endpoint - PR #23471
Fix team_member_budget_duration missing from NewTeamRequest - PR #23484
Fix CSV export empty on Global Usage page - PR #23819
Fix DefaultInternalUserParams Pydantic default not matching runtime fallback - PR #23666
Fix key update endpoint returning 401 instead of 404 for nonexistent keys - PR #24063
Fix /key/block and /key/unblock to return 404 (instead of 401) for nonexistent keys - PR #23977
Fix pass-through subpath auth for non-admin users - PR #24079
Fix duplicate callback logs for pass-through endpoint failures - PR #23509
Fix Default Team Settings missing permission options in UI - PR #24039
Fix guardrail mode type crash on non-string values in Logs UI - PR #24035
Fix create key tags dropdown - PR #24273
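Several key-management fixes in this release depend on the order of checks in the handler: the 404 responses for nonexistent keys listed above, and the ownership enforcement described under Security. Here is a hedged sketch. `VirtualKey`, `APIError` and `block_key` are hypothetical names, and returning 403 for a key owned by someone else is an assumption, not documented behavior.

```python
from dataclasses import dataclass
from http import HTTPStatus
from typing import Dict


@dataclass
class VirtualKey:
    token: str
    owner_id: str
    blocked: bool = False


class APIError(Exception):
    def __init__(self, status: HTTPStatus, detail: str) -> None:
        super().__init__(detail)
        self.status = status


def block_key(store: Dict[str, VirtualKey], token: str, caller_id: str, caller_is_admin: bool) -> VirtualKey:
    key = store.get(token)
    if key is None:
        # Missing key: 404, since the caller is authenticated and the resource does not exist.
        raise APIError(HTTPStatus.NOT_FOUND, f"key {token!r} not found")
    if not caller_is_admin and key.owner_id != caller_id:
        # Ownership check for non-admins (the 403 status is an assumption).
        raise APIError(HTTPStatus.FORBIDDEN, "caller does not own this key")
    key.blocked = True
    return key
```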

AI Integrations

Logging

Langfuse
- Fix OpenTelemetry traceparent propagation — traceparent header was not being forwarded correctly to Langfuse spans - PR #24048
LangSmith
- Populate usage_metadata in outputs for Cost column tracking - PR #24043
Audit Log Export
- Export audit logs to external callback systems (S3, custom destinations) - PR #23167
General
- Preserve router_model_group in generic API log entries - PR #24044
- Merge hidden_params into metadata for streaming requests — previously only non-streaming requests had full metadata - PR #24220

Guardrails

Akto — New Akto guardrail integration for API security testing and threat detection - PR #23250
MCP JWT Signer — Built-in guardrail for zero-trust MCP authentication — automatically signs outbound MCP requests with JWT tokens - PR #23897
pre_mcp_call header mutation — pre_mcp_call guardrail hooks can now mutate outbound MCP request headers - PR #23889
Fix model-level guardrails not executing for non-streaming post_call — guardrails configured at the model level were silently skipped on synchronous (non-streaming) responses - PR #23774
Defer logging until post-call guardrails complete — logging callbacks were firing before guardrail post_call hooks finished, causing incomplete log entries - PR #24135

Prompt Management

Responses API
- Prompt management (templates, versioning) now supported for /v1/responses - PR #23999

Secret Managers

No major secret manager changes in this release.

MCP Gateway

Bugs

Fix oauth2_flow not being set when building MCPServer in _execute_with_mcp_client — caused MCP server auth failures for OAuth2-protected servers - PR #23468
Upgrade mcp SDK to 1.26.0 - PR #24179

Spend Tracking, Budgets and Rate Limiting

Proxy-wide default API key TPM/RPM limits — Set global default rate limits applied to all API keys that don't have explicit limits configured - PR #24088
Fix rate limit check before creating polling ID — polling IDs were being created before the rate limit check, consuming slots even for rejected requests - PR #24106
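The two items above can be sketched together. First, resolve a key's effective limits: its own TPM/RPM if set, otherwise the proxy-wide default. Second, run the rate-limit check before allocating a polling ID, so rejected requests never consume one. All names and default values here are hypothetical. Only RPM is enforced, and the per-minute window reset is omitted for brevity.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Limits:
    tpm: Optional[int] = None  # tokens per minute; None means "not set"
    rpm: Optional[int] = None  # requests per minute; None means "not set"


# Hypothetical proxy-wide defaults for keys without explicit limits.
DEFAULT_KEY_LIMITS = Limits(tpm=100_000, rpm=60)


def effective_limits(key_limits: Limits, defaults: Limits = DEFAULT_KEY_LIMITS) -> Limits:
    # A key's own limit always wins; the default only fills in missing values.
    return Limits(
        tpm=key_limits.tpm if key_limits.tpm is not None else defaults.tpm,
        rpm=key_limits.rpm if key_limits.rpm is not None else defaults.rpm,
    )


class RateLimitExceeded(Exception):
    pass


class PollingQueue:
    """Hypothetical background-request queue (RPM only, no window reset)."""

    def __init__(self) -> None:
        self._next_id = itertools.count(1)
        self.requests_in_window: Dict[str, int] = {}
        self.polling_ids: List[str] = []

    def submit(self, key_id: str, key_limits: Limits) -> str:
        limits = effective_limits(key_limits)
        used = self.requests_in_window.get(key_id, 0)
        # 1. Rate-limit check first: a rejected request must not consume a polling ID.
        if limits.rpm is not None and used >= limits.rpm:
            raise RateLimitExceeded(f"{key_id} exceeded {limits.rpm} requests/minute")
        self.requests_in_window[key_id] = used + 1
        # 2. Only admitted requests get a polling ID.
        polling_id = f"poll-{next(self._next_id)}"
        self.polling_ids.append(polling_id)
        return polling_id
```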

Performance / Loadbalancing / Reliability improvements

Per-model-group deployment affinity — Router can now pin requests to specific deployments within a model group, reducing cold-start latency and improving cache hit rates for stateful workloads - PR #24110
Auto-recover shared aiohttp session when closed — proxy was crashing with RuntimeError: Session is closed after idle periods; session now auto-recovers - PR #23808
Kill orphaned Prisma engine subprocess on failed disconnect — zombie Prisma engine processes were accumulating on DB reconnect failures, exhausting file descriptors - PR #24149
Add IF NOT EXISTS to index creation in migration — migration was failing on re-runs if indexes already existed - PR #24105
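Python's built-in `sqlite3` module is enough to show why `IF NOT EXISTS` makes a migration safe to re-run. LiteLLM's migrations target its Prisma-managed database, so treat this as an illustration of the SQL clause only. The table and index names are hypothetical.

```python
import sqlite3

# Hypothetical migration; both statements are idempotent thanks to IF NOT EXISTS.
MIGRATION = """
CREATE TABLE IF NOT EXISTS team (id TEXT PRIMARY KEY, organization_id TEXT);
CREATE INDEX IF NOT EXISTS team_organization_id_idx ON team (organization_id);
"""


def run_migration(conn: sqlite3.Connection) -> None:
    conn.executescript(MIGRATION)


def index_count(conn: sqlite3.Connection, name: str) -> int:
    row = conn.execute(
        "SELECT COUNT(*) FROM sqlite_master WHERE type = 'index' AND name = ?", (name,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
run_migration(conn)
run_migration(conn)  # the re-run is a no-op instead of an error

# Without the clause, creating the same index a second time fails.
conn.execute("CREATE INDEX plain_idx ON team (id)")
try:
    conn.execute("CREATE INDEX plain_idx ON team (id)")
except sqlite3.OperationalError as exc:
    print(f"re-run without IF NOT EXISTS failed: {exc}")
```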

Security

Fix privilege escalation on key management endpoints — non-admin users could call /key/block, /key/unblock, and /key/update with max_budget to modify keys they don't own. Now enforces ownership checks - PR #23781
Fix global secret redaction — secrets were not being redacted from all log paths; now uses root logger filter + key-name-based pattern matching to ensure full coverage - PR #24305
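Here is a minimal sketch of key-name-based redaction with the standard `logging` module. Filters attached to a `Logger` object do not see records that propagate up from child loggers, so this sketch attaches the filter to every handler on the root logger. The list of secret key names and the regex are hypothetical, and LiteLLM's actual filter may match differently.

```python
import logging
import re
from typing import Optional

# Hypothetical key names treated as secrets; word boundaries keep e.g. max_tokens untouched.
_SECRET_PAIR = re.compile(
    r"(?i)\b(api_key|access_token|auth_token|client_secret|secret|password)"
    r"(['\"]?\s*[:=]\s*['\"]?)([^'\"\s,}]+)"
)


class RedactSecretsFilter(logging.Filter):
    """Rewrite each record's message so secret values become [REDACTED]."""

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # apply %-style args before redacting
        record.msg = _SECRET_PAIR.sub(r"\1\2[REDACTED]", message)
        record.args = ()
        return True  # never drop the record, only rewrite it


def install_redaction(root: Optional[logging.Logger] = None) -> None:
    # Handler-level filters see records from every child logger that propagates to root.
    root = root or logging.getLogger()
    redactor = RedactSecretsFilter()
    for handler in root.handlers:
        handler.addFilter(redactor)


logging.basicConfig(level=logging.INFO)
install_redaction()
logging.getLogger("proxy.sketch").info("calling with api_key=%s", "sk-123")
```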

Documentation Updates

No major documentation-only changes in this release.

New Contributors

@Chesars made their first contribution in PR #21441
@michelligabriele made their first contribution in PR #23471
@voidborne-d made their first contribution in PR #23808
@andrzej-pomirski-yohana made their first contribution in PR #23784
@kelvin-tran made their first contribution in PR #23911
@themavik made their first contribution in PR #24043
@emerzon made their first contribution in PR #24044
@jyeros made their first contribution in PR #24048
@alilxxey made their first contribution in PR #24050
@xr843 made their first contribution in PR #24070
@ephrimstanley (Point72) made their first contribution in PR #24088
@superpoussin22 made their first contribution in PR #24105
@devin-petersohn made their first contribution in PR #24140
@johnib made their first contribution in PR #24143
@stias made their first contribution in PR #24199
@milan-berri made their first contribution in PR #24220

Diff Summary

03/23/2026

New Models / Updated Models: 12 new
LLM API Endpoints: 4
Management Endpoints / UI: 17
Logging / Guardrail / Prompt Management Integrations: 9
MCP Gateway: 2
Spend Tracking, Budgets and Rate Limiting: 2
Performance / Loadbalancing / Reliability improvements: 4
Security: 2

Full Changelog

v1.82.3-stable...v1.82.6.rc.1
