v1.82.6 - gpt-5.4-mini, gpt-5.4-nano, Volcengine Doubao Seed 2.0, Multi-Proxy Control Plane
Deploy this version​
Docker:

docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-1.82.6.rc.1

Pip:

pip install litellm==1.82.6.rc.1
Key Highlights​
- gpt-5.4-mini and gpt-5.4-nano — day 0 — Full pricing and routing support for `gpt-5.4-mini` (272K context, $0.75/$4.50) and `gpt-5.4-nano` (1.05M context, $0.20/$1.25) on OpenAI and Azure - PR #23958
- Volcengine Doubao Seed 2.0 — `doubao-seed-2-0-pro`, `doubao-seed-2-0-lite`, `doubao-seed-2-0-mini`, and `doubao-seed-2-0-code-preview` added with tiered pricing support
- Multi-proxy worker control plane — New control plane for coordinating multiple proxy worker processes — centralized config, routing, and health management across workers - PR #24217
- Security: privilege escalation fix — Fixed privilege escalation on `/key/block`, `/key/unblock`, and `/key/update` `max_budget` — non-admin users could previously modify keys they didn't own - PR #23781
- Anthropic reasoning summary opt-out — New `anthropic_reasoning_summary` flag to disable automatic injection of the default reasoning summary in Anthropic API responses - PR #22904
- Prompt management for Responses API — Prompt templates and versioning now work with the OpenAI Responses API - PR #23999
- Per-model-group deployment affinity — Router now supports sticky deployment routing per model group, reducing cold-start variance in production - PR #24110
New Models / Updated Models​
New Model Support (12 new models)​
| Provider | Model | Context Window | Input ($/1M tokens) | Output ($/1M tokens) | Features |
|---|---|---|---|---|---|
| OpenAI | gpt-5.4-mini | 272K | $0.75 | $4.50 | chat, vision, tools, reasoning, prompt caching |
| OpenAI | gpt-5.4-nano | 1.05M | $0.20 | $1.25 | chat, vision, tools, reasoning, prompt caching |
| Azure OpenAI | azure/gpt-5.4-mini | 272K | $0.75 | $4.50 | chat, vision, tools, reasoning |
| Azure OpenAI | azure/gpt-5.4-nano | 1.05M | $0.20 | $1.25 | chat, vision, tools, reasoning |
| OpenAI | gpt-4-0314 | 8K | $30.00 | $60.00 | chat (restored; deprecation 2026-03-26) |
| xAI | xai/grok-4.20-beta-0309-reasoning | 2M | $2.00 | $6.00 | chat, vision, tools, web search, reasoning |
| xAI | xai/grok-4.20-beta-0309-non-reasoning | 2M | $2.00 | $6.00 | chat, vision, tools, web search |
| xAI | xai/grok-4.20-multi-agent-beta-0309 | 2M | - | - | chat, vision, tools, web search |
| Volcengine | volcengine/doubao-seed-2-0-pro-260215 | 256K | tiered | tiered | chat, vision, reasoning |
| Volcengine | volcengine/doubao-seed-2-0-lite-260215 | 256K | tiered | tiered | chat, vision, reasoning |
| Volcengine | volcengine/doubao-seed-2-0-mini-260215 | 256K | tiered | tiered | chat, vision, reasoning |
| Volcengine | volcengine/doubao-seed-2-0-code-preview-260215 | 256K | tiered | tiered | chat, vision, reasoning |
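As a sanity check on the flat per-token rates in the table above, the per-request cost works out as follows. This helper and the token counts are illustrative only (not a LiteLLM API); the rates are taken from the gpt-5.4-mini row.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_per_million: float, output_per_million: float) -> float:
    """Cost in USD for one request at flat $/1M-token rates."""
    return (input_tokens * input_per_million
            + output_tokens * output_per_million) / 1_000_000

# gpt-5.4-mini: $0.75 input / $4.50 output per 1M tokens
cost = request_cost(10_000, 2_000, 0.75, 4.50)
print(f"${cost:.4f}")  # $0.0165
```

The Volcengine rows are marked "tiered" because their rates vary by usage tier, so a flat-rate helper like this does not apply to them.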
Updated Models​
- Add `supports_minimal_reasoning_effort` to the entire `gpt-5.x` model series (gpt-5.1 through gpt-5.4, including codex, pro, nano, and mini variants) and `azure/gpt-5.1-2025-11-13`
- Add `supports_minimal_reasoning_effort` to `xai/grok-beta`
- Add Cohere Rerank 4.0 models (`azure_ai/cohere-rerank-v4`, `azure_ai/cohere-rerank-v4-multilingual`) to the model cost map
- Add DeepSeek V3.2 models (`azure_ai/deepseek-v3.2`, `azure_ai/deepseek-v3.2-speciale`) to the model cost map
- Correct `supported_regions` for Vertex AI DeepSeek models - PR #23864
Features​
- Context circulation support for server-side tool combination (Gemini native feature) - PR #24073
Bugs​
- Align `translate_thinking_for_model` with default reasoning summary injection — fixes cases where the summary was injected inconsistently - PR #22909
- Preserve cache directive on file-type content blocks — cache headers were dropped on file messages - PR #23906
- Fix `cache_control` directive dropped on document/file message blocks - PR #23911
- Filter beta header after transformation (not before) to prevent invalid header injection - PR #23715
- Add `additionalProperties: false` for OpenAI strict mode in Anthropic adapter - PR #24072
- Fix thinking blocks dropped when `thinking` field is null - PR #24070
- Fix streaming `finish_reason='stop'` instead of `'tool_calls'` for `gemini-3.1-flash-lite-preview` - PR #23895
- Respect `vertex_count_tokens_location` for Claude `count_tokens` calls on Vertex - PR #23907
- Pass model to context caching URL builder for custom `api_base` - PR #23928
- Fix Vertex AI Batch output file download failing with 500 - PR #23718
- Preserve annotations in Bing Search grounding responses from Azure AI Agents - PR #23939
- Preserve diarization segments in transcription response — `segments` field was being dropped - PR #23925
- Skip `#transform=inline` for base64 data URLs — avoids double-encoding of inline image data - PR #23818
- Respect `api_base` and `aws_bedrock_runtime_endpoint` in the `count_tokens` endpoint - PR #24199
- Responses API
  - Map Chat Completion file type to Responses API `input_file` correctly - PR #23618
  - Surface Anthropic code execution results as `code_interpreter_call` in Responses API output - PR #23784
  - Capture incomplete terminal errors in background streaming — previously only `response.completed` triggered a flush - PR #23881
  - Align emulated file search Responses behavior with native output format - PR #23969
- General
  - Map Anthropic `refusal` finish reason to `content_filter` for OpenAI compatibility - PR #23899
  - Preserve custom attributes on final stream chunk — they were being dropped on the last SSE event - PR #23530
  - Fix message arrays to ensure alternating roles - PR #24015
  - Short-circuit web search interception for `github_copilot` provider - PR #24143
  - Fix proxy-only failure call type not being set correctly - PR #24050
LLM API Endpoints​
Features​
- Add create character endpoints and new video generation endpoints - PR #23737
- Prompt management support for Responses API — use prompt templates and versioning with `/v1/responses` - PR #23999
- Use `AZURE_DEFAULT_API_VERSION` env var as the default for the proxy `--api_version` flag - PR #24120
Bugs​
- General
  - Fix logging for incomplete streaming responses and custom pricing on `/v1/messages` and `/v1/responses` - PR #24080
Management Endpoints / UI​
Features​
- Multi-Proxy Control Plane
  - New control plane for managing multiple proxy worker processes — centralized routing, config sync, and health tracking across workers - PR #24217
- Audit Logs
  - Export audit logs to external callback systems (S3, custom callbacks) - PR #23167
- Virtual Keys
  - Disable custom virtual key values via UI setting — prevents users from specifying their own key strings - PR #23812
- Setup Wizard
  - Interactive `litellm --setup` wizard for configuring providers, API keys, and proxy settings from the CLI - PR #23644
Bugs​
- Fix empty filter results showing stale data in UI Logs view - PR #23792
- Fix internal users being able to create invalid keys - PR #23795
- Fix key alias re-validation on update blocking legacy aliases - PR #23798
- Fix per-entity breakdown missing from aggregated daily activity endpoint - PR #23471
- Fix `team_member_budget_duration` missing from `NewTeamRequest` - PR #23484
- Fix CSV export empty on Global Usage page - PR #23819
- Fix DefaultInternalUserParams Pydantic default not matching runtime fallback - PR #23666
- Fix key update endpoint returning 401 instead of 404 for nonexistent keys - PR #24063
- Fix `/key/block` and `/key/unblock` returning 404 (not 401) for non-existent keys - PR #23977
- Fix pass-through subpath auth for non-admin users - PR #24079
- Fix duplicate callback logs for pass-through endpoint failures - PR #23509
- Fix Default Team Settings missing permission options in UI - PR #24039
- Fix guardrail mode type crash on non-string values in Logs UI - PR #24035
- Fix create key tags dropdown - PR #24273
AI Integrations​
Logging​
- Fix OpenTelemetry traceparent propagation — `traceparent` header was not being forwarded correctly to Langfuse spans - PR #24048
- Populate `usage_metadata` in outputs for Cost column tracking - PR #24043
- Export audit logs to external callback systems (S3, custom destinations) - PR #23167
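The traceparent fix above concerns W3C Trace Context propagation. As a refresher, a `traceparent` header has the shape `"00-<32-hex trace-id>-<16-hex parent-id>-<2-hex flags>"`. This standalone parser (illustrative, not LiteLLM code) shows the fields that must be forwarded intact for spans to link up:

```python
def parse_traceparent(value: str) -> dict:
    """Split a W3C traceparent header into its four fields."""
    version, trace_id, parent_id, flags = value.split("-")
    assert len(trace_id) == 32 and len(parent_id) == 16
    return {
        "version": version,
        "trace_id": trace_id,       # must stay identical across hops
        "parent_id": parent_id,     # span id of the caller
        "sampled": (int(flags, 16) & 0x01) == 1,
    }
```

If any field is rewritten or dropped in transit, downstream tracing backends cannot attach the proxy's spans to the caller's trace.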
Guardrails​
- Akto — New Akto guardrail integration for API security testing and threat detection - PR #23250
- MCP JWT Signer — Built-in guardrail for zero-trust MCP authentication — automatically signs outbound MCP requests with JWT tokens - PR #23897
- `pre_mcp_call` header mutation — `pre_mcp_call` guardrail hooks can now mutate outbound MCP request headers - PR #23889
- Fix model-level guardrails not executing for non-streaming post_call — guardrails configured at the model level were silently skipped on synchronous (non-streaming) responses - PR #23774
- Defer logging until post-call guardrails complete — logging callbacks were firing before guardrail post_call hooks finished, causing incomplete log entries - PR #24135
Prompt Management​
- Responses API
  - Prompt management (templates, versioning) now supported for `/v1/responses` - PR #23999
Secret Managers​
No major secret manager changes in this release.
MCP Gateway​
Bugs​
- Fix `oauth2_flow` not being set when building `MCPServer` in `_execute_with_mcp_client` — caused MCP server auth failures for OAuth2-protected servers - PR #23468
- Upgrade `mcp` SDK to 1.26.0 - PR #24179
Spend Tracking, Budgets and Rate Limiting​
- Proxy-wide default API key TPM/RPM limits — Set global default rate limits applied to all API keys that don't have explicit limits configured - PR #24088
- Fix rate limit check before creating polling ID — polling IDs were being created before the rate limit check, consuming slots even for rejected requests - PR #24106
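The proxy-wide default limits feature follows a simple precedence rule: a key's explicit TPM/RPM limits win, and the global defaults apply only when the key has none. A sketch of that fallback logic, with hypothetical names (`KeyLimits`, `effective_limits`) and assumed default values, not LiteLLM's actual schema:

```python
from dataclasses import dataclass
from typing import Optional

# Assumed proxy-wide defaults for illustration.
DEFAULT_TPM = 100_000  # tokens per minute
DEFAULT_RPM = 600      # requests per minute


@dataclass
class KeyLimits:
    """Limits stored on an API key; None means 'not configured'."""
    tpm: Optional[int] = None
    rpm: Optional[int] = None


def effective_limits(key: KeyLimits) -> tuple[int, int]:
    """Explicit per-key limits take precedence over global defaults."""
    tpm = key.tpm if key.tpm is not None else DEFAULT_TPM
    rpm = key.rpm if key.rpm is not None else DEFAULT_RPM
    return tpm, rpm
```

Note that `None` (unconfigured) and `0` are distinct here: a key explicitly set to `0` stays at `0` rather than falling back to the default.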
Performance / Loadbalancing / Reliability improvements​
- Per-model-group deployment affinity — Router can now pin requests to specific deployments within a model group, reducing cold-start latency and improving cache hit rates for stateful workloads - PR #24110
- Auto-recover shared aiohttp session when closed — proxy was crashing with `RuntimeError: Session is closed` after idle periods; session now auto-recovers - PR #23808
- Kill orphaned Prisma engine subprocess on failed disconnect — zombie Prisma engine processes were accumulating on DB reconnect failures, exhausting file descriptors - PR #24149
- Add `IF NOT EXISTS` to index creation in migration — migration was failing on re-runs if indexes already existed - PR #24105
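One common way to implement the kind of deployment affinity described above is deterministic hashing: hash an affinity key (for example a session or user id) so the same caller keeps landing on the same deployment within a model group. This is purely an illustration of the idea, not LiteLLM's router code from PR #24110:

```python
import hashlib


def pick_deployment(deployments: list[str], affinity_key: str) -> str:
    """Deterministically map an affinity key to one deployment.

    Same key -> same deployment, so warm caches and stateful
    resources are reused instead of cold-starting a new one.
    """
    digest = hashlib.sha256(affinity_key.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(deployments)
    return deployments[index]
```

A production router would layer health checks and failover on top, falling back to another deployment when the pinned one is unavailable.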
Security​
- Fix privilege escalation on key management endpoints — non-admin users could call `/key/block`, `/key/unblock`, and `/key/update` with `max_budget` to modify keys they don't own. Now enforces ownership checks - PR #23781
- Fix global secret redaction — secrets were not being redacted from all log paths; now uses root logger filter + key-name-based pattern matching to ensure full coverage - PR #24305
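The ownership check in the first fix reduces to a simple rule: admins may act on any key, everyone else only on keys they own. A minimal sketch, where the `"proxy_admin"` role string and field names are illustrative rather than LiteLLM's actual schema:

```python
def can_modify_key(requester_id: str, requester_role: str,
                   key_owner_id: str) -> bool:
    """Allow key mutations only for admins or the key's owner."""
    if requester_role == "proxy_admin":
        return True
    return requester_id == key_owner_id
```

The vulnerability was the absence of this second branch: the endpoints accepted any authenticated caller, so a non-admin could block, unblock, or re-budget someone else's key.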
Documentation Updates​
- No major documentation-only changes in this release.
New Contributors​
- @Chesars made their first contribution in PR #21441
- @michelligabriele made their first contribution in PR #23471
- @voidborne-d made their first contribution in PR #23808
- @andrzej-pomirski-yohana made their first contribution in PR #23784
- @kelvin-tran made their first contribution in PR #23911
- @themavik made their first contribution in PR #24043
- @emerzon made their first contribution in PR #24044
- @jyeros made their first contribution in PR #24048
- @alilxxey made their first contribution in PR #24050
- @xr843 made their first contribution in PR #24070
- @ephrimstanley (Point72) made their first contribution in PR #24088
- @superpoussin22 made their first contribution in PR #24105
- @devin-petersohn made their first contribution in PR #24140
- @johnib made their first contribution in PR #24143
- @stias made their first contribution in PR #24199
- @milan-berri made their first contribution in PR #24220
Diff Summary​
03/23/2026​
- New Models / Updated Models: 12 new
- LLM API Endpoints: 4
- Management Endpoints / UI: 17
- Logging / Guardrail / Prompt Management Integrations: 9
- MCP Gateway: 2
- Spend Tracking, Budgets and Rate Limiting: 2
- Performance / Loadbalancing / Reliability improvements: 4
- Security: 2