feat(router): add ttft_timeout to detect hung providers on non-streaming calls by TheCodeWrangler · Pull Request #30337 · BerriAI/litellm

TheCodeWrangler · 2026-06-13T01:45:22Z

feat(router): add ttft_timeout and stream_idle_timeout to detect hung and stalling providers

Relevant issues

Pre-Submission checklist

I have added testing in the tests/litellm/ directory (adding at least 1 test is a hard requirement - see details)
I have added a screenshot of my new test passing locally

My PR passes all unit tests on make test-unit
My PR's scope is as isolated as possible, it only solves 1 specific problem

Type

New Feature

Changes

Adds two new Router parameters for detecting providers that accept connections but then fail to deliver:

ttft_timeout: float — fires litellm.Timeout if no first token arrives within N seconds of the connection being accepted (catches hung providers before any content is sent)
stream_idle_timeout: float — fires litellm.Timeout if no chunk arrives within N seconds between consecutive tokens (catches providers that stall mid-stream after delivering some content)

Both parameters are independent; either or both can be set. When set, non-streaming calls (stream=False) are internally promoted to stream=True so the router has visibility into token timing. The caller always receives a standard ModelResponse reconstructed via stream_chunk_builder.

Why this matters

Deployments with large request timeouts (e.g. 120s for long generation tasks) have no way to distinguish a legitimately slow provider from one that hangs or stalls. Without this, a single degraded provider blocks users for the full timeout before cooldown/fallback kicks in. With ttft_timeout=10 and stream_idle_timeout=30, a hung provider is detected within 10s and a mid-stream stall within 30s of the last chunk, regardless of the configured timeout.

How it works

If either timeout is set and the caller uses stream=False, _acompletion internally overrides stream=True
Phase 1 (ttft_timeout): a single hard deadline (not a per-chunk reset) so preamble chunks (role deltas, empty tool-call deltas) do not extend the budget. First-token detection covers both delta.content and delta.tool_calls
Phase 2 (stream_idle_timeout): per-chunk asyncio.wait_for wraps each __anext__ call; if any inter-token gap exceeds the limit, raises litellm.Timeout
The reconstructed response goes through _should_raise_content_policy_error, matching the existing non-streaming path
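
The two phases above can be sketched with plain asyncio. This is a minimal illustration rather than the PR's implementation: `collect_with_timeouts`, `ProviderTimeout` (standing in for litellm.Timeout), `make_chunk`, and `fake_stream` are hypothetical names, and the chunks are simple namespaces instead of real streaming chunks.

```python
import asyncio
from types import SimpleNamespace


class ProviderTimeout(Exception):
    """Hypothetical stand-in for litellm.Timeout."""


def has_first_token(chunk) -> bool:
    # Role-only or empty deltas are preamble; content or tool calls count.
    delta = chunk.choices[0].delta if chunk.choices else None
    return bool(delta and (delta.content or delta.tool_calls))


async def collect_with_timeouts(stream, ttft_timeout=None, stream_idle_timeout=None):
    loop = asyncio.get_running_loop()
    it = stream.__aiter__()
    chunks = []

    # Phase 1: one absolute deadline, computed once. With ttft_timeout unset
    # the wait for the first token is unbounded.
    deadline = None if ttft_timeout is None else loop.time() + ttft_timeout
    while True:
        remaining = None if deadline is None else deadline - loop.time()
        try:
            chunk = await asyncio.wait_for(it.__anext__(), timeout=remaining)
        except asyncio.TimeoutError:
            raise ProviderTimeout(f"no first token within {ttft_timeout}s") from None
        except StopAsyncIteration:
            raise RuntimeError("stream ended before any token arrived") from None
        chunks.append(chunk)
        if has_first_token(chunk):
            break

    # Phase 2: a fresh budget for every gap after the first token.
    while True:
        try:
            chunk = await asyncio.wait_for(it.__anext__(), timeout=stream_idle_timeout)
        except asyncio.TimeoutError:
            raise ProviderTimeout(f"no chunk within {stream_idle_timeout}s") from None
        except StopAsyncIteration:
            return chunks
        chunks.append(chunk)


def make_chunk(content=None, tool_calls=None):
    # Hypothetical chunk shape: chunk.choices[0].delta.{content, tool_calls}.
    delta = SimpleNamespace(content=content, tool_calls=tool_calls)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


async def fake_stream(items):
    # items: (delay_seconds, chunk) pairs simulating provider pacing.
    for delay, chunk in items:
        await asyncio.sleep(delay)
        yield chunk
```

Because the Phase 1 deadline is computed once, a provider that keeps sending empty preamble chunks still times out at ttft_timeout. With only stream_idle_timeout set, a slow first token is tolerated and only later gaps are policed.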

Per-deployment config (preferred for heterogeneous deployments)

Both parameters follow the same resolution chain as stream_timeout: per-request kwarg -> per-deployment litellm_params -> router-level -> default_litellm_params.

from litellm import Router

router = Router(
    model_list=[
        {
            "model_name": "my-model",
            "litellm_params": {
                "model": "anthropic/claude-sonnet-4-6",
                "ttft_timeout": 8.0,
                "stream_idle_timeout": 30.0,
            },
        },
        {
            "model_name": "my-model",
            "litellm_params": {
                "model": "openai/gpt-4o",
                "ttft_timeout": 15.0,
                "stream_idle_timeout": 30.0,
            },
        },
    ],
    timeout=120,
    allowed_fails=1,
    fallbacks=[{"my-model": ["my-model"]}],
)
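
The resolution order described above the config can be sketched as a small function. This is an illustrative stand-in, not the router's actual helper: `resolve_timeout` and its argument names are hypothetical, and only the precedence order comes from the PR description.

```python
from typing import Any, Mapping, Optional


def resolve_timeout(
    name: str,
    request_kwargs: Mapping[str, Any],
    deployment_params: Mapping[str, Any],
    router_value: Optional[float],
    default_params: Mapping[str, Any],
) -> Optional[float]:
    """Return the first source that is not None, in precedence order."""
    sources = (
        request_kwargs.get(name),     # per-request kwarg
        deployment_params.get(name),  # per-deployment litellm_params
        router_value,                 # router-level setting
        default_params.get(name),     # default_litellm_params
    )
    for value in sources:
        # Compare against None so an explicit 0.0 is not skipped as falsy.
        if value is not None:
            return value
    return None
```

Under this sketch, the first deployment in the config above would resolve ttft_timeout to 8.0 and the second to 15.0 unless a request passes its own value.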

Performance considerations

Enabling either timeout changes the non-streaming path from O(1) to O(tokens): instead of one HTTP response body and one JSON parse, the router creates one Python object per streaming chunk and runs stream_chunk_builder to reconstruct. For a 500-token response this is ~500 small short-lived objects vs. 1; at low-to-moderate throughput the difference is negligible, but at high throughput with large responses the GC pressure is measurable.

The tradeoff is worthwhile when ttft_timeout / stream_idle_timeout are much smaller than timeout. If timeout is already short (e.g. 10s), the native timeout fires quickly enough that these parameters add overhead without meaningfully improving UX. A reasonable rule: only set these on deployments where timeout is long enough that a hang or stall would be visibly bad for users.

stream_idle_timeout adds O(tokens) asyncio.wait_for call overhead on top of the buffering cost — one coroutine wrap per chunk. In practice this is ~1-2 µs per chunk and dwarfed by network I/O, but it is worth knowing the ceiling.

Concurrency, cleanup, and failover

Because a stream=False caller is promoted to streaming internally, the drain has to preserve the guarantees that caller already had.

The max_parallel_requests semaphore is now held for the full reconstruction, not just for opening the stream. Previously the async with rpm_semaphore block exited as soon as the CustomStreamWrapper was returned, so the entire drain (the slow part) ran with the slot already released; a stream=False caller could therefore exceed its configured concurrency. Reconstruction now runs inside the semaphore.

The drain runs under try/finally and calls response.aclose() on timeout, on caller cancellation, and on normal completion, so a client disconnect mid-reconstruction releases the upstream connection instead of leaking it.

A ttft_timeout / stream_idle_timeout failure raises litellm.Timeout, which already flows into the existing retry, cooldown, and fallback machinery and is tagged with failed_deployment_id. With enable_weighted_failover=True on a simple-shuffle group, that tag lets the in-request re-pick exclude the hung deployment instead of re-selecting a high-weight bad host and burning the retry budget.
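
The first two guarantees can be illustrated with a short asyncio sketch. The names `drain` and `call_with_slot` are hypothetical, and the stream is assumed to be any async iterator that exposes an `aclose()` coroutine; this is not the router's code.

```python
import asyncio


async def drain(stream):
    """Collect every chunk and always close the stream, even on cancellation."""
    chunks = []
    try:
        async for chunk in stream:
            chunks.append(chunk)
    finally:
        # Runs on normal completion, on errors, and when the caller is cancelled.
        await stream.aclose()
    return chunks


async def call_with_slot(semaphore: asyncio.Semaphore, open_stream):
    # The slot is held for the whole drain, not just for opening the stream,
    # so the number of in-flight reconstructions never exceeds the limit.
    async with semaphore:
        stream = await open_stream()
        return await drain(stream)
```

If `drain` ran after the `async with` block had exited, the slot would be released while the slow part was still running, which is the behavior the PR describes fixing.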

Safe defaults

Both parameters default to off (None); nothing changes for callers who do not set them. They are best set per deployment rather than globally, since a value tuned for a fast chat model will abandon legitimate reasoning calls that have a large time-to-first-token and longer inter-token gaps. Treat stream_idle_timeout as a freeze detector, not a slowness detector, and keep it well above the model's per-token p99 latency.

Files changed

litellm/router.py — ttft_timeout and stream_idle_timeout params on __init__; _get_ttft_timeout and _get_stream_idle_timeout helpers; _collect_stream_with_ttft_timeout (both phases, drained under try/finally with aclose); _acompletion drains and reconstructs inside the concurrency semaphore via _await_response and routes the reconstructed ModelResponse through the shared content-policy and metrics path
litellm/types/utils.py — adds ttft_timeout and stream_idle_timeout to all_litellm_params (alongside stream_timeout) so they are stripped from the request before it reaches the provider and never forwarded as unknown fields
tests/test_litellm/test_router.py — happy path, hung provider, preamble-chunk deadline, empty stream, _acompletion intercept, stalled mid-stream, non-stalling idle timeout, both timeouts active together, semaphore held through reconstruction, stream closed on caller cancellation, a ttft litellm.Timeout tagging failed_deployment_id, and the resolution-chain precedence
tests/test_litellm/test_filter_out_litellm_params.py — asserts both params are filtered out of provider-bound kwargs while genuine provider params are kept

A companion docs PR (BerriAI/litellm-docs#353) adds the two params to the router_settings reference table; the documentation gate reads that file from the litellm-docs repo, so it goes green here once that merges.

greptile-apps · 2026-06-13T01:48:50Z

Greptile Summary

Adds ttft_timeout and stream_idle_timeout to the Router for detecting hung and stalling providers on non-streaming calls. When either is set, _acompletion internally promotes stream=False to stream=True, drains and reconstructs the stream into a ModelResponse via stream_chunk_builder, holding the max_parallel_requests semaphore throughout.

Phase 1 (ttft_timeout): a single absolute deadline computed before the loop; preamble chunks (empty deltas, role-only chunks) do not reset the budget, and first-token detection covers both delta.content and delta.tool_calls.
Phase 2 (stream_idle_timeout): per-chunk asyncio.wait_for wraps each __anext__ call starting only after Phase 1 delivers a content-bearing chunk, so a slow first token with only stream_idle_timeout set does not spuriously time out.
Resolution chain (per-request kwarg → per-deployment litellm_params → router-level → default_litellm_params) uses explicit is None guards throughout; aclose() is called in a finally block on timeout, cancellation, and normal completion; and litellm.Timeout propagates through the existing retry/cooldown machinery with failed_deployment_id stamped by the existing except litellm.Timeout handler in _acompletion.

Confidence Score: 5/5

Safe to merge. All behavior changes are opt-in (both parameters default to None), the existing non-streaming path is unchanged for callers who do not set them, and every previously raised concern has been correctly addressed in this revision.

Every issue flagged in prior review threads — absolute TTFT deadline, tool-call detection, explicit is None resolution chain, Phase 2 error propagation, stream_idle_timeout not firing before first content, semaphore held through reconstruction, and aclose() on all exit paths — is verifiably fixed in the current code. The new test test_router_stream_idle_timeout_does_not_fire_before_first_token directly validates the trickiest interaction (slow first token with only stream_idle_timeout set). No real network calls are introduced in the test suite.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| litellm/router.py | Adds `_collect_stream_with_ttft_timeout`, `_get_ttft_timeout`, `_get_stream_idle_timeout`, and wires them into `_acompletion`. All previously flagged issues (absolute deadline, tool-call detection, `is None` resolution chain, Phase 2 error propagation, stream_idle_timeout before first token, semaphore held through reconstruction) are correctly addressed in this revision. |
| litellm/types/utils.py | Adds `ttft_timeout` and `stream_idle_timeout` to `all_litellm_params` so they are stripped before the request reaches upstream providers. Correct and minimal change. |
| tests/test_litellm/test_router.py | Adds 12 new mock-only tests covering happy path, hung provider, preamble deadline, empty stream, `_acompletion` intercept, stalled mid-stream, slow-first-token (idle only), both timeouts, semaphore contract, caller cancellation, `failed_deployment_id` tagging, and resolution-chain precedence. No real network calls. |
| tests/test_litellm/test_filter_out_litellm_params.py | Adds a test asserting both new params are filtered out while genuine provider params are kept. Correct and targeted. |

Reviews (7): Last reviewed commit: "refactor(router): drop redundant ttft de..."

TheCodeWrangler · 2026-06-13T01:57:06Z

Thanks for the detailed review @greptile-apps. All three issues were addressed in the follow-up commit (c0075e3):

1. Per-chunk timeout reset
Replaced the per-chunk asyncio.wait_for(aiter.__anext__(), ttft_timeout) pattern with a single hard deadline:

loop = asyncio.get_running_loop()
deadline = loop.time() + ttft_timeout
# ...
remaining = deadline - loop.time()
if remaining <= 0:
    raise litellm.Timeout(...)
chunk = await asyncio.wait_for(aiter.__anext__(), timeout=remaining)

Preamble chunks (role deltas, empty tool-call deltas) no longer reset the clock — the full ttft_timeout budget is shared across all chunks until the first real token arrives. A new test test_router_ttft_timeout_not_reset_by_preamble_chunks covers this case.

2. Tool-call responses never marked as first token
The first-token check now covers both content and tool-call deltas:

delta = chunk.choices[0].delta if chunk.choices else None
if delta and (delta.content or delta.tool_calls):
    first_token_received = True

3. Silent mid-stream error swallowing in Phase 2
Removed the bare except Exception: pass. Phase 2 now lets real errors propagate directly to the Router's fallback machinery:

# Phase 2: let real errors propagate
async for chunk in aiter:
    chunks.append(chunk)

Also switched asyncio.get_event_loop() to asyncio.get_running_loop() (correct for async context) and added verbose_router_logger.warning at both timeout paths for observability.

greptile-apps · 2026-06-13T01:58:12Z

The three issues are all cleanly addressed. A quick walkthrough of what I see in c0075e3:

Deadline-based clock — correct. The loop.time() + ttft_timeout deadline is computed once and remaining is decremented on every wait_for call, so preamble chunks do not extend the budget. The new test validates this path.

Tool-call first-token detection — correct. delta.tool_calls is now included in the guard, and the test helper explicitly sets chunk.choices[0].delta.tool_calls = None to avoid MagicMock's implicit truthiness masking the check.

Phase 2 error propagation — correct. The bare except Exception: pass is gone; real errors surface to the Router's fallback machinery.
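
The MagicMock point is easy to reproduce in isolation. The sketch below assumes the chunk shape used in the snippets above (`chunk.choices[0].delta`); `has_first_token` and `make_mock_chunk` are hypothetical helper names, not the PR's test helpers.

```python
from unittest.mock import MagicMock


def has_first_token(chunk) -> bool:
    delta = chunk.choices[0].delta if chunk.choices else None
    return bool(delta and (delta.content or delta.tool_calls))


def make_mock_chunk(content=None, tool_calls=None):
    chunk = MagicMock()
    chunk.choices = [MagicMock()]
    chunk.choices[0].delta.content = content
    # Without this explicit assignment, delta.tool_calls would be an
    # auto-created MagicMock attribute, and MagicMock instances are truthy.
    chunk.choices[0].delta.tool_calls = tool_calls
    return chunk


# A mock that forgets to pin tool_calls looks like it already carries a token.
careless = MagicMock()
careless.choices = [MagicMock()]
careless.choices[0].delta.content = None
```

With `careless`, the guard reports a first token on a chunk that was meant to be preamble, so a preamble-deadline test built on it would pass for the wrong reason.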

One minor point worth noting for a follow-up: _get_ttft_timeout uses or-chaining on numeric values:

return (
    kwargs.get("ttft_timeout", None)
    or data.get("ttft_timeout", None)
    or self.ttft_timeout
    ...
)

A caller passing ttft_timeout=0.0 would have it silently skipped (falsy) and the next source in the chain used instead. In _acompletion the guard is correctly is not None, so this only causes an inconsistency if someone genuinely passes 0.0 (which is pathological). Still, aligning it to use explicit is not None checks would be more robust:

for source in (kwargs.get("ttft_timeout"), data.get("ttft_timeout"), self.ttft_timeout, self.default_litellm_params.get("ttft_timeout")):
    if source is not None:
        return source
return None

This is a minor polish item rather than a blocker. The core logic of the feature is sound — the deadline approach, tool-call handling, and error propagation are all solid.


veria-ai · 2026-06-13T02:05:33Z

PR overview

All previously flagged issues have been addressed. No open security concerns remain on this pull request.

Security review

No open security issues remain on this pull request.

Fixed/addressed: 3 · PR risk: 0/10

CLAassistant · 2026-06-13T02:26:15Z

All committers have signed the CLA.

TheCodeWrangler · 2026-06-13T02:37:10Z

Good catch on both points, addressed in 29e6de0.

On the or-chain: _get_ttft_timeout now iterates sources with explicit is not None checks so ttft_timeout=0.0 is honored rather than silently skipped.

On the content-policy bypass: the reconstructed ModelResponse from _collect_stream_with_ttft_timeout now goes through _should_raise_content_policy_error before being returned, identical to the existing check on the native non-streaming path.

…ing calls Adds ttft_timeout parameter to Router. When set, non-streaming calls internally switch to stream=True so the router can detect a hung provider (one that accepts the connection but never sends tokens) within ttft_timeout seconds, rather than waiting for the full request timeout which can be very long for large generation requests. Raises litellm.Timeout to trigger existing cooldown and fallback machinery. Caller always receives a standard ModelResponse via stream_chunk_builder. Uses a single hard deadline rather than per-chunk wait_for, so preamble chunks (role deltas, empty tool-call deltas) do not reset the clock. Checks both delta.content and delta.tool_calls for first-token detection. Phase 2 lets real errors propagate rather than swallowing them. Uses asyncio.get_running_loop().

_get_ttft_timeout used or-chaining which would skip a caller-supplied ttft_timeout=0.0 as falsy. Replaced with explicit is not None iteration. The ttft_timeout streaming path bypassed the content-policy violation check that runs for native non-streaming responses. The reconstructed ModelResponse now goes through _should_raise_content_policy_error before being returned, matching the existing non-streaming behavior.

TheCodeWrangler · 2026-06-13T11:11:08Z

The README.md concern is pre-existing content our PR does not touch. The only files changed here are litellm/router.py and tests/test_litellm/test_router.py. Happy to file a separate issue for the README TLS guidance if that's useful, but it's out of scope for this PR.

codecov · 2026-06-13T11:13:41Z

Codecov Report

✅ All modified and coverable lines are covered by tests.


Add tests for the empty-stream path (StopAsyncIteration -> APIError) and the _acompletion intercept path that forces stream=True internally when ttft_timeout is set.

…stream Extends the ttft_timeout feature with stream_idle_timeout: a per-chunk inter-token deadline that fires litellm.Timeout when a provider accepts a connection, sends some tokens, then goes silent. Both parameters are independent; either or both can be set at router or per-deployment level.

…spatch Add _acompletion intercept test for stream_idle_timeout-only config (ttft_timeout=None) and assert both params are forwarded correctly to _collect_stream_with_ttft_timeout. Also tighten the existing ttft_timeout intercept test to assert the forwarded values explicitly.

TheCodeWrangler · 2026-06-13T16:03:56Z

@greptileai

…nd close stream on exit

The ttft_timeout / stream_idle_timeout path promotes a stream=False call to streaming and drains it via _collect_stream_with_ttft_timeout. Two correctness issues are addressed:

- max_parallel_requests semaphore: reconstruction previously ran after the 'async with rpm_semaphore' block had already exited, so a stream=False caller could exceed the configured concurrency for the entire drain. Reconstruction now happens inside the semaphore via _await_response, restoring the non-streaming concurrency guarantee.
- Connection cleanup: the drain loop now runs under try/finally and calls response.aclose() on timeout, cancellation, or normal completion, so a caller disconnect mid-reconstruction releases the upstream connection instead of leaking it.

The reconstructed ModelResponse now flows through the shared content-policy check and _track_deployment_metrics, removing the duplicated content-policy block. The two identical ttft raise sites are collapsed into one.

Tests: semaphore-held-through-reconstruction (fails before the fix), stream-closed-on-caller-cancellation, and a ttft Timeout tagging failed_deployment_id so weighted failover / cooldown can exclude the hung deployment on retry.

Adds a regression test pinning the resolution precedence (per-request kwarg > per-deployment litellm_params > router-level > default_litellm_params) for both _get_ttft_timeout and _get_stream_idle_timeout, which the router_code_coverage gate flagged as untested.

Sameerlite · 2026-06-16T03:44:30Z

Thanks for the contribution!

A couple of things to address before this is ready for merge:

It looks like some CI checks are failing — could you take a look and fix them, or let us know if you believe the failures are unrelated to this change?

We're also triggering a Greptile code review in the meantime.

@greptileai

…r API When set in a deployment's litellm_params, ttft_timeout and stream_idle_timeout were assembled into input_kwargs and forwarded to litellm.acompletion. Because neither key was in all_litellm_params, the provider param filtering treated them as model-specific extras and passed them to the upstream API, which 400s on unknown fields; this broke the exact per-deployment configuration the feature documents. Add both keys to all_litellm_params alongside stream_timeout so filter_out_litellm_params strips them before the provider call, while the router still resolves their values from litellm_params. Regression test asserts they are filtered out while genuine provider params (temperature) are kept.
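
The effect of that fix can be illustrated with a simplified filter. This is a sketch only: `ROUTER_ONLY_PARAMS` and `strip_router_only_params` are hypothetical names, and the real filtering in litellm works from the full all_litellm_params list rather than this three-item set.

```python
# Hypothetical, reduced stand-in for the list of keys litellm keeps to itself.
ROUTER_ONLY_PARAMS = frozenset({"stream_timeout", "ttft_timeout", "stream_idle_timeout"})


def strip_router_only_params(kwargs: dict) -> dict:
    """Return a copy of kwargs without keys that only the router understands."""
    return {key: value for key, value in kwargs.items() if key not in ROUTER_ONLY_PARAMS}


request_kwargs = {
    "model": "openai/gpt-4o",
    "temperature": 0.2,
    "ttft_timeout": 8.0,
    "stream_idle_timeout": 30.0,
}
provider_kwargs = strip_router_only_params(request_kwargs)
```

The original dict is left untouched, so the router can still read the timeout values after the provider-bound copy has been cleaned.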

TheCodeWrangler · 2026-06-16T12:28:28Z

@Sameerlite Thanks for the review.

On the CI failures: they all come from the router_settings documentation gate, which reads config_settings.md out of the separate BerriAI/litellm-docs repo rather than this one. The two new router params need reference-table rows there, so I opened a companion PR at BerriAI/litellm-docs#353 that adds them; I validated it locally by running tests/documentation_tests/test_router_settings.py against the edited file. Once #353 merges, both the documentation check and the documentation_test_router_settings step inside code-quality go green on the next run here. The other half of the code-quality failure was the router_code_coverage gate flagging the two new getters as untested, which is already fixed on this branch by the resolution-chain test in 15bd7fb

On Greptile's blocker: it correctly caught that ttft_timeout and stream_idle_timeout were being forwarded to the upstream provider when set in a deployment's litellm_params, which would 400. Fixed in f45f71b by adding both keys to all_litellm_params alongside stream_timeout, with a regression test in test_filter_out_litellm_params.py. The minor or-chaining note it raised was already handled; the getters use explicit is not None checks

Sameerlite · 2026-06-17T03:57:43Z

@greptileai

When stream_idle_timeout was set without ttft_timeout, Phase 1 (wait-for-first-token) was skipped and the idle loop wrapped the very first __anext__ with stream_idle_timeout. That measured time-to-first-token, not the inter-token gap, so any provider whose first token arrived slower than stream_idle_timeout was wrongly killed as 'stalled mid-stream', contradicting the documented 'between consecutive tokens' semantics. Run the first-token phase whenever either timeout is set, but bound it only by ttft_timeout's absolute deadline; when ttft_timeout is None the first-token wait is unbounded (still capped by the outer request timeout) and stream_idle_timeout governs only the gaps after content has started. Regression test: stream_idle_timeout-only with a first token slower than the idle window followed by prompt chunks must succeed; it fails on the pre-fix behavior.

TheCodeWrangler · 2026-06-17T12:34:18Z

Addressed the remaining finding from the last review: when stream_idle_timeout was set without ttft_timeout, the idle clock was wrapping the first token wait and could kill a slow-starting provider as a mid-stream stall. The first-token phase now runs whenever either timeout is set but is bounded only by ttft_timeout's deadline; with ttft_timeout unset the first-token wait is unbounded and stream_idle_timeout only governs gaps after content begins. Added a regression test for the stream_idle_timeout-only path that fails on the prior behavior (commit 22335b1)

@greptileai

… feat/router-ttft-timeout # Conflicts: # litellm/router.py

Use PEP 585/604 forms (list/dict, X | None) for the ttft_timeout/stream_idle_timeout annotations added by this PR so UP006/UP045 totals stay within the strict-rule budget ceiling the base enforces.

The promoted-stream path returns a reconstructed ModelResponse; assert it flows through the same content-policy check as the non-streaming path. A content_filter finish_reason with a content-policy fallback configured must raise ContentPolicyViolationError from _acompletion. Mutation-verified: fails if the _acompletion content-policy raise is removed.

TheCodeWrangler · 2026-06-17T13:01:16Z

Pushed three changes since the last review: merged the latest litellm_internal_staging to clear a conflict, modernized the new annotations to PEP 585/604 so the strict-rule budget gate passes, and added a test asserting the reconstructed response from a promoted stream goes through the same content-policy check as the non-streaming path. Re-requesting a review on the current head

Greptile is happy now @Sameerlite
#30337 (comment)

Sameerlite · 2026-06-18T03:53:31Z

Thanks for the PR! A couple of things to get this over the finish line:

Greptile's review scored 5/5 but there are still some open review threads — could you take a look and resolve them? Once they're all cleared we're good to go.

Once those are addressed we'll take another look — appreciate the contribution!

TheCodeWrangler · 2026-06-18T12:05:45Z

Thanks @Sameerlite. I have gone through the three open Greptile threads and resolved them. Two were already handled by the current revision: the Phase 2 collection no longer swallows mid-stream errors (it is a plain async for now, so disconnects and provider errors propagate to the cooldown/fallback path), and the timeout resolution uses explicit is None checks instead of or-chaining so an explicit 0.0 is honored. The third (timeout not enforced when the caller passes stream=True) is intentional and is what the title scopes this to; for streaming callers we hand back the raw stream to preserve streaming semantics, and they can enforce their own first-token deadline on the iterator they control. Details are in each thread

asyncio.wait_for already raises TimeoutError for a non-positive timeout, so the explicit remaining<=0 check duplicated wait_for's own behavior. Removing it keeps the same observable result (litellm.Timeout via the except clause) and lets wait_for return an already-ready chunk instead of spuriously timing out.
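
The two asyncio.wait_for behaviors that commit relies on can be checked with a small standalone script. The function names are hypothetical, and the "already ready" case is shown with a completed future, which is the unambiguous form of it; how a not-yet-started coroutine behaves with a non-positive timeout may depend on the Python version.

```python
import asyncio


async def wait_for_pending(timeout: float) -> str:
    # asyncio.sleep(1) cannot finish immediately, so a non-positive timeout expires.
    try:
        await asyncio.wait_for(asyncio.sleep(1), timeout=timeout)
    except asyncio.TimeoutError:
        return "timed out"
    return "completed"


async def wait_for_ready(timeout: float) -> str:
    # A future that already holds a result is returned even with timeout <= 0.
    future = asyncio.get_running_loop().create_future()
    future.set_result("chunk")
    return await asyncio.wait_for(future, timeout=timeout)
```

Catching asyncio.TimeoutError keeps the sketch portable, since it is an alias of the built-in TimeoutError only from Python 3.11 onward.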

TheCodeWrangler · 2026-06-18T13:56:54Z

Quick note on the two red documentation checks: they are unrelated to this change. The test_router_settings.py gate does open("docs/my-website/docs/proxy/config_settings.md"), but that file was removed from this repo in the docs migration to litellm-docs (commit c35f3a5). It is now absent from both main and litellm_internal_staging, while the test that reads it is still present, so the check fails on any PR that adds a router setting; nothing in this PR can satisfy it from within the repo. I have the reference-table rows for the two new params staged in BerriAI/litellm-docs#353 for whenever the docs side is reconciled.

The actual blocker you flagged is cleared: all three Greptile threads are resolved (two were already addressed in code, the third is intentional and scoped to non-streaming, with reasoning in the thread). Greptile is at 5/5 and the rest of CI is green

Sameerlite · 2026-06-19T06:34:35Z

Thanks for addressing all the open review threads and for the detailed explanation on the CI failures! Triggering a fresh Greptile review on the latest commit:

@greptileai

Once Greptile confirms 5/5 on the current SHA, we'll take another look!

greptile-apps Bot reviewed Jun 13, 2026

Comment thread litellm/router.py Outdated

veria-ai Bot reviewed Jun 13, 2026

Comment thread litellm/router.py Outdated

TheCodeWrangler force-pushed the feat/router-ttft-timeout branch from c0075e3 to d368fe9 Compare June 13, 2026 02:26

veria-ai Bot reviewed Jun 13, 2026

Comment thread README.md Outdated

TheCodeWrangler added 2 commits June 13, 2026 06:09

TheCodeWrangler force-pushed the feat/router-ttft-timeout branch from 29e6de0 to 999d442 Compare June 13, 2026 11:10

TheCodeWrangler added 3 commits June 13, 2026 07:35

test(router): add coverage for ttft_timeout edge cases

806db56

veria-ai Bot reviewed Jun 13, 2026

Comment thread litellm/router.py

TheCodeWrangler added 2 commits June 15, 2026 08:37

TheCodeWrangler mentioned this pull request Jun 15, 2026

docs(router_settings): document ttft_timeout and stream_idle_timeout BerriAI/litellm-docs#353

Open

TheCodeWrangler mentioned this pull request Jun 16, 2026

feat(router): observability for ttft_timeout / stream_idle_timeout (stacked on #30337) TheCodeWrangler/litellm#1

Draft

greptile-apps Bot reviewed Jun 17, 2026

Comment thread litellm/router.py

TheCodeWrangler added 3 commits June 17, 2026 07:46

Merge remote-tracking branch 'upstream/litellm_internal_staging' into…

1660bb6

style(router): modernize new annotations to satisfy strict-rule budget

8795ac8
