[Core] Reduce RoPE cache size for shorter context length #28136
labAxiaoming wants to merge 1 commit into
Code Review
This pull request introduces a helper function _get_runtime_model_config and updates get_rope to clamp max_position to the runtime context length under specific conditions. The review feedback points out two important issues: first, the precedence of scaling_type resolution should prioritize dual_chunk_attention_config over rope_scaling to prevent incorrect resolution; second, clamping max_position directly can cause subsequent lookups for original_max_position_embeddings to default to the clamped value instead of the original value, which could degrade model accuracy for scaling types like llama3.
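The two review points can be illustrated with a minimal sketch. It assumes HF-style `rope_scaling` dicts (a `rope_type` key, with `type` as the legacy key). The names `resolve_scaling_type`, `clamp_rope_max_position`, `TRUNCATABLE_TYPES` and the `"dual_chunk"` label are hypothetical; this is not the PR's `get_rope()` code. It shows (1) checking `dual_chunk_attention_config` before `rope_scaling`, and (2) resolving the fallback for `original_max_position_embeddings` from the unclamped `max_position`.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical allow-list of RoPE types whose cos/sin cache can be truncated
# and later expanded (taken from the list in this PR description).
TRUNCATABLE_TYPES = {"default", "llama3", "proportional"}


def resolve_scaling_type(
    rope_scaling: Optional[Dict[str, Any]],
    dual_chunk_attention_config: Optional[Dict[str, Any]],
) -> str:
    """Resolve the RoPE variant, checking dual_chunk_attention_config first."""
    if dual_chunk_attention_config is not None:
        return "dual_chunk"  # hypothetical label for dual-chunk attention RoPE
    if rope_scaling is None:
        return "default"
    # Assumption: HF-style configs use "rope_type", with "type" as the legacy key.
    return rope_scaling.get("rope_type", rope_scaling.get("type", "default"))


def clamp_rope_max_position(
    max_position: int,
    rope_scaling: Optional[Dict[str, Any]],
    dual_chunk_attention_config: Optional[Dict[str, Any]],
    context_len: Optional[int],
) -> Tuple[int, int]:
    """Return (cache_length, original_max_position) for building the RoPE cache."""
    scaling_type = resolve_scaling_type(rope_scaling, dual_chunk_attention_config)
    # Resolve the original_max_position_embeddings fallback from the
    # *unclamped* max_position, before any clamping happens.
    original_max_position = max_position
    if rope_scaling is not None:
        original_max_position = rope_scaling.get(
            "original_max_position_embeddings", max_position
        )
    cache_length = max_position
    if (
        context_len is not None
        and context_len < max_position
        and scaling_type in TRUNCATABLE_TYPES
    ):
        cache_length = context_len
    return cache_length, original_max_position


# Usage: a llama3-style config served with a shorter context length.
print(clamp_rope_max_position(131072, {"rope_type": "llama3", "factor": 8.0}, None, 8192))
# (8192, 131072)
```

Because the dual-chunk label is not in the allow-list, checking `dual_chunk_attention_config` first keeps a model that carries both configs from being clamped as if it were plain `llama3`.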
Motivation
Some models define very large max_position_embeddings, but users often serve them with a smaller --context-length. In those cases, get_rope() still builds cos/sin caches for the full model config length, which increases model initialization memory usage without benefiting the configured runtime context.
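A rough way to see why this matters: a RoPE cos/sin cache grows linearly with its length, so clamping it to the runtime context length shrinks it by a factor of context_len / max_position. The sketch below assumes cos and sin are stored side by side in a [max_position, rotary_dim] float32 tensor; `rope_cache_bytes` is a hypothetical helper, and the sizes are illustrative, not measurements from this PR.

```python
# Hypothetical helper, not part of SGLang: rough size of one RoPE cos/sin cache.
# Assumption: cos and sin are stored side by side in a tensor of shape
# [max_position, rotary_dim] (rotary_dim // 2 cos + rotary_dim // 2 sin per row).

MIB = 1024 * 1024


def rope_cache_bytes(max_position: int, rotary_dim: int, bytes_per_elem: int = 4) -> int:
    """Bytes for one cos/sin cache; bytes_per_elem=4 assumes float32 storage."""
    if max_position <= 0 or rotary_dim <= 0 or rotary_dim % 2 != 0:
        raise ValueError("max_position must be positive and rotary_dim a positive even number")
    return max_position * rotary_dim * bytes_per_elem


# Illustrative sizes only (not measurements from this PR):
full = rope_cache_bytes(max_position=1_048_576, rotary_dim=128)
clamped = rope_cache_bytes(max_position=32_768, rotary_dim=128)
print(f"{full / MIB:.2f} MiB -> {clamped / MIB:.2f} MiB")  # 512.00 MiB -> 16.00 MiB
```

If a model builds several distinct caches, the savings add up across them.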
Changes
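- In get_rope(), clamp max_position to model_config.context_len when it is smaller than max_position.
- Apply the clamp only to RoPE types whose cos/sin cache can be safely truncated and later expanded:
  - default
  - llama3
  - proportional
  - rope_scaling is None
Validation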
Ran a remote config smoke test locally:
test_sgalng_remote_rope_configs.py
RoPE cache memory summary
Summary: 3210.75 MiB -> 80.00 MiB, 3130.75 MiB (97.5%) saved. PASS

Also verified that cache expansion via _ensure_cos_sin_cache_length() matches direct full-cache construction for default, llama3, and proportional.
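The property behind this check can be illustrated in plain Python for default RoPE. This is a minimal sketch, not the PR's torch code. It assumes base 10000 and rows laid out as [cos(angles) | sin(angles)]. `rope_inv_freq`, `build_cos_sin_rows` and `ensure_cache_length` are hypothetical names, and `ensure_cache_length` is only an analogue of `_ensure_cos_sin_cache_length()`.

```python
import math
from typing import List

# Plain-Python stand-in for a default RoPE cos/sin cache (no torch).
# Assumptions: base 10000 and rows laid out as [cos(angles) | sin(angles)].


def rope_inv_freq(rotary_dim: int, base: float = 10000.0) -> List[float]:
    """Default RoPE inverse frequencies base^(-2i/d) for i in [0, d/2)."""
    return [1.0 / (base ** (2 * i / rotary_dim)) for i in range(rotary_dim // 2)]


def build_cos_sin_rows(start: int, end: int, inv_freq: List[float]) -> List[List[float]]:
    """Cache rows for positions start..end-1."""
    rows = []
    for pos in range(start, end):
        angles = [pos * f for f in inv_freq]
        rows.append([math.cos(a) for a in angles] + [math.sin(a) for a in angles])
    return rows


def ensure_cache_length(
    cache: List[List[float]], needed: int, inv_freq: List[float]
) -> List[List[float]]:
    """Hypothetical analogue of _ensure_cos_sin_cache_length(): append missing rows, never shrink."""
    if needed > len(cache):
        cache.extend(build_cos_sin_rows(len(cache), needed, inv_freq))
    return cache


inv_freq = rope_inv_freq(rotary_dim=8)
truncated = build_cos_sin_rows(0, 16, inv_freq)          # cache built for a short context
expanded = ensure_cache_length(truncated, 64, inv_freq)  # grown on demand
full = build_cos_sin_rows(0, 64, inv_freq)               # built at full length up front
print(expanded == full)  # True
```

Each row depends only on its own position and on inv_freq, so growing the cache later reproduces exactly the rows a full-length build would have produced. A RoPE variant whose frequencies are computed from the cache length itself would not have this property, which is presumably why the clamp is limited to an explicit list of types.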
Review and Merge Process
/tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
CI States
Latest PR Test (Base): ⏳ Run #27461850431
Latest PR Test (Extra): ⏳ Run #27461850376