In normal cases, Qwen3.5 models (e.g., Qwen3.5-397B) have a native context length of 262,144 tokens. But the Qwen/Qwen3.5-397B-A17B page on Hugging Face also says it can be extended to at most 1,010,000 tokens. How can I do this with vLLM?
To extend Qwen3.5's context length beyond its native 262,144 tokens (e.g., up to 1,010,000 tokens), you need RoPE scaling, specifically the YaRN method, applied by overriding the model's rope parameters via the `--hf-overrides` argument when launching vLLM. Set the `VLLM_ALLOW_LONG_MAX_MODEL_LEN=1` environment variable, and provide the desired `rope_parameters` and `--max-model-len` in your serve command, for example:
```shell
export VLLM_ALLOW_LONG_MAX_MODEL_LEN=1
vllm serve Qwen/Qwen3.5-397B-A17B \
  --hf-overrides '{"text_config": {"rope_parameters": {"mrope_interleaved": true, "mrope_section": [11,11,10], "rope_type": "yarn", "rope_theta": 10000000, "partial_rotary_factor": 0.25, "factor": 4.0, "original_max_position_embeddings": 262144}}}' \
  --max-model-len 1010000
```
This will enable the model to process up to ~1M tokens per request using vLLM. For more details, see the official Qwen3.5 vLLM recipe.
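If it helps, the same override payload can be assembled in Python and printed for the CLI flag. This is just a sketch using the standard library; the key names and values mirror the serve command above, and the final check confirms that the YaRN factor actually covers the requested window:

```python
import json

# Build the --hf-overrides payload programmatically instead of hand-writing
# the JSON string. Values mirror the vllm serve command above.
rope_parameters = {
    "mrope_interleaved": True,
    "mrope_section": [11, 11, 10],
    "rope_type": "yarn",
    "rope_theta": 10000000,
    "partial_rotary_factor": 0.25,
    "factor": 4.0,
    "original_max_position_embeddings": 262144,
}
hf_overrides = {"text_config": {"rope_parameters": rope_parameters}}

# Serialize for the CLI: pass this string to --hf-overrides '...'
print(json.dumps(hf_overrides))

# Sanity check: the YaRN-scaled window (factor * native length) must cover
# the requested --max-model-len of 1,010,000 (262,144 * 4.0 = 1,048,576).
scaled = rope_parameters["factor"] * rope_parameters["original_max_position_embeddings"]
assert scaled >= 1_010_000
```

Printing the JSON and pasting it into the command avoids quoting mistakes in the shell, and the assertion catches a `factor` too small for the target `--max-model-len`.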
Would you like a step-by-step explanation or more details on the parameters?
Sources:
- https://docs.vllm.ai/projects/recipes/en/latest/Qwen/Qwen3.5.html#__span-13-31
- https://github.com/vllm-project/vllm/blob/main/docs/features/context_extension.md
Hint: Mention @RunLLM in the post for followups.
Good! But can you explain the meaning of each key and value in the `--hf-overrides` JSON? @RunLLM