Why is chunked prefill always enabled in V1?

vbnogueira · September 18, 2025, 1:32pm

Hi,

I was looking at the V1 code and noticed that chunked prefill is always enabled, even if the user explicitly unsets it on the command line. I traced this to the following PR [1]. I understand the PR's logic of not raising an error whenever chunked prefill is set; however, I don’t quite follow why chunked prefill should always be enabled in V1.

[1] [V1][Bugfix] Always set enable_chunked_prefill = True for V1, by WoosukKwon (vllm-project/vllm Pull Request #11061)

RunLLM · September 18, 2025, 1:33pm

In vLLM V1, chunked prefill is always enabled by design and cannot be disabled, even if the user tries to unset it via command-line flags. The V1 scheduler is built around chunked prefill as its core batching and scheduling mechanism: a large prefill can be split into chunks and interleaved with decode requests, which keeps batches full (throughput) and prevents one long prompt from stalling ongoing decodes (latency). The PR you referenced (#11061) enforces this by always setting enable_chunked_prefill = True for V1, and later PRs and documentation confirm that disabling chunked prefill in V1 is not supported and raises an error if attempted.

This design choice is documented in the official vLLM optimization guide, which states: “In vLLM V1, chunked prefill is always enabled by default. This is different from vLLM V0, where it was conditionally enabled based on model characteristics” (Optimization and Tuning). The V1 scheduler does not distinguish between prefill and decode stages: each request simply has some number of tokens left to compute, whether they come from the prompt or from generation, and chunked prefill is treated as a first-class, non-optional feature (discussion). Attempts to disable it in V1 result in an explicit error (PR #21645).
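To make the “no separate prefill and decode stages” point concrete, here is a minimal Python sketch of one scheduling step under a per-step token budget. It is a simplification under stated assumptions, not vLLM’s scheduler: the Request class, schedule_step, and apply_step are hypothetical names, the field names are illustrative, and KV-cache allocation, preemption, and priorities are ignored. Each request only tracks how many tokens it has and how many are already computed; a decoding request asks for one token, and an unfinished prompt gets whatever budget is left, which is how a long prefill ends up split into chunks and interleaved with decodes.

```python
from dataclasses import dataclass


@dataclass
class Request:
    # Hypothetical, simplified request state; not vLLM's actual class.
    req_id: str
    num_tokens: int               # prompt tokens + tokens generated so far
    num_computed_tokens: int = 0  # tokens already processed by the model


def schedule_step(requests: list[Request], token_budget: int) -> dict[str, int]:
    """Assign tokens to requests under one per-step token budget.

    There is no separate prefill or decode phase: every request asks for
    num_tokens - num_computed_tokens tokens. A decoding request asks for 1;
    a request with an unfinished prompt asks for the rest of it and gets
    only what still fits in the budget, i.e. a chunk.
    """
    scheduled: dict[str, int] = {}
    for req in requests:  # here, running (decoding) requests are listed first
        if token_budget == 0:
            break
        n = min(req.num_tokens - req.num_computed_tokens, token_budget)
        if n > 0:
            scheduled[req.req_id] = n
            token_budget -= n
    return scheduled


def apply_step(requests: list[Request], scheduled: dict[str, int]) -> None:
    """Advance request state as if the model had run on the scheduled tokens."""
    for req in requests:
        n = scheduled.get(req.req_id, 0)
        req.num_computed_tokens += n
        # Only a request whose tokens are all computed samples a new token;
        # an intermediate prefill chunk produces no output token.
        if n > 0 and req.num_computed_tokens == req.num_tokens:
            req.num_tokens += 1


reqs = [
    Request("decode-a", num_tokens=101, num_computed_tokens=100),
    Request("decode-b", num_tokens=51, num_computed_tokens=50),
    Request("long-prompt", num_tokens=5000),  # new 5000-token prompt
]
for step in range(3):
    scheduled = schedule_step(reqs, token_budget=2048)
    print(step, scheduled)
    apply_step(reqs, scheduled)
# 0 {'decode-a': 1, 'decode-b': 1, 'long-prompt': 2046}
# 1 {'decode-a': 1, 'decode-b': 1, 'long-prompt': 2046}
# 2 {'decode-a': 1, 'decode-b': 1, 'long-prompt': 908}
```

With a 2048-token budget, the 5000-token prompt is prefilled over three steps while both decode requests keep producing one token per step; on the next step the former prompt asks for a single token, exactly like the other decodes.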

Would you like more detail on the technical reasons or scheduler design?


vbnogueira · September 18, 2025, 1:49pm

I see, so the V1 scheduler assumes chunked prefill will always yield better results for the relevant setups and makes it mandatory for performance reasons.
It also seems that bypassing this would violate scheduler assertions and possibly lead to bugs.
Got it, thanks.

RunLLM · September 18, 2025, 1:49pm

Correct. V1’s scheduler is fundamentally designed around chunked prefill, and disabling it would break core scheduling logic, likely causing assertion failures or undefined behavior (PR #11061, Optimization and Tuning). The architecture assumes chunked prefill for both correctness and optimal performance, so bypassing it is unsupported and unsafe.
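A rough way to see why switching chunking off clashes with a budget-driven scheduler: without chunking, a prompt is either scheduled whole or not at all, so any prompt longer than the per-step token budget can never run. The helper below is hypothetical (not a vLLM function) and only counts steps for a single prompt in both modes; it does not reproduce vLLM’s actual assertions.

```python
import math
from typing import Optional


def steps_to_prefill(prompt_len: int, token_budget: int, chunked: bool) -> Optional[int]:
    """Steps needed to prefill one prompt alone under a per-step token budget.

    Hypothetical illustration. Returns None when the prompt can never be
    scheduled.
    """
    if token_budget <= 0:
        raise ValueError("token_budget must be positive")
    if chunked:
        # Each step takes min(remaining, budget) tokens, so the prompt is
        # finished after ceil(prompt_len / budget) steps.
        return math.ceil(prompt_len / token_budget)
    # All-or-nothing: the whole prompt must fit into a single step's budget.
    return 1 if prompt_len <= token_budget else None


print(steps_to_prefill(5000, 2048, chunked=True))   # 3
print(steps_to_prefill(5000, 2048, chunked=False))  # None
print(steps_to_prefill(5000, 8192, chunked=False))  # 1
```

So without chunking, the token budget has to be at least as large as the longest prompt the engine accepts, whereas with chunking any budget works and only the number of steps changes.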

Let me know if you want a deeper dive into the technical details or scheduler internals.

Sources:

PR #11061: Always set enable_chunked_prefill = True for V1

Related topics:
Should vLLM consider prefix caching when chunked prefill is enabled? (General) · 1 reply · 283 views · June 24, 2025
In vLLM, with chunked prefill enabled, where is the chunk size set, and is it related to the long_prefill_token_threshold parameter? (General) · 1 reply · 90 views · November 23, 2025
V1 Chunked Prefill Scheduling Policy: how prefill would be scheduled? (Scheduling) · 8 replies · 481 views · March 25, 2025
Vllm: error: unrecognized arguments: --no-enable-chunked-prefill (General) · 1 reply · 499 views · July 26, 2025
If a long-input request is split into chunks, say 4, can the four chunks be prefilled at the same time, or do they depend on each other? (General) · 15 replies · 55 views · January 21, 2026
