With PP=8 (8-way pipeline parallelism), does update_from_output wait until model_executor.execute_model has finished on all ranks before it runs?

RunLLM · January 8, 2026, 9:19am

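The logic of this code is as follows. If req_id is not in the persistent batch (that is, req_index is None), the request was either preempted and then resumed, or was not scheduled in the previous step. Either way, it has to be added back to the persistent batch in this step, so it is appended to req_ids_to_add and inserted into the batch later. The official API documentation and the source-code comments both describe this.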
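A minimal sketch of the membership check described above, assuming the persistent batch can be modeled as a plain dict from request ID to row index. The names collect_reqs_to_add and req_id_to_index are illustrative stand-ins, not a copy of vLLM's model runner code.

```python
def collect_reqs_to_add(
    scheduled_req_ids: list[str],
    req_id_to_index: dict[str, int],
) -> list[str]:
    """Return scheduled requests missing from the persistent batch (toy model).

    req_id_to_index is a hypothetical stand-in for the persistent batch's
    request-ID -> row-index mapping.
    """
    req_ids_to_add: list[str] = []
    for req_id in scheduled_req_ids:
        req_index = req_id_to_index.get(req_id)
        if req_index is None:
            # Not in the persistent batch: the request was either preempted
            # and resumed, or not scheduled last step, so it must be re-added.
            req_ids_to_add.append(req_id)
        # Otherwise the request keeps its existing row and is updated in place.
    return req_ids_to_add


# Usage: "a" stayed in the batch; "b" was unscheduled last step and
# "c" was preempted and has now been resumed.
batch_index = {"a": 0}
print(collect_reqs_to_add(["a", "b", "c"], batch_index))  # ['b', 'c']
```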

Would you like a detailed explanation of how the persistent batch is managed and how request states transition?

Sources:

https://docs.vllm.ai/en/latest/api/vllm/v1/worker/tpu_model_runner/#__code_106

Hint: Mention @RunLLM in the post for followups.

yangshanjun · January 8, 2026, 9:21am

@RunLLM With many concurrent requests, PP (pipeline parallelism), and chunked prefill enabled together, will there be any problems?

RunLLM · January 8, 2026, 9:22am

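High concurrency, pipeline parallelism (PP), and chunked prefill can be used together. However, under high concurrency with long contexts, PP + chunked prefill can produce "pipeline bubbles" and unbalanced execution load: some GPUs sit idle while others are busy, which hurts overall throughput and latency. An analysis in an official issue points out that with a fixed chunk size, later chunks run more slowly as the KV history grows, which makes bubbles likely. To mitigate this, vLLM is working on dynamic chunk scheduling (DCPP).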
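To see why fixed-size chunks drift out of balance, here is a toy cost model. It assumes (for illustration only) that a chunk's cost is the number of query-key pairs causal attention computes: every new token attends to all cached tokens plus the earlier tokens of its own chunk. The helpers fixed_chunk_costs and dynamic_chunk_sizes are hypothetical; the second one simply shrinks later chunks so each stays under a cost budget, and it is not vLLM's DCPP implementation.

```python
def chunk_cost(chunk_len: int, history_len: int) -> int:
    """Query-key pairs computed by one causal prefill chunk (toy model)."""
    # Each new token sees all cached tokens, plus itself and earlier chunk tokens.
    return chunk_len * history_len + chunk_len * (chunk_len + 1) // 2


def fixed_chunk_costs(prompt_len: int, chunk_size: int) -> list[int]:
    """Cost of each chunk when the prompt is split into fixed-size chunks."""
    costs = []
    history = 0
    while history < prompt_len:
        chunk_len = min(chunk_size, prompt_len - history)
        costs.append(chunk_cost(chunk_len, history))
        history += chunk_len
    return costs


def dynamic_chunk_sizes(prompt_len: int, budget: int) -> list[int]:
    """Greedily pick the largest chunk whose cost fits the budget (min 1 token)."""
    sizes = []
    history = 0
    while history < prompt_len:
        chunk_len = 1
        # Grow the chunk while the next size still fits and tokens remain.
        while (history + chunk_len < prompt_len
               and chunk_cost(chunk_len + 1, history) <= budget):
            chunk_len += 1
        sizes.append(chunk_len)
        history += chunk_len
    return sizes


print(fixed_chunk_costs(4096, 1024))  # [524800, 1573376, 2621952, 3670528]
print(dynamic_chunk_sizes(4096, 524800))  # later chunks shrink as history grows
```

Under this model the fourth 1024-token chunk costs about 7x the first, so a pipeline stage that receives it runs long while its neighbors wait. The model ignores MLP and other per-token linear costs, which are the same for every chunk, so real slowdowns are smaller than this attention-only ratio.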

Would you like more detail on how to optimize scheduling for PP + chunked prefill, or on specific load-balancing strategies?

Sources:


yangshanjun · January 8, 2026, 11:37am

@RunLLM

    # NOTE(woosuk): The persistent batch optimization assumes that
    # consecutive batches contain mostly the same requests. If batches
    # have low request overlap (e.g., alternating between two distinct
    # sets of requests), this optimization becomes very inefficient.
    for req_id in unscheduled_req_ids:
        self.input_batch.remove_request(req_id)

What does this comment mean?

RunLLM · January 8, 2026, 11:38am

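This comment means that the persistent batch optimization assumes consecutive batches share most of their requests. If the set of requests changes a lot from one step to the next (for example, two groups of requests taking turns), the persistent batch has to remove and re-add requests constantly, so the optimization loses its benefit and can even become inefficient. The original wording appears in the official API documentation.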
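One rough way to quantify "overlap" is to count how many removals and additions the persistent batch needs between consecutive steps. The helper batch_churn below is hypothetical: it only counts set differences and ignores the actual per-request cost of remove_request and re-adding a request in vLLM.

```python
def batch_churn(steps: list[set[str]]) -> list[int]:
    """For each step after the first, count removals plus additions.

    Removals are requests present in the previous batch but not the current
    one (the unscheduled_req_ids case); additions are the reverse.
    """
    churn = []
    for prev, cur in zip(steps, steps[1:]):
        removed = prev - cur
        added = cur - prev
        churn.append(len(removed) + len(added))
    return churn


# High overlap: one request finishes and one new request joins each step.
steady = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "b", "e"}]
# Low overlap: two disjoint groups alternate, the case the NOTE warns about.
alternating = [{"a", "b", "c"}, {"x", "y", "z"}, {"a", "b", "c"}]

print(batch_churn(steady))       # [2, 2]
print(batch_churn(alternating))  # [6, 6]
```

In the alternating case every row is torn down and rebuilt each step, so keeping state "persistent" saves nothing compared with rebuilding the batch from scratch.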

Would you like a detailed explanation of how the persistent batch works and how low request overlap affects performance?

Sources:

https://docs.vllm.ai/en/latest/api/vllm/v1/worker/gpu_model_runner/#vllm.v1.worker.gpu_model_runner.GPUModelRunner._update_states

