Does beam search support kv cache sharing between beams?

RunLLM · April 19, 2026, 1:34pm

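In vLLM, the best_of parameter sets how many candidate sequences are generated per prompt; the n candidates with the highest probability are then returned. With use_beam_search=True and best_of=4, beam search generates 4 candidate sequences and returns the top n of them by score (all 4 when n=4) (source).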

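Note that in beam search mode best_of must be greater than or equal to n; otherwise an error is raised (source). Would you like a more detailed explanation of how the parameters interact, or a code example?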
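To make the relationship between best_of and n concrete, here is a minimal, self-contained sketch in plain Python. It is not vLLM code: `toy_next_logprobs` and `toy_beam_search` are hypothetical names, and the "model" is a fixed three-token distribution assumed only for illustration. It keeps best_of beams at each step, returns the top n, and rejects best_of < n.

```python
import math
from typing import Dict, List, Tuple

Beam = Tuple[List[str], float]  # (tokens, cumulative log-probability)


def toy_next_logprobs(tokens: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a model: the same distribution at every step."""
    probs = {"a": 0.5, "b": 0.3, "c": 0.2}
    return {tok: math.log(p) for tok, p in probs.items()}


def toy_beam_search(n: int, best_of: int, max_len: int = 2) -> List[Beam]:
    """Keep `best_of` beams per step, then return the `n` highest-scoring."""
    if best_of < n:
        # Mirrors the constraint described above: best_of must be >= n.
        raise ValueError(f"best_of ({best_of}) must be >= n ({n})")

    beams: List[Beam] = [([], 0.0)]
    for _ in range(max_len):
        # Expand every surviving beam by every possible next token.
        candidates = [
            (tokens + [tok], score + lp)
            for tokens, score in beams
            for tok, lp in toy_next_logprobs(tokens).items()
        ]
        # Keep only the best_of candidates with the highest cumulative score.
        candidates.sort(key=lambda beam: beam[1], reverse=True)
        beams = candidates[:best_of]

    # beams is already sorted, so the first n are the top-scoring results.
    return beams[:n]


if __name__ == "__main__":
    for tokens, score in toy_beam_search(n=2, best_of=4):
        print(tokens, round(score, 3))
```

In this sketch best_of acts as the beam width and n only limits how many finished beams are handed back, which is why a wider beam than n is allowed but a narrower one is not.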

Sources:
