In “vllm/v1/core/scheduler.py”, the Scheduler.update_from_output() method has a comment that says: “# NOTE: once we support N tokens per step (spec decode), the outer lists can be of length > 1.”. Does this comment mean that vLLM v1 does not support speculative decoding yet?
v1 supports ngram speculative decoding. You can try it as shown in this example: vllm/tests/v1/e2e/test_ngram_spec_decode.py at main · vllm-project/vllm · GitHub
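For reference, here is a minimal sketch of enabling ngram speculative decoding offline. The speculative_config keys below are my assumption based on recent versions; the linked test shows the exact arguments for this commit, and the model name is just a placeholder:

```python
from vllm import LLM, SamplingParams

# Minimal sketch: enable ngram speculative decoding in vLLM v1.
# The speculative_config keys are assumptions; check the linked
# test_ngram_spec_decode.py for the exact arguments in your version.
llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model
    speculative_config={
        "method": "ngram",            # draft tokens via n-gram lookup in the prompt
        "num_speculative_tokens": 3,  # propose up to 3 draft tokens per step
        "prompt_lookup_max": 5,       # max n-gram size to match in the prompt
    },
)

outputs = llm.generate(
    ["The capital of France is"],
    SamplingParams(temperature=0.0, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```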
Thanks for your reply. I also think vLLM v1 can support spec decode, so does this comment mean that the logprobs feature is not available in vLLM v1 with spec decode right now, but may be implemented in the future?
Right, currently spec decode does not support logprobs, as shown here: vllm/vllm/v1/spec_decode/utils.py at d20e261199d3e4bffe9ee0e495ba7f7454229f11 · vllm-project/vllm · GitHub.
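To make the limitation concrete, this is how logprobs would normally be requested via SamplingParams; per the file linked above, combining this with spec decode in v1 is currently unsupported (model name is a placeholder):

```python
from vllm import LLM, SamplingParams

# Requesting per-token logprobs without spec decode works as usual.
# With a speculative_config enabled in v1, this combination is what
# the linked spec_decode/utils.py currently rules out.
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")  # placeholder model

params = SamplingParams(
    max_tokens=16,
    logprobs=5,  # return top-5 logprobs for each generated token
)
out = llm.generate(["Hello, my name is"], params)[0]
# Each generated token carries its logprob info when supported.
print(out.outputs[0].logprobs)
```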
Thanks very much for your explanation.