In “vllm/v1/core/scheduler.py”, the Scheduler.update_from_output() method has a comment that says: “# NOTE: once we support N tokens per step (spec decode), the outer lists can be of length > 1.”. Does this comment mean that vLLM v1 does not support speculative decoding yet?
v1 supports ngram speculative decoding. You can try it as shown in this example: vllm/tests/v1/e2e/test_ngram_spec_decode.py at main · vllm-project/vllm · GitHub
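For reference, here is a minimal sketch of enabling ngram speculative decoding offline. The speculative_config keys below are my assumption based on recent versions; the linked test shows the exact arguments for this commit, and the model name is just a placeholder:

```python
from vllm import LLM, SamplingParams

# Minimal sketch: enable ngram speculative decoding in vLLM v1.
# The speculative_config keys are assumptions; check the linked
# test_ngram_spec_decode.py for the exact arguments in your version.
llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder model
    speculative_config={
        "method": "ngram",            # draft tokens via n-gram lookup in the prompt
        "num_speculative_tokens": 3,  # propose up to 3 draft tokens per step
        "prompt_lookup_max": 5,       # max n-gram size to match in the prompt
    },
)

outputs = llm.generate(
    ["The capital of France is"],
    SamplingParams(temperature=0.0, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```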
Thanks for your reply. I also think vLLM v1 can support spec decode, so does this comment mean that the logprobs feature is not available in vLLM v1 with spec decode right now, but may be implemented in the future?
Right, currently spec decode does not support logprobs, as shown here: vllm/vllm/v1/spec_decode/utils.py at d20e261199d3e4bffe9ee0e495ba7f7454229f11 · vllm-project/vllm · GitHub.
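To make the limitation concrete, this is how logprobs would normally be requested via SamplingParams; per the file linked above, combining this with spec decode in v1 is currently unsupported (model name is a placeholder):

```python
from vllm import LLM, SamplingParams

# Requesting per-token logprobs without spec decode works as usual.
# With a speculative_config enabled in v1, this combination is what
# the linked spec_decode/utils.py currently rules out.
llm = LLM(model="meta-llama/Meta-Llama-3-8B-Instruct")  # placeholder model

params = SamplingParams(
    max_tokens=16,
    logprobs=5,  # return top-5 logprobs for each generated token
)
out = llm.generate(["Hello, my name is"], params)[0]
# Each generated token carries its logprob info when supported.
print(out.outputs[0].logprobs)
```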
Thanks very much for your explanation.