Regarding the paper “Optimizing Speculative Decoding for Serving Large Language Models Using Goodput”: I’m curious when it will be implemented in vLLM.
The SmartSpec method from the paper “Optimizing Speculative Decoding for Serving Large Language Models Using Goodput” is planned for integration into vLLM. According to a GitHub issue, the vLLM team has already implemented SmartSpec on a fork that uses batch expansion, and plans to integrate it into the main repository soon. The first step is to remove batch expansion; beyond that, community contributions are needed to improve speculative decoding performance.
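For context, the paper’s core idea is to choose the speculation length that maximizes goodput (expected output tokens per second under the current load) rather than using a fixed length. Below is a minimal sketch of that selection loop; the helper names, the geometric acceptance model, and the latency constants are illustrative assumptions, not SmartSpec or vLLM code.

```python
# Toy sketch of goodput-based speculation-length selection.
# All helpers and constants here are hypothetical stand-ins.

def estimate_accepted_tokens(k: int, acceptance_rate: float) -> float:
    """Expected tokens produced per request for proposal length k.

    With a per-token acceptance rate `a`, rejection sampling accepts a
    geometric prefix of the k draft tokens plus one bonus token:
    E[len] = (1 - a**(k + 1)) / (1 - a).
    """
    a = acceptance_rate
    if a >= 1.0:
        return float(k + 1)  # every draft token accepted
    return (1 - a ** (k + 1)) / (1 - a)

def estimate_batch_latency(k: int, batch_size: int) -> float:
    """Toy latency model: k small draft steps, then the target model
    verifies k + 1 tokens per request in one forward pass."""
    draft_time = 0.002 * k
    verify_time = 0.010 + 0.0001 * batch_size * (k + 1)
    return draft_time + verify_time

def best_speculation_length(batch_size: int, acceptance_rate: float,
                            max_k: int = 8) -> int:
    """Pick k maximizing goodput = accepted tokens / batch latency."""
    def goodput(k: int) -> float:
        tokens = batch_size * estimate_accepted_tokens(k, acceptance_rate)
        return tokens / estimate_batch_latency(k, batch_size)
    return max(range(max_k + 1), key=goodput)

# Example: under this toy model, the chosen k shrinks as batches grow,
# which is the qualitative behavior the paper targets.
print(best_speculation_length(batch_size=32, acceptance_rate=0.7))   # longer speculation
print(best_speculation_length(batch_size=256, acceptance_rate=0.7))  # shorter speculation
```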
Would you like more detail on the steps involved in this integration?
Integrating SmartSpec into vLLM involves several steps. The vLLM team first implemented SmartSpec on a fork that relies on batch expansion, in which each draft token is scored as a separate batch entry. The first step toward merging it into the main repository is to remove batch expansion, which streamlines verification and improves performance. Beyond that, community contributions are needed to improve speculative decoding performance more broadly and to implement tree-style speculative decoding.
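To make the “remove batch expansion” step concrete, here is a toy illustration of what batch expansion does during scoring: each draft-token prefix becomes its own row for the target model, multiplying a batch of B requests into B × (k + 1) rows. This is a conceptual sketch, not vLLM’s scorer code; removing expansion means verifying all k + 1 positions of a request in a single multi-token forward pass instead.

```python
# Conceptual illustration of batch expansion (not vLLM code):
# a request with context c and draft tokens d1..dk is expanded into
# k + 1 scoring rows, one per proposed position.

from typing import List

def expand_batch(contexts: List[List[int]],
                 drafts: List[List[int]]) -> List[List[int]]:
    """Batch-expansion scoring: one row per draft prefix.

    Emits [c], [c, d1], [c, d1, d2], ..., [c, d1..dk] so each row's
    next-token logits score one proposed position.
    """
    rows = []
    for context, draft in zip(contexts, drafts):
        for i in range(len(draft) + 1):
            rows.append(context + draft[:i])
    return rows

# Example: 1 request with 3 draft tokens becomes 4 rows instead of 1,
# which is the overhead the integration's first step eliminates.
rows = expand_batch([[101, 102]], [[7, 8, 9]])
assert len(rows) == 4
```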
Would you like more detailed information on any specific aspect of this integration process?