- sampler output, one per model forward pass, along with an indicator of
- whether the torch tensors in the sampler output need to be transposed
- later in the sampler_output_to_torch logic.
- For the multi-step worker, this indicator shall be True.
- """
- self._raise_if_unsupported(execute_model_req)
- # Expand the batch for sequences with a bonus token.
- # Perform a forward pass on the expanded batch and filter the
- # response to retain only the original sequences' responses.
- expanded_request, indices_of_seq_with_bonus_tokens =\
- self._expand_execute_model_request(
- execute_model_req, seq_ids_with_bonus_token_in_last_step)
- # Run model sample_len times.
- model_outputs: List[SamplerOutput] = []
- if current_platform.is_cuda_alike() and isinstance(
- self.model_runner, TP1DraftModelRunner
- ) and self.model_runner.supports_gpu_multi_step(expanded_request):
- # Here we run the draft_model_runner with multi-step prepare
- # on the GPU directly