The current vLLM has an autoregressive sampler, right?
How can I add a new sampler that works with the generate method?
Yes, vLLM currently uses an autoregressive sampler as part of its generation pipeline. To add a new sampler that integrates with the generate method, implement a new Sampler class (subclassing torch.nn.Module) that follows the interface and processing steps described in the vLLM documentation: it takes the logits and the sampling metadata as input and outputs the next token for each sequence. See the Sampler API reference and design docs for the required methods and integration points.
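To make the "logits plus sampling metadata in, next tokens out" contract concrete, here is a minimal, framework-free sketch in plain Python. It is not vLLM's Sampler API: `ToySamplingParams` and `TopKSampler` are hypothetical names invented for this illustration, and a real vLLM sampler would be a torch.nn.Module operating on tensors rather than Python lists.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ToySamplingParams:
    """Hypothetical per-sequence settings (not a vLLM class)."""
    temperature: float = 1.0
    top_k: int = 0  # 0 means "no top-k filtering"


class TopKSampler:
    """Illustrative sampler: one row of logits per sequence -> one token id."""

    def __init__(self, seed=None):
        self.rng = random.Random(seed)

    def sample_one(self, logits, params):
        # Temperature 0 is treated as greedy decoding (argmax).
        if params.temperature == 0.0:
            return max(range(len(logits)), key=logits.__getitem__)
        # Rank token ids by logit, highest first, and keep the top k if requested.
        candidates = sorted(range(len(logits)), key=logits.__getitem__, reverse=True)
        if params.top_k > 0:
            candidates = candidates[:params.top_k]
        # Softmax weights over the temperature-scaled logits
        # (shifted by the max for numerical stability).
        scaled = [logits[i] / params.temperature for i in candidates]
        peak = max(scaled)
        weights = [math.exp(s - peak) for s in scaled]
        return self.rng.choices(candidates, weights=weights, k=1)[0]

    def __call__(self, batch_logits, batch_params):
        # One next-token id per sequence in the batch.
        return [self.sample_one(row, p) for row, p in zip(batch_logits, batch_params)]


sampler = TopKSampler(seed=0)
batch_logits = [[0.1, 2.0, -1.0, 0.5], [3.0, 0.0, 0.0, 0.0]]
batch_params = [
    ToySamplingParams(temperature=0.0),           # greedy
    ToySamplingParams(temperature=0.8, top_k=2),  # sample among the 2 best
]
next_tokens = sampler(batch_logits, batch_params)
```

The first sequence is decoded greedily, so its token is the argmax of its logits. The second is drawn at random from its two highest-scoring tokens.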
You will need to update the model’s sampling logic to use your new sampler, typically by modifying the relevant model runner or engine code to instantiate and call your custom sampler during generation. Would you like a step-by-step guide or code example for this process?
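As a rough picture of what "instantiate and call your custom sampler during generation" means, the sketch below shows a toy autoregressive loop with the sampler passed in as a parameter. Everything here is an assumption made for illustration: `generate`, `toy_model`, and `greedy_sampler` are hypothetical stand-ins, not vLLM's model runner or engine code, which is considerably more involved.

```python
from typing import Callable, List, Sequence

# Hypothetical types: a "model" maps the tokens so far to next-token logits,
# and a "sampler" maps those logits to a single token id.
Model = Callable[[Sequence[int]], List[float]]
Sampler = Callable[[List[float]], int]


def greedy_sampler(logits: List[float]) -> int:
    """Pick the highest-scoring token id."""
    return max(range(len(logits)), key=logits.__getitem__)


def generate(model: Model, sampler: Sampler, prompt: Sequence[int],
             max_new_tokens: int, eos_id: int) -> List[int]:
    """Toy autoregressive loop: the sampler is called once per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = model(tokens)     # forward pass on the sequence so far
        next_id = sampler(logits)  # the pluggable sampling step
        tokens.append(next_id)
        if next_id == eos_id:      # stop once end-of-sequence is produced
            break
    return tokens


def toy_model(tokens: Sequence[int]) -> List[float]:
    """Toy 4-token vocabulary: always favours (last token + 1) mod 4."""
    vocab_size = 4
    target = (tokens[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


output = generate(toy_model, greedy_sampler, prompt=[0], max_new_tokens=10, eos_id=3)
```

Because the loop only depends on the sampler's call signature, swapping in a different sampling strategy means passing a different callable; the equivalent change in vLLM is replacing the sampler that the model runner constructs and invokes.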