Question: Specifying Medusa Choice Tree in vllm

Description
I’m using vllm to load a model with Medusa heads. My current implementation uses the following setup:

from vllm import LLM, SamplingParams

MODEL_NAME = "JackFram/llama-68m"
SPEC_MODEL = "abhigoyal/vllm-medusa-llama-68m-random"

llm = LLM(
    model=MODEL_NAME,
    max_model_len=1024,
    speculative_config={
        "method" : "medusa",
        "model": SPEC_MODEL,
        "num_speculative_tokens": 3,
    },
    tensor_parallel_size=1,
    seed=0,
)
sampling_params = SamplingParams(temperature=0.0, max_tokens=64)
outputs = llm.generate(
    prompts=["Hi! How are you doing?", "Hi! How are you doing?"],
    sampling_params=sampling_params,
    use_tqdm=True,
)

Question
I want to know how to specify the Medusa choice tree for the model. Could you provide guidance or examples on how to do this?

**Environment:**

  • Python version: [3.11]
  • vllm version: [0.9.1]
  • OS: [ubuntu 22.04]

It seems likely that vLLM’s Medusa implementation currently supports only top-1 proposals and does not expose a public API option to specify or customize the Medusa choice tree (tree-style speculation). The Medusa draft model class and related documentation state that only top-1 token proposals are supported, and there is no user-facing parameter for a choice tree or tree mask. This is also reflected in the MedusaWorker API, which does not document any tree configuration.
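For context, the upstream Medusa reference implementation represents a choice tree as a list of index tuples, where each tuple is one candidate path through the heads (position *i* of a tuple picks a top-*k* index from head *i*, and every prefix of a path must also appear in the tree). The sketch below is purely illustrative: the tree values are hypothetical, not a tuned preset, and vLLM does not currently accept such a structure.

```python
# Illustrative sketch of the Medusa choice-tree format used by the upstream
# Medusa reference implementation. The specific tree below is a made-up
# example, not a tuned configuration, and vLLM does not accept this input.
medusa_choices = [
    (0,),     # top-1 candidate from head 1
    (0, 0),   # head 1 top-1 followed by head 2 top-1
    (0, 1),   # head 1 top-1 followed by head 2 top-2
    (1,),     # top-2 candidate from head 1
    (1, 0),   # head 1 top-2 followed by head 2 top-1
]

def is_valid_tree(choices):
    """A tree is well-formed if every multi-step path's prefix is also present."""
    paths = set(choices)
    return all(len(c) == 1 or c[:-1] in paths for c in choices)

print(is_valid_tree(medusa_choices))  # True
```

Until vLLM exposes something like this, `num_speculative_tokens` (as in the snippet above) is the only knob: it controls the length of the single top-1 chain rather than the shape of a tree.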

Would you like more detail on Medusa’s current capabilities or on how tree-style speculation might be supported in the future?
