Description
I'm using vLLM to load a model with Medusa heads. My current implementation uses the following setup:
```python
from vllm import LLM

MODEL_NAME = "JackFram/llama-68m"
SPEC_MODEL = "abhigoyal/vllm-medusa-llama-68m-random"

# Target model plus a Medusa draft model that provides the speculative heads.
llm = LLM(
    model=MODEL_NAME,
    max_model_len=1024,
    speculative_config={
        "method": "medusa",
        "model": SPEC_MODEL,
        "num_speculative_tokens": 3,
    },
    tensor_parallel_size=1,
    seed=0,
)

outputs = llm.generate(
    prompts=["Hi! How are you doing?", "Hi! How are you doing?"],
    use_tqdm=True,
)
```
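For completeness, this is how I read the results back (standard `RequestOutput` fields, shown only to make the repro self-contained):

```python
# Each result carries the original prompt and its generated completions.
for out in outputs:
    print(f"Prompt: {out.prompt!r}")
    print(f"Generated: {out.outputs[0].text!r}")
```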
Question
I want to know how to specify the Medusa choice tree for the model. Could you provide guidance or examples on how to do this?
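For context, by "choice tree" I mean the tree structure used in the upstream Medusa repository, where each path is a tuple of per-head top-k indices. A small illustrative subset (this is the upstream format, not a vLLM parameter I have found):

```python
# Illustrative choice tree in the upstream Medusa repo's format:
# the i-th entry of each tuple selects a top-k candidate index at head i.
medusa_choices = [
    (0,), (1,), (2,),        # top-3 candidates from the first head
    (0, 0), (0, 1), (1, 0),  # continuations through the second head
    (0, 0, 0),               # one depth-3 path through the third head
]
```

I could not find a way to pass something like this through `speculative_config`.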
**Environment:**
- Python version: 3.11
- vLLM version: 0.9.1
- OS: Ubuntu 22.04