Hi everyone, I searched SpeculativeConfig and found no information about setting the CompileConfig for the draft model. I need to modify the CompileConfig for the draft model. How can I do that?
There is currently no documented or supported way to set or modify the CompilationConfig (what you call CompileConfig) specifically for the draft model through vLLM's SpeculativeConfig. The options SpeculativeConfig exposes for the draft model include quantization, max_model_len, revision, and code_revision, but nothing related to compilation. This limitation is also noted as a TODO in the codebase, which indicates that user control over draft model compilation has not yet been implemented or exposed in the public API or CLI options. See the discussion in this issue and the SpeculativeConfig docs.
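For illustration, here is a minimal sketch of what the draft model's configuration can carry today. The helper `build_speculative_config`, the `DRAFT_FIELDS` set, and the model names are hypothetical and exist only for this example. The allowed set mirrors just the four fields listed above, not vLLM's full schema, and the commented-out `LLM(...)` call assumes a vLLM version that accepts `speculative_config` as a dict.

```python
# Draft-model fields named in the answer above as configurable.
# Assumption: this is only the subset mentioned here, not vLLM's full schema.
DRAFT_FIELDS = {"quantization", "max_model_len", "revision", "code_revision"}


def build_speculative_config(draft_model, num_speculative_tokens, **draft_overrides):
    """Hypothetical helper: build a speculative_config dict and reject
    any draft-model field outside DRAFT_FIELDS (e.g. compilation settings)."""
    unsupported = set(draft_overrides) - DRAFT_FIELDS
    if unsupported:
        raise ValueError(f"not exposed for the draft model: {sorted(unsupported)}")
    return {
        "model": draft_model,
        "num_speculative_tokens": num_speculative_tokens,
        **draft_overrides,
    }


spec_config = build_speculative_config(
    "my-org/draft-model",  # placeholder draft model name
    num_speculative_tokens=4,
    quantization="fp8",
    max_model_len=4096,
)

# Assumed usage (not run here; needs vLLM installed and a GPU):
# from vllm import LLM
# llm = LLM(model="my-org/target-model", speculative_config=spec_config)
```

Passing something like `compilation_config=...` to this helper raises a `ValueError`, which mirrors the point above: there is no draft-specific compilation field to set.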
Would you like more detail on the current internal handling or possible workarounds?