| Topic | Replies | Views | Activity |
| --- | --- | --- | --- |
| Support for (sparse) key value caching | 16 | 465 | May 3, 2025 |
| How to use speculative decoding? | 3 | 756 | May 1, 2025 |
| Grammar CPU bound performance | 9 | 475 | April 29, 2025 |
| Does vLLM support multiple model_executor? | 1 | 312 | April 28, 2025 |
| Spec decode with EAGLE gets very low draft acceptance rate | 1 | 319 | April 25, 2025 |
| LoRA adapter enabling with vLLM is not working | 4 | 485 | April 21, 2025 |
| Goodput Guided Speculative Decoding | 2 | 217 | April 19, 2025 |
| Is structured output compatible with automatic prefix caching? | 1 | 113 | April 14, 2025 |
| Tool calling using Offline Inference? | 1 | 137 | April 14, 2025 |
| How to crop kv_caches? | 0 | 66 | April 13, 2025 |
| Minimum requirements for Disaggregated Prefilling? | 0 | 92 | April 9, 2025 |
| Can LoRA adapters be loaded on different GPUs? | 1 | 100 | April 7, 2025 |
| Do we have regression tests for structured output? Especially speed regression? | 0 | 42 | March 31, 2025 |
| Why remove the bonus token of a request in the draft model? | 0 | 54 | March 30, 2025 |
| Why is the prefix cache hit rate constantly increasing? | 3 | 960 | March 27, 2025 |
| Multiple tools with Mistral Large 2411 | 4 | 289 | March 26, 2025 |
| Pipeline Parallelism Support - Source Code Location | 1 | 132 | March 25, 2025 |
| GGUF quantized models inference support | 0 | 262 | March 25, 2025 |
| V1 Chunked Prefill Scheduling Policy: how would prefill be scheduled? | 8 | 493 | March 25, 2025 |
| Prefix Cache control | 1 | 373 | March 24, 2025 |
| Avoiding hash collisions in prefix cache | 7 | 274 | March 24, 2025 |