| Topic | Replies | Views | Activity |
| --- | --- | --- | --- |
| Support for (sparse) key value caching | 16 | 465 | May 3, 2025 |
| How to use speculative decoding? | 3 | 756 | May 1, 2025 |
| Grammar CPU bound performance | 9 | 475 | April 29, 2025 |
| Does vLLM support multiple model_executor? | 1 | 312 | April 28, 2025 |
| Spec decode with EAGLE gets very low draft acceptance rate | 1 | 319 | April 25, 2025 |
| LoRA adapter enabling with vLLM is not working | 4 | 485 | April 21, 2025 |
| Goodput Guided Speculative Decoding | 2 | 217 | April 19, 2025 |
| Is structured output compatible with automatic prefix caching? | 1 | 113 | April 14, 2025 |
| Tool calling using Offline Inference? | 1 | 137 | April 14, 2025 |
| How to crop kv_caches? | 0 | 66 | April 13, 2025 |
| Minimum requirements for Disaggregated Prefilling? | 0 | 92 | April 9, 2025 |
| Can LoRA adapters be loaded on different GPUs? | 1 | 100 | April 7, 2025 |
| Do we have regression tests for structured output? Especially speed regression? | 0 | 42 | March 31, 2025 |
| Why remove the bonus token of a request in the draft model? | 0 | 54 | March 30, 2025 |
| Why is the prefix cache hit rate constantly increasing? | 3 | 960 | March 27, 2025 |
| Multiple tools with Mistral Large 2411 | 4 | 289 | March 26, 2025 |
| Pipeline Parallelism Support - Source Code Location | 1 | 132 | March 25, 2025 |
| GGUF quantized models inference support | 0 | 262 | March 25, 2025 |
| V1 Chunked Prefill Scheduling Policy: how would prefill be scheduled? | 8 | 493 | March 25, 2025 |
| Prefix Cache control | 1 | 373 | March 24, 2025 |
| Avoiding hash collisions in prefix cache | 7 | 274 | March 24, 2025 |