| Topic | Replies | Views | Activity |
| --- | --- | --- | --- |
| About the Features category | 0 | 14 | March 20, 2025 |
| vLLM cannot connect to existing Ray cluster | 7 | 2 | May 3, 2025 |
| Support for (sparse) key value caching | 16 | 15 | May 3, 2025 |
| How to use speculative decoding? | 3 | 11 | May 1, 2025 |
| Grammar CPU bound performance | 9 | 99 | April 29, 2025 |
| Does vLLM support multiple model_executor? | 1 | 7 | April 28, 2025 |
| Spec decode with eagle get very low Draft acceptance rate | 1 | 34 | April 25, 2025 |
| LoRA Adapter enabling with vLLM is not working | 4 | 51 | April 21, 2025 |
| Goodput Guided Speculative Decoding | 2 | 68 | April 19, 2025 |
| Is structured output compatible with automatic prefix caching? | 1 | 33 | April 14, 2025 |
| Tool calling using Offline Inference? | 1 | 17 | April 14, 2025 |
| How to crop kv_caches? | 0 | 20 | April 13, 2025 |
| Minimum requirements for Disaggregated Prefilling? | 0 | 46 | April 9, 2025 |
| Is there any roadmap to support prefix caching on dram and disk? | 0 | 23 | April 8, 2025 |
| Can LoRA adapters be loaded on different GPUs | 1 | 25 | April 7, 2025 |
| Do we have regression tests for structured output? Especially speed regression? | 0 | 17 | March 31, 2025 |
| Why remove bonus token of request in draft model? | 0 | 24 | March 30, 2025 |
| Why is the prefix cache hit rate constantly increasing | 3 | 66 | March 27, 2025 |
| Multiple tools with Mistral Large 2411 | 4 | 62 | March 26, 2025 |
| Pipeline Parallelism Support - Source Code Location | 1 | 51 | March 25, 2025 |
| GGUF quantized models Inference support | 0 | 80 | March 25, 2025 |
| V1 Chunked Prefill Scheduling Policy: how prefill would be scheduled? | 8 | 129 | March 25, 2025 |
| Prefix Cache control | 1 | 72 | March 24, 2025 |
| Avoiding hash collisions in prefix cache | 7 | 88 | March 24, 2025 |