Topic | Replies | Views | Activity
About the Features category | 0 | 21 | March 20, 2025
Multi-node K8s GPU pooling | 3 | 10 | July 17, 2025
Error trying to handle streaming tool call | 3 | 6 | July 17, 2025
Improving Speculative Decoding for Beginning Tokens & Structured Output | 1 | 8 | July 16, 2025
Question: Specifying Medusa Choice Tree in vllm | 1 | 8 | July 11, 2025
Disagg Prefill timeout | 1 | 23 | July 7, 2025
DeepSeek-V3 tool_choice="auto" not working, but tool_choice="required" is working | 1 | 36 | July 4, 2025
MoE quantization | 9 | 277 | July 2, 2025
Why is cuda graph capture sizes limited by max_num_seqs | 1 | 56 | June 29, 2025
Scheduler in vllm | 1 | 37 | June 26, 2025
Prompt_embeds usage in vllm openai completion api | 4 | 28 | June 17, 2025
Is there a detailed introduction to the two W8A8 quantization methods? | 1 | 37 | June 15, 2025
Sequence Parallelism Support - Source Code Location | 0 | 18 | June 10, 2025
Something weird about the reading procedure of q_vecs in page attention kernel | 3 | 10 | June 9, 2025
Computation time remain consistent across chunks in chunked-prefill despite linearly growing attention complexity? | 1 | 12 | June 2, 2025
Why does computation time remain consistent across chunks in chunked-prefill despite linearly growing attention complexity? | 3 | 14 | June 2, 2025
APC Slowdown with block-size=1 | 1 | 32 | May 26, 2025
RunBot's math-to-text on NVIDIA NeMo Framework AutoModel | 11 | 42 | May 19, 2025
Issue with DynamicYaRN and Key-Value Cache Reuse in vLLM | 1 | 38 | May 18, 2025
VUA - library code for LLM inference engines for external storage of KV caches | 1 | 23 | May 13, 2025
Specifying special tokens | 5 | 145 | May 8, 2025
vLLM cannot connect to existing Ray cluster | 16 | 254 | May 8, 2025
Support for (sparse) key value caching | 16 | 75 | May 3, 2025
How to use speculative decoding? | 3 | 129 | May 1, 2025
Grammar CPU bound performance | 9 | 172 | April 29, 2025
Does vLLM support multiple model_executor? | 1 | 87 | April 28, 2025
Spec decode with eagle gets very low Draft acceptance rate | 1 | 94 | April 25, 2025
LoRA Adapter enabling with vLLM is not working | 4 | 135 | April 21, 2025
Goodput Guided Speculative Decoding | 2 | 120 | April 19, 2025
Is structured output compatible with automatic prefix caching? | 1 | 51 | April 14, 2025