|
About the Features category
|
|
0
|
58
|
March 20, 2025
|
|
Qwen3.5-27B-FP8 Speculative Decoding
|
|
2
|
80
|
March 12, 2026
|
|
Distributed Speculative Decoding using Ray
|
|
3
|
45
|
February 11, 2026
|
|
Deployment example for a Qwen3 model with hybrid thinking
|
|
10
|
1222
|
February 4, 2026
|
|
How do I precompute multimodal embeddings?
|
|
5
|
66
|
February 2, 2026
|
|
Implementing hidden state probes
|
|
1
|
28
|
January 30, 2026
|
|
Standalone draft model spec decode support in v0.x and v1
|
|
3
|
91
|
January 20, 2026
|
|
How to get KV cache values from vLLM
|
|
5
|
179
|
January 19, 2026
|
|
Is there a plan for EVS to support Qwen3VL in response to the issue of sparse video tokens?
|
|
1
|
59
|
January 13, 2026
|
|
Exposing KV cache for recomposition / reuse beyond prefix caching?
|
|
1
|
40
|
January 13, 2026
|
|
Why does the Marlin CUDA kernel feel slow?
|
|
5
|
77
|
January 9, 2026
|
|
Can the reasoning_effort parameter not be used in the vLLM implementation via Python?
|
|
1
|
143
|
January 2, 2026
|
|
Understanding the vLLM KV cache
|
|
5
|
510
|
December 1, 2025
|
|
Has anyone successfully run DBO in a single-node, multi-card environment?
|
|
1
|
63
|
December 1, 2025
|
|
EPLB behavior in elastic scaling
|
|
21
|
117
|
November 28, 2025
|
|
Enabling FlashInfer fails for Qwen2.5 VL
|
|
5
|
222
|
November 24, 2025
|
|
How to improve throughput in a single-node, multi-GPU deployment
|
|
10
|
361
|
November 24, 2025
|
|
Does LMFE_STRICT_JSON_FIELD_ORDER not work?
|
|
4
|
62
|
November 19, 2025
|
|
Clarification Needed on Testing Elastic EP and Its Library Installation Dependencies
|
|
1
|
36
|
November 19, 2025
|
|
Custom modality
|
|
3
|
31
|
November 14, 2025
|
|
Asking about 6-bit quantization
|
|
1
|
86
|
November 11, 2025
|
|
Expert offloading
|
|
1
|
379
|
November 11, 2025
|
|
Raw tokens completion via online serving
|
|
1
|
76
|
November 3, 2025
|
|
vLLM extremely slow / no response with max_model_len=8192 and multi-GPU tensor parallel
|
|
1
|
576
|
October 26, 2025
|
|
When using large batches, the Ray service crashes: ray.exceptions.RayChannelTimeoutError: System error: Timed out waiting for object available to read
|
|
41
|
1243
|
October 26, 2025
|
|
Why does vLLM not support LMP?
|
|
3
|
111
|
October 23, 2025
|
|
Is it possible to configure the order of the pipeline in multi-node deployments?
|
|
3
|
93
|
October 16, 2025
|
|
Question on Advanced vLLM Use Case: Distributed Prefix Caching for a CAG Evaluation Framework
|
|
1
|
90
|
October 15, 2025
|
|
A bit of frustration with Quantization
|
|
5
|
573
|
October 14, 2025
|
|
DeepSeek-V3: tool_choice="auto" is not working, but tool_choice="required" works
|
|
4
|
645
|
October 13, 2025
|