|
About the Features category
|
|
0
|
77
|
March 20, 2025
|
|
Is there a hook/flag to capture activation statistics during inference for use with llm-compressor AWQ?
|
|
3
|
44
|
June 4, 2026
|
|
Does vLLM support Qwen3.5 PD disaggregation with Mooncake?
|
|
1
|
62
|
May 28, 2026
|
|
Why Does Decode Forward on PP Stage 0 Appear to Precede Prefill Forward on PP Stage 1 for the Same Request?
|
|
1
|
25
|
May 26, 2026
|
|
Can GPTQModel quantize GLM-5 from FP16 to INT8?
|
|
9
|
134
|
April 24, 2026
|
|
DeepSeek MTP full cuda graph support?
|
|
3
|
120
|
April 13, 2026
|
|
Qwen3.5-27B-FP8 Speculative Decoding
|
|
2
|
2053
|
April 11, 2026
|
|
thinking_token_budget silently ignored when passed via extra_args in vLLM 0.18.0
|
|
1
|
405
|
April 11, 2026
|
|
GLM 5 / Kimi k2.5 on 4 x RTX 6000 Pro
|
|
1
|
261
|
March 22, 2026
|
|
Compressed Multimodal embeddings inputs
|
|
1
|
60
|
March 18, 2026
|
|
NVFP4 Support In Attention
|
|
1
|
746
|
March 16, 2026
|
|
Distributed Speculative Decoding using Ray
|
|
3
|
149
|
February 11, 2026
|
|
Deployment example for a qwen3 model with hybrid thinking
|
|
10
|
2002
|
February 4, 2026
|
|
How do I precompute multimodal embeddings?
|
|
5
|
324
|
February 2, 2026
|
|
Implementing hidden state probes
|
|
1
|
78
|
January 30, 2026
|
|
Standalone draft model spec decode support in v0.x and v1
|
|
3
|
192
|
January 20, 2026
|
|
How to get KV cache values from vLLM
|
|
5
|
330
|
January 19, 2026
|
|
Is there a plan for EVS to support Qwen3VL in response to the issue of sparse video tokens?
|
|
1
|
109
|
January 13, 2026
|
|
Exposing KV cache for recomposition / reuse beyond prefix caching?
|
|
1
|
164
|
January 13, 2026
|
|
Why does the Marlin CUDA kernel feel slow?
|
|
5
|
217
|
January 9, 2026
|
|
Can the reasoning_effort parameter not be used in vLLM via Python?
|
|
1
|
396
|
January 2, 2026
|
|
Understanding vllm kv cache
|
|
5
|
1682
|
December 1, 2025
|
|
Has anyone successfully run DBO in a single node multi card environment?
|
|
1
|
118
|
December 1, 2025
|
|
EPLB behavior in elastic scaling
|
|
21
|
299
|
November 28, 2025
|
|
Enabling FlashInfer fails with Qwen2.5 VL
|
|
5
|
351
|
November 24, 2025
|
|
How to improve throughput in a single-node multi-GPU deployment
|
|
10
|
769
|
November 24, 2025
|
|
Does LMFE_STRICT_JSON_FIELD_ORDER not work?
|
|
2
|
86
|
November 19, 2025
|
|
Clarification Needed on Testing Elastic EP and Its Library Installation Dependencies
|
|
1
|
44
|
November 19, 2025
|
|
Custom modality
|
|
3
|
77
|
November 14, 2025
|
|
Asking about 6-bit quantization
|
|
1
|
210
|
November 11, 2025
|