|
Issues with Voxtral models and omni
|
|
3
|
130
|
April 14, 2026
|
|
Support for MiniMax-M2.5
|
|
1
|
100
|
April 14, 2026
|
|
Best practice for synchronizing KVCache offloading across ranks (asking in context of tp)?
|
|
5
|
50
|
April 13, 2026
|
|
How does CUDA graph memory scale in vLLM
|
|
1
|
91
|
March 12, 2026
|
|
DeepSeek MTP full cuda graph support?
|
|
3
|
93
|
April 13, 2026
|
|
How to set default sampling params such as "truncate_prompt_tokens" for all requests to a vLLM embedding model
|
|
1
|
48
|
April 12, 2026
|
|
High Network Latency (500ms) When Calling vLLM Gemma-27B from India to Atlanta Server – Any Optimization Options?
|
|
1
|
26
|
April 11, 2026
|
|
vLLM throughput dropping when running concurrent background executors?
|
|
1
|
102
|
April 11, 2026
|
|
vLLM hangs during worker initialization on Blackwell PCIe GPUs unless --disable-custom-all-reduce is used
|
|
1
|
373
|
April 11, 2026
|
|
On 8-card Ascend 910B with vLLM serving Qwen3.5-122B-A10B, the client freezes at 8% progress when running an accuracy test, as the server stops receiving new requests after Running reqs and KV Cache fall to 0
|
|
1
|
114
|
April 11, 2026
|
|
SM120 (RTX PRO 6000) NVFP4 MoE Performance Report -- Qwen3.5-397B
|
|
1
|
741
|
April 11, 2026
|
|
Qwen3.5-27B-FP8 Speculative Decoding
|
|
2
|
1958
|
April 11, 2026
|
|
thinking_token_budget silently ignored when passed via extra_args in vLLM 0.18.0
|
|
1
|
280
|
April 11, 2026
|
|
TurboQuant: KV Cache Compression
|
|
2
|
4565
|
April 11, 2026
|
|
When vLLM starts, the log hangs at the NCCL-related part and does not proceed
|
|
16
|
1523
|
April 8, 2026
|
|
vLLM on 4 nodes fails randomly
|
|
1
|
83
|
April 6, 2026
|
|
SM120 (RTX PRO 4000): 6.5x throughput gain and v0.18.1 regression findings
|
|
1
|
536
|
April 3, 2026
|
|
Qwen3.5 only outputs reasoning and no content
|
|
1
|
239
|
April 3, 2026
|
|
How to serve gemma-4-31b-it
|
|
2
|
834
|
April 2, 2026
|
|
Mixed GPU support?
|
|
1
|
341
|
March 31, 2026
|
|
Qwen3.5-27b-fp8 produces no think output
|
|
3
|
302
|
March 30, 2026
|
|
Where in vLLM is the KV cache refreshed for DeepSeek models?
|
|
44
|
179
|
March 30, 2026
|
|
Why does data parallel use both GPUs?
|
|
3
|
57
|
March 27, 2026
|
|
Any plan for the project to support minicpm-o-4.5?
|
|
1
|
64
|
March 26, 2026
|
|
Do vLLM subprocesses inherit the main process's environment variables?
|
|
3
|
35
|
March 25, 2026
|
|
Problem running deepseekv3.2 on vllm-ascend
|
|
1
|
45
|
March 24, 2026
|
|
[gpt_oss_triton_kernels_moe.py:59] Using legacy triton_kernels on ROCm
|
|
1
|
68
|
March 24, 2026
|
|
Bug when running deepseekv3.2 on vllm-ascend!
|
|
1
|
31
|
March 23, 2026
|
|
Problem with vllm-ascend when handling many concurrent requests
|
|
1
|
49
|
March 23, 2026
|
|
GLM 5 / Kimi k2.5 on 4 x RTX 6000 Pro
|
|
1
|
213
|
March 22, 2026
|