| Topic | Replies | Views | Activity |
|---|---|---|---|
| Welcome to vLLM Forums! :wave: | 1 | 1424 | March 24, 2025 |
| About the General category | 0 | 91 | March 17, 2025 |
| vLLM on H800: thinking-mode field and content in model inference output | 1 | 9 | May 14, 2026 |
| How to use Gemma 4 with the new MTP drafters? | 8 | 1524 | May 14, 2026 |
| vLLM multi-node prefill/decode (PD) disaggregated deployment | 2 | 36 | May 14, 2026 |
| Understanding multi-node parallelization | 7 | 27 | May 13, 2026 |
| How to resolve DSML compatibility issues when AI tools use a locally deployed DeepSeek-V4 on vLLM? | 1 | 81 | May 11, 2026 |
| What is the recommended way to support dynamic pruning for speculative decoding draft trees? | 1 | 30 | May 8, 2026 |
| How does vLLM process multimodal embedding requests? | 8 | 51 | May 7, 2026 |
| LLM memory caching | 7 | 60 | May 7, 2026 |
| vLLM 0.20.1, Radeon AI 9700, 1 CPU core at 100% | 5 | 56 | May 6, 2026 |
| QPS doesn't scale across multiple GPUs | 3 | 36 | May 6, 2026 |
| vLLM tensor-parallel workers not completing initialization | 5 | 1319 | May 4, 2026 |
| How does vLLM handle internal queue requests? | 3 | 51 | May 4, 2026 |
| Is nixl the reason my vLLM 0.20.0 fails to start? | 7 | 126 | May 3, 2026 |
| RDNA4 FP8 support | 1 | 66 | May 2, 2026 |
| Getting flashinfer.jit: [Autotuner]: OOM detected | 3 | 56 | May 2, 2026 |
| Understanding max-num-seqs better | 1 | 132 | April 30, 2026 |
| vLLM 0.18.0 KV cache usage drops from 100% to 0% | 3 | 50 | April 30, 2026 |
| Has anyone successfully deployed DeepSeek-V4-Flash on 8x L40S? | 1 | 161 | April 30, 2026 |
| What is the correct chat template when serving Gemma 4? | 1 | 119 | April 30, 2026 |
| Support for V100 (sm 70) on vLLM 0.20 | 1 | 254 | April 30, 2026 |
| Why does a larger max_num_batched_tokens leave less available KV cache memory? | 1 | 79 | April 29, 2026 |
| Installed using --torch-backend=cu129 but it tries to import cu13 | 8 | 547 | April 29, 2026 |
| /v1/completions hangs indefinitely with the logprobs parameter in a two-node distributed vLLM deployment | 3 | 69 | April 29, 2026 |
| Throughput drops and increased TTFT when running background automation executors | 1 | 50 | April 28, 2026 |
| Gibberish output from NVFP4-quantized Ministral on vLLM 0.19.2rc1.dev205+g07351e088 | 1 | 49 | April 27, 2026 |
| Difference between vLLM and vLLM Omni | 3 | 95 | April 26, 2026 |
| Why is TTFT worse for decode=1 than decode=100? | 1 | 23 | April 26, 2026 |
| How to understand OOM and foresee memory usage | 5 | 88 | April 24, 2026 |