|
Welcome to vLLM Forums! :wave:
|
|
1
|
1379
|
March 24, 2025
|
|
RDNA4 FP8 support
|
|
1
|
7
|
May 2, 2026
|
|
Getting flashinfer.jit: [Autotuner]: OOM detected
|
|
3
|
44
|
May 2, 2026
|
|
To understand max-num-seqs better!
|
|
1
|
29
|
April 30, 2026
|
|
vLLM-0.18.0 KV cache usage drops from 100% to 0%
|
|
3
|
25
|
April 30, 2026
|
|
Has anyone successfully deployed deepseek-v4-flash on 8xL40s?
|
|
1
|
28
|
April 30, 2026
|
|
What is the correct chat template when serving gemma4?
|
|
1
|
20
|
April 30, 2026
|
|
Support for V100 (sm 70) on vllm 0.20
|
|
1
|
52
|
April 30, 2026
|
|
The latest version of vllm is not compatible with local deployment of deepseek-v4(0.20)
|
|
2
|
116
|
April 29, 2026
|
|
Why does a larger max_num_batched_tokens lead to less available KV cache memory
|
|
1
|
33
|
April 29, 2026
|
|
Install using --torch-backend=cu129 but try to import cu13
|
|
8
|
98
|
April 29, 2026
|
|
In a two-node distributed vLLM deployment, /v1/completions requests with the logprobs parameter hang indefinitely
|
|
3
|
61
|
April 29, 2026
|
|
QPS doesn't scale with multi-card GPU
|
|
1
|
16
|
April 29, 2026
|
|
Throughput drops and increased TTFT when running background automation executors
|
|
1
|
32
|
April 28, 2026
|
|
Gibberish output from NVFP4 quantized Ministral on vLLM 0.19.2rc1.dev205+g07351e088
|
|
1
|
36
|
April 27, 2026
|
|
vLLM and vLLM omni difference
|
|
3
|
55
|
April 26, 2026
|
|
Why is TTFT worse for decode=1 than decode=100?
|
|
1
|
19
|
April 26, 2026
|
|
How to understand OOM and foresee memory usage
|
|
5
|
55
|
April 24, 2026
|
|
Can GPTQModel quantize GLM-5 from FP16 to INT8?
|
|
9
|
53
|
April 24, 2026
|
|
What is included in --gpu-memory-utilization
|
|
4
|
73
|
April 24, 2026
|
|
EngineCore Error with NVIDIA-Nemotron-3-Super-120B-A12B-FP8 on 2*H100
|
|
2
|
41
|
April 21, 2026
|
|
OutOfMemoryError: vLLM can't see the max memory available
|
|
1
|
57
|
April 21, 2026
|
|
Warning while serving Qwen/Qwen3.6-35B-A3B-FP8
|
|
7
|
448
|
April 21, 2026
|
|
Jetson Orin + vLLM Qwen3-0.6B quantized models – GPU active but no speedup, need optimization tips
|
|
1
|
28
|
April 20, 2026
|
|
Does the dynamic adapter in the sglang framework support the switching of different data types?
|
|
1
|
31
|
April 20, 2026
|
|
Minor Fix for Print Message Output
|
|
1
|
15
|
April 20, 2026
|
|
Does beam search support kv cache sharing between beams?
|
|
20
|
15
|
April 19, 2026
|
|
Config file not found Qwen/Qwen3.6-35B-A3B
|
|
1
|
144
|
April 19, 2026
|
|
Deployment parameters for qwen3.5-4b?
|
|
22
|
413
|
April 19, 2026
|
|
Far different performance between Qwen3-4B and Qwen3-Embedding-4B
|
|
0
|
81
|
April 17, 2026
|