| Topic | Replies | Views | Activity |
| --- | --- | --- | --- |
| Welcome to vLLM Forums! :wave: | 1 | 1328 | March 24, 2025 |
| About the General category | 0 | 86 | March 17, 2025 |
| What is included in --gpu-memory-utilization | 1 | 14 | April 17, 2026 |
| Best practice for synchronizing KVCache offloading across ranks (asking in the context of TP)? | 5 | 23 | April 13, 2026 |
| How does CUDA graph memory scale in vLLM? | 1 | 17 | March 12, 2026 |
| How to set default sampling params such as "truncate_prompt_tokens" for all requests to vLLM embedding | 1 | 16 | April 12, 2026 |
| High network latency (500 ms) when calling vLLM Gemma-27B from India to an Atlanta server – any optimization options? | 1 | 15 | April 11, 2026 |
| vLLM throughput dropping when running concurrent background executors? | 1 | 20 | April 11, 2026 |
| TurboQuant: KV cache compression | 2 | 2941 | April 11, 2026 |
| vLLM on 4 nodes fails randomly | 1 | 50 | April 6, 2026 |
| Qwen3.5 only outputs reasoning and no content | 1 | 100 | April 3, 2026 |
| How to serve gemma-4-31b-it | 2 | 522 | April 2, 2026 |
| In a two-node distributed vLLM deployment, /v1/completions with the logprobs parameter hangs indefinitely | 2 | 34 | March 30, 2026 |
| Qwen3.5-27b-fp8 produces no "think" output | 3 | 119 | March 30, 2026 |
| Where in vLLM do DeepSeek models refresh the KV cache? | 44 | 115 | March 30, 2026 |
| Deployment parameters for qwen3.5-4b? | 21 | 263 | March 30, 2026 |
| Why does data parallel use both GPUs? | 3 | 35 | March 27, 2026 |
| Do vLLM subprocesses inherit the main process's environment variables? | 3 | 28 | March 25, 2026 |
| Problem running deepseekv3.2 on vllm-ascend | 1 | 23 | March 24, 2026 |
| [gpt_oss_triton_kernels_moe.py:59] Using legacy triton_kernels on ROCm | 1 | 43 | March 24, 2026 |
| Bug when running deepseekv3.2 on vllm-ascend! | 1 | 6 | March 23, 2026 |
| Problem handling high concurrency with vllm-ascend | 1 | 20 | March 23, 2026 |
| An issue deploying deepseekv3.2 with vllm-ascend | 2 | 23 | March 22, 2026 |
| How to expose the v1/audio/transcriptions route for custom models | 44 | 113 | March 20, 2026 |
| How to contribute to this repo? Where can I find the test env? | 1 | 19 | March 19, 2026 |
| How to get thinking content in qwen3.5 thinking | 1 | 528 | March 19, 2026 |
| Do RTX 5090 and RTX PRO 5000 have differences that should be taken into account? | 7 | 62 | March 19, 2026 |
| Thoughts on AI sycophancy mechanisms, with image-based verification | 1 | 16 | March 19, 2026 |
| I have eight L20 GPUs; why does running glm5-FP8 throw an error? | 0 | 58 | March 19, 2026 |
| LoRA integration for Qwen3.5-122b fails during deployment on vLLM 0.17.0 | 3 | 127 | March 18, 2026 |