| Topic | Replies | Views | Activity |
|---|---|---|---|
| Welcome to vLLM Forums! :wave: | 1 | 689 | March 24, 2025 |
| Running vllm bench serve from CPU-only node | 3 | 91 | August 29, 2025 |
| Num request running stays on 1 | 3 | 2 | August 29, 2025 |
| Clarification: Does vLLM support concurrent decoding with multiple LoRA adapters in online inference? | 1 | 5 | August 29, 2025 |
| Disable kv cache storage in vllm | 1 | 18 | August 29, 2025 |
| Is --enable-expert-parallel supported for gpt-oss models on b200/gb200? | 1 | 12 | August 28, 2025 |
| CUDA error: no kernel image is available for execution on the device | 1 | 15 | August 28, 2025 |
| Embeddings with vLLM in Kubernetes | 5 | 10 | August 28, 2025 |
| How to improve tokenization speed for embeddings generation? | 1 | 13 | August 28, 2025 |
| GLM4.5 V memory leak on inference | 9 | 17 | August 28, 2025 |
| INFO: 127.0.0.1:47190 - "POST /generate HTTP/1.1" 500 Internal Server Error | 1 | 7 | August 28, 2025 |
| How to do inference of BGE-m3 embedding with vllm | 1 | 15 | August 27, 2025 |
| Discussion: vLLM performance degrades on long inputs | 7 | 20 | August 27, 2025 |
| 2 vllm containers on a single GPU | 3 | 461 | August 27, 2025 |
| vLLM startup log hangs at the NCCL-related step and does not proceed | 15 | 72 | August 27, 2025 |
| Problem installing vLLM: the torch dependency automatically downloads the XPU version | 1 | 17 | August 27, 2025 |
| Deployment example for a qwen3 model with hybrid thinking | 7 | 35 | August 26, 2025 |
| How to switch the pooling method of pooling models | 17 | 22 | August 26, 2025 |
| Qwen 2.5 VL for videos | 1 | 29 | August 26, 2025 |
| Ray cluster DeepSeek-R1-Distill-Qwen-32B-AWQ | 43 | 118 | August 25, 2025 |
| DeepSeek-V3 tool_choice="auto" not working, but tool_choice="required" is working | 3 | 140 | August 25, 2025 |
| How to deploy the CosyVoice2.0 model using vllm? | 3 | 6 | August 27, 2025 |
| GPT-oss inference | 1 | 79 | August 23, 2025 |
| How to support shape-dependent branches with support_torch_compile (PIECEWISE)? | 9 | 11 | August 22, 2025 |
| Is the prompt parameter in the OpenAI Transcription API supported by vLLM? | 1 | 6 | August 22, 2025 |
| Can we work with Wan2.2 model with vllm? | 3 | 21 | August 21, 2025 |
| A question about request handling | 5 | 31 | August 21, 2025 |
| General questions on structured output backend | 5 | 32 | August 20, 2025 |
| The OpenAI endpoint doesn't support the function strict setting | 17 | 79 | August 20, 2025 |
| Why is inference for Qwen 2.5 VL so slow when we send an image? | 5 | 119 | August 20, 2025 |