| Topic | Replies | Views | Activity |
|---|---|---|---|
| Welcome to vLLM Forums! :wave: | 3 | 1205 | November 12, 2025 |
| Expert Parallelism All-to-All Communication without NVLink and DeepEP | 3 | 6 | March 3, 2026 |
| How to expose v1/audio/transcriptions router for custom models | 34 | 37 | March 3, 2026 |
| torch.OutOfMemoryError: CUDA out of memory | 15 | 18 | March 3, 2026 |
| vLLM 0.16.0 version log changes | 3 | 27 | March 2, 2026 |
| GGUF PyPI release version 0.18.0 (27 Feb) | 1 | 13 | March 2, 2026 |
| How to run GGUF with vLLM and ROCm | 4 | 22 | March 1, 2026 |
| Where does the deepseek model refresh the KV cache in vLLM? | 1 | 16 | February 28, 2026 |
| Following Qwen3.5 Usage Guide on H20, but cannot host Qwen3.5-27B | 4 | 73 | February 28, 2026 |
| BranchContext: CoW filesystem isolation for multi-sample vLLM workflows | 1 | 7 | February 27, 2026 |
| How to serve two vLLM instances using Docker? | 3 | 21 | February 26, 2026 |
| Thinking Token limit setting | 11 | 39 | February 26, 2026 |
| Hosting Qwen 3.5 35B-A3B model | 1 | 342 | February 25, 2026 |
| vllm-openai Docker Hub missing 0.16 tags | 2 | 35 | February 25, 2026 |
| Does vLLM-Ascend support reasoning-parser? | 1 | 14 | February 25, 2026 |
| What is the role of the additional process running on GPU 0 in DP+EP? | 3 | 15 | February 25, 2026 |
| True eager backend | 6 | 27 | February 24, 2026 |
| Mistral Small 3.2 finetune errors out: There is no module or parameter named 'language_model' in LlamaForCausalLM | 3 | 380 | February 18, 2026 |
| Does vLLM support diffusers.Pipeline's LoRA files? | 15 | 32 | February 16, 2026 |
| What is tail ITL in Disaggregated Prefilling? | 6 | 16 | February 16, 2026 |
| Significant speedup observed with long common prefix between v0.11.0 and v0.12.0 | 9 | 36 | February 13, 2026 |
| What does Skip_leading_tokens mean? | 41 | 53 | February 13, 2026 |
| Priority in batch API | 7 | 64 | February 12, 2026 |
| We're Live: OCI Deployment Guide for vLLM Production Stack | 2 | 32 | February 12, 2026 |
| Native FP8 WMMA Support for AMD RDNA4 (RX 9070 XT / R9700) in vLLM | 5 | 754 | February 12, 2026 |
| Distributed Speculative Decoding using Ray | 3 | 30 | February 11, 2026 |
| Is the vLLM repository currently equipped with nightly testing? | 3 | 30 | February 10, 2026 |
| Qwen3-TTS Base model question | 1 | 32 | February 9, 2026 |
| Mistral-small-3.2: Unable to locate consolidated.safetensors.index.json | 1 | 19 | February 8, 2026 |
| Pre-Built Docker Install | 2 | 48 | February 8, 2026 |