Topic	Replies	Views	Activity
About the Model Support category Model Support	0	66	March 20, 2025
Disabling reasoning of Qwen3-VL-8B-Thinking per request Model Support	1	29	October 29, 2025
Zerank - deploying using vllm Model Support	3	8	October 29, 2025
Mistral Small 3.2 finetune errors out: There is no module or parameter named 'language_model' in LlamaForCausalLM Model Support	2	12	October 23, 2025
Vllm-ascend是否支持deepseek-ocr DeepSeek	2	133	October 21, 2025
Custom edit on the embedding out Model Support	2	26	October 15, 2025
Llama 3.3 70B very slow Llama	4	68	October 13, 2025
Which ATTENTION BACKEND for gpt-oss in version 0.11.0? Model Support	1	125	October 4, 2025
Loading non-consolidated Mistral safetensors Model Support	3	39	September 30, 2025
Issue serving gemma3-27b-it Model Support	1	127	September 19, 2025
Progress bar to browser Model Support	0	15	September 11, 2025
Intermittent Service Downtime Issue with Magistral-Small-2506 Model on GPU VM Model Support	1	71	September 3, 2025
GPT OSS in docker container Model Support	1	176	August 19, 2025
Why does prefill use normal attention, while decode uses weight absorption in MLA? DeepSeek	1	55	August 5, 2025
Using vLLM on a HF model architecture modified locally Model Support	1	84	July 7, 2025
The vllm/vllm-openai version 0.9.1 is nearly 30% faster compared to lmsysorg/sglang:v0.4.7.post, but it stops running every two to three hours DeepSeek	0	138	June 23, 2025
Gemma 3 prefix caching in case of multimodal prompts Model Support	4	175	May 22, 2025
Will vLLM follow-up DeepSeek's inference system DeepSeek	3	463	May 13, 2025
Add Support for GLM-4 series model Model Support	1	110	April 16, 2025
Why does phi3 implementation in vLLM inherit from llama? Model Support	1	31	April 14, 2025
Does the latest version support deepseek-v3 tool call Model Support	0	93	April 12, 2025

Model Support