| Topic | Replies | Views | Activity |
|---|---|---|---|
| About the Model Support category | 0 | 138 | March 20, 2025 |
| Issues with Voxtral models and omni | 3 | 39 | April 14, 2026 |
| Support for MiniMax-M2.5 | 1 | 33 | April 14, 2026 |
| On 8-card Ascend 910B with vLLM serving Qwen3.5-122B-A10B, the client freezes at 8% progress during an accuracy test; the server stops receiving new requests after Running reqs and KV Cache fall to 0 | 1 | 27 | April 11, 2026 |
| Any planned support for minicpm-o-4.5? | 1 | 36 | March 26, 2026 |
| [Field Report] AWQ on RTX 5060 Ti (SM_120 / Blackwell) — awq_marlin + TRITON_ATTN working | 1 | 132 | March 18, 2026 |
| Trying to run Qwen3.5-397B-A17B-GPTQ-Int4 | 10 | 280 | March 13, 2026 |
| Suggestion to improve inferencing speed | 17 | 452 | March 11, 2026 |
| Critique my vLLM configuration for qwen3-coder-next | 3 | 126 | March 10, 2026 |
| Hosting Qwen 3.5 35B-A3B model | 1 | 1040 | February 25, 2026 |
| Mistral Small 3.2 finetune errors out: There is no module or parameter named 'language_model' in LlamaForCausalLM | 3 | 448 | February 18, 2026 |
| Mistral-small-3.2: Unable to locate consolidated.safetensors.index.json | 1 | 50 | February 8, 2026 |
| How to run Deep Seek OCR 2 in vLLM | 1 | 1106 | January 27, 2026 |
| vLLM-omni cannot load z-image-turbo | 3 | 291 | December 27, 2025 |
| Llama 3.3 70B very slow | 5 | 652 | December 11, 2025 |
| Text-to-speech support with /v1/audio/speech route | 1 | 546 | November 28, 2025 |
| Using InternVL3 to perform OCR tasks yields worse results in vLLM than in LMDeploy | 2 | 72 | November 27, 2025 |
| Serving minimax-m2 | 3 | 400 | November 8, 2025 |
| Disabling reasoning of Qwen3-VL-8B-Thinking per request | 1 | 3045 | October 29, 2025 |
| Zerank: deploying using vLLM | 3 | 211 | October 29, 2025 |
| Does vLLM-Ascend support deepseek-ocr? | 2 | 324 | October 21, 2025 |
| Custom edit on the embedding output | 2 | 58 | October 15, 2025 |
| Which attention backend for gpt-oss in version 0.11.0? | 1 | 488 | October 4, 2025 |
| Loading non-consolidated Mistral safetensors | 3 | 392 | September 30, 2025 |
| Issue serving gemma3-27b-it | 1 | 518 | September 19, 2025 |
| Progress bar to browser | 0 | 44 | September 11, 2025 |
| Intermittent service downtime with Magistral-Small-2506 model on GPU VM | 1 | 240 | September 3, 2025 |
| GPT OSS in Docker container | 1 | 320 | August 19, 2025 |
| Why does prefill use normal attention, while decode uses weight absorption in MLA? | 1 | 193 | August 5, 2025 |
| Using vLLM on a HF model architecture modified locally | 1 | 194 | July 7, 2025 |