| Topic | Replies | Views | Activity |
| --- | --- | --- | --- |
| About the Model Support category | 0 | 66 | March 20, 2025 |
| Disabling reasoning of Qwen3-VL-8B-Thinking per request | 1 | 29 | October 29, 2025 |
| Zerank - deploying using vLLM | 3 | 8 | October 29, 2025 |
| Mistral Small 3.2 finetune errors out: There is no module or parameter named 'language_model' in LlamaForCausalLM | 2 | 12 | October 23, 2025 |
| Does vllm-ascend support deepseek-ocr? | 2 | 133 | October 21, 2025 |
| Custom edit on the embedding out | 2 | 26 | October 15, 2025 |
| Llama 3.3 70B very slow | 4 | 68 | October 13, 2025 |
| Which ATTENTION BACKEND for gpt-oss in version 0.11.0? | 1 | 125 | October 4, 2025 |
| Loading non-consolidated Mistral safetensors | 3 | 39 | September 30, 2025 |
| Issue serving gemma3-27b-it | 1 | 127 | September 19, 2025 |
| Progress bar to browser | 0 | 15 | September 11, 2025 |
| Intermittent Service Downtime Issue with Magistral-Small-2506 Model on GPU VM | 1 | 71 | September 3, 2025 |
| GPT OSS in docker container | 1 | 176 | August 19, 2025 |
| Why does prefill use normal attention, while decode uses weight absorption in MLA? | 1 | 55 | August 5, 2025 |
| Using vLLM on a HF model architecture modified locally | 1 | 84 | July 7, 2025 |
| The vllm/vllm-openai version 0.9.1 is nearly 30% faster compared to lmsysorg/sglang:v0.4.7.post, but it stops running every two to three hours | 0 | 138 | June 23, 2025 |
| Gemma 3 prefix caching in case of multimodal prompts | 4 | 175 | May 22, 2025 |
| Will vLLM follow up on DeepSeek's inference system? | 3 | 463 | May 13, 2025 |
| Add support for GLM-4 series models | 1 | 110 | April 16, 2025 |
| Why does the phi3 implementation in vLLM inherit from llama? | 1 | 31 | April 14, 2025 |
| Does the latest version support deepseek-v3 tool calls? | 0 | 93 | April 12, 2025 |