Model Support
Tags: DeepSeek, Llama
| Topic | Replies | Views | Activity |
|---|---|---|---|
| About the Model Support category | 0 | 52 | March 20, 2025 |
| Progress bar to browser | 0 | 7 | September 11, 2025 |
| Intermittent Service Downtime Issue with Magistral-Small-2506 Model on GPU VM | 1 | 22 | September 3, 2025 |
| GPT OSS in docker container | 1 | 110 | August 19, 2025 |
| Why does prefill use normal attention, while decode uses weight absorption in MLA? | 1 | 37 | August 5, 2025 |
| Using vLLM on a HF model architecture modified locally | 1 | 56 | July 7, 2025 |
| The vllm/vllm-openai version 0.9.1 is nearly 30% faster compared to lmsysorg/sglang:v0.4.7.post, but it stops running every two to three hours | 0 | 132 | June 23, 2025 |
| Gemma 3 prefix caching in case of multimodal prompts | 4 | 130 | May 22, 2025 |
| Will vLLM follow-up DeepSeek's inference system | 3 | 401 | May 13, 2025 |
| Add Support for GLM-4 series model | 1 | 105 | April 16, 2025 |
| Why does phi3 implementation in vLLM inherit from llama? | 1 | 25 | April 14, 2025 |
| Does the latest version support deepseek-v3 tool call | 0 | 83 | April 12, 2025 |