|
GGUF PyPI release version 0.18.0 (27 Feb)
|
|
1
|
145
|
March 2, 2026
|
|
How to run GGUF with vLLM and ROCm
|
|
4
|
459
|
March 1, 2026
|
|
Following Qwen3.5 Usage Guide on H20, but cannot host Qwen3.5-27B
|
|
4
|
383
|
February 28, 2026
|
|
BranchContext: CoW filesystem isolation for multi-sample vLLM workflows
|
|
1
|
26
|
February 27, 2026
|
|
How to serve two vLLM instances using Docker?
|
|
3
|
490
|
February 26, 2026
|
|
Thinking Token limit setting
|
|
11
|
684
|
February 26, 2026
|
|
Hosting Qwen 3.5 35B-A3B model
|
|
1
|
1204
|
February 25, 2026
|
|
vllm-openai Docker Hub missing 0.16 tags
|
|
2
|
117
|
February 25, 2026
|
|
Does vLLM-Ascend support reasoning-parser?
|
|
1
|
49
|
February 25, 2026
|
|
What is the role of the additional process running on GPU 0 in DP+EP?
|
|
3
|
49
|
February 25, 2026
|
|
True eager backend
|
|
6
|
121
|
February 24, 2026
|
|
Mistral Small 3.2 finetune errors out: There is no module or parameter named 'language_model' in LlamaForCausalLM
|
|
3
|
484
|
February 18, 2026
|
|
Are diffusers.Pipeline LoRA files supported?
|
|
15
|
77
|
February 16, 2026
|
|
What is tail ITL in Disaggregated Prefilling?
|
|
6
|
26
|
February 16, 2026
|
|
Significant speedup observed with long common prefix between v0.11.0 and v0.12.0
|
|
9
|
103
|
February 13, 2026
|
|
What does Skip_leading_tokens mean?
|
|
41
|
89
|
February 13, 2026
|
|
Priority in batch api
|
|
7
|
345
|
February 12, 2026
|
|
We're Live: OCI Deployment Guide for vLLM Production Stack
|
|
1
|
83
|
February 12, 2026
|
|
Native FP8 WMMA Support for AMD RDNA4 (RX 9070 XT / R9700) in vLLM
|
|
5
|
2116
|
February 12, 2026
|
|
Distributed Speculative Decoding using Ray
|
|
3
|
127
|
February 11, 2026
|
|
Is the vLLM repository currently equipped with nightly testing?
|
|
3
|
52
|
February 10, 2026
|
|
Qwen3-TTS Base model issue
|
|
1
|
70
|
February 9, 2026
|
|
Mistral-small-3.2: Unable to locate consolidated.safetensors.index.json
|
|
1
|
76
|
February 8, 2026
|
|
Pre-Built Docker Install
|
|
2
|
101
|
February 8, 2026
|
|
Does vLLM support shieldgemma?
|
|
1
|
40
|
February 6, 2026
|
|
vLLM v0.15.1 failing when deployed in AWS
|
|
3
|
812
|
February 6, 2026
|
|
Error when deploying glm-4.7 (bf16) across three machines
|
|
3
|
75
|
February 6, 2026
|
|
Memory usage issue encountered during inference
|
|
2
|
83
|
February 6, 2026
|
|
Error during inference: RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
|
|
1
|
578
|
February 5, 2026
|
|
Shared memory broadcast block not found (No available shared memory broadcast block found in 60 seconds. This typically happens when some processes are hanging or doing some time-consuming work)
|
|
1
|
1214
|
February 5, 2026
|