|
Clarification: EP All-to-All Communication Across TP×DP — Diagram Validation
|
|
1
|
38
|
March 22, 2026
|
|
A problem encountered when deploying deepseekv3.2 with Vllm-ascend
|
|
2
|
41
|
March 22, 2026
|
|
How to expose v1/audio/transcriptions router for custom models
|
|
44
|
200
|
March 20, 2026
|
|
How to contribute to this repo? Where can I find a test env?
|
|
1
|
27
|
March 19, 2026
|
|
How to get thinking content in qwen3.5 thinking
|
|
1
|
897
|
March 19, 2026
|
|
Do RTX 5090 and RTX PRO 5000 have any differences that should be taken into account?
|
|
7
|
147
|
March 19, 2026
|
|
Reflections on AI sycophancy mechanisms, with image-based verification
|
|
1
|
20
|
March 19, 2026
|
|
My GPUs are 8x L20; why does running glm5-FP8 report an error?
|
|
0
|
98
|
March 19, 2026
|
|
Compressed Multimodal embeddings inputs
|
|
1
|
50
|
March 18, 2026
|
|
LoRA integration for Qwen3.5-122b fails during deployment on vLLM 0.17.0
|
|
3
|
221
|
March 18, 2026
|
|
What are Mamba and hybrid models?
|
|
2
|
223
|
March 17, 2026
|
|
What is the hybrid model?
|
|
1
|
40
|
March 17, 2026
|
|
NVFP4 Support In Attention
|
|
1
|
595
|
March 16, 2026
|
|
On HX 370 ryzen iGPU
|
|
7
|
92
|
March 15, 2026
|
|
Trying to run Qwen3.5-397B-A17B-GPTQ-Int4
|
|
10
|
490
|
March 13, 2026
|
|
How to add a new sampler method into the current vllm code
|
|
1
|
15
|
March 13, 2026
|
|
Gpt-oss 20b not working
|
|
4
|
96
|
March 12, 2026
|
|
Measuring interactivity on vLLM
|
|
1
|
49
|
March 12, 2026
|
|
Qwen3.5-35b-a3b-fp8 GPU memory out-of-bounds access
|
|
1
|
298
|
March 12, 2026
|
|
Contacting the vLLM Semantic Router team
|
|
4
|
56
|
March 11, 2026
|
|
500 internal server error when using webP
|
|
1
|
34
|
March 11, 2026
|
|
Given a completion text for a fixed prompt text, how to calculate the log_prob of the completion text
|
|
1
|
55
|
March 11, 2026
|
|
Suggestions to improve inference speed
|
|
17
|
677
|
March 11, 2026
|
|
Critique my vLLM configuration for qwen3-coder-next
|
|
3
|
196
|
March 10, 2026
|
|
Questions about performance tradeoffs between process-based and thread-based orchestration for multiple independent vllm serve instances
|
|
1
|
24
|
March 10, 2026
|
|
Need to serve a Qwen3 LLM with 235B params
|
|
9
|
357
|
March 7, 2026
|
|
No available shared memory broadcast block found in 60 seconds. This typically happens when some processes are hanging or doing some time-consuming work (e.g. compilation, weight/kv cache quantization).
|
|
1
|
1065
|
March 7, 2026
|
|
Expert Parallelism All-to-All Communication without NVLink and DeepEP
|
|
3
|
309
|
March 3, 2026
|
|
torch.OutOfMemoryError: CUDA out of memory
|
|
15
|
1461
|
March 3, 2026
|
|
Vllm 0.16.0 version log changes
|
|
3
|
207
|
March 2, 2026
|