| Topic | Replies | Views | Activity |
|---|---|---|---|
| Questions on piecewise torch compile design | 38 | 207 | June 18, 2025 |
| vLLM on new NVIDIA H20-3e occasionally outputs abnormal characters like "📐" with Qwen 2.5 VL 72B | 3 | 9 | June 18, 2025 |
| How to pass vLLM-specific parameters via the OpenAI API from clients | 2 | 19 | June 18, 2025 |
| Maximum batch size with Pipeline Parallelism | 3 | 24 | June 17, 2025 |
| prompt_embeds usage in the vLLM OpenAI completion API | 4 | 24 | June 17, 2025 |
| Is there a detailed introduction to the two W8A8 quantization methods? | 1 | 29 | June 15, 2025 |
| V1 has lower end-to-end performance than V0 (--num-scheduler-steps=8) | 6 | 63 | June 12, 2025 |
| What Triton-related configuration options does vLLM have? | 3 | 26 | June 12, 2025 |
| In what cases does vLLM use Triton? | 6 | 29 | June 12, 2025 |
| vLLM Engine Metrics | 20 | 31 | June 11, 2025 |
| V1 has lower end-to-end performance than V0 (--num-scheduler-steps=8) | 1 | 8 | June 11, 2025 |
| Problem with the PagedAttention.split_kv_cache implementation | 3 | 48 | June 11, 2025 |
| Why did vLLM V1 remove the multi-step feature? | 3 | 84 | June 11, 2025 |
| Can't use Ampere features | 1 | 10 | June 10, 2025 |
| Sequence Parallelism Support - Source Code Location | 0 | 8 | June 10, 2025 |
| The usage field in streaming responses is always None, so token usage cannot be retrieved | 0 | 14 | June 10, 2025 |
| How is LoRA (bf16) fused into a GPTQ 8-bit model? | 9 | 21 | June 10, 2025 |
| Is there an up-to-date example showing how to add a new LLM to vLLM? | 3 | 16 | June 10, 2025 |
| Docker image `vllm/vllm-openai:v0.9.0` doesn't work on 5090 | 3 | 106 | June 10, 2025 |
| `computeCapability not supported` error with LoRA adapter on 5090 | 1 | 24 | June 10, 2025 |
| Something weird about the reading procedure of q_vecs in the paged attention kernel | 3 | 9 | June 9, 2025 |
| Is there any performance comparison between `at::sum_out` and `vllm::moe::moe_sum_kernel`? | 1 | 11 | June 6, 2025 |
| Maximum Beam Width Limitations in vLLM Beam Search | 0 | 19 | June 6, 2025 |
| Setting two LLMs on different GPUs in one offline inference script | 1 | 19 | June 6, 2025 |
| vLLM benchmark host with self-signed certificate | 1 | 16 | June 4, 2025 |
| Add ArgumentParser to FlexibleArgumentParser conversion | 1 | 22 | June 3, 2025 |
| KV Cache quantizing? | 3 | 49 | June 2, 2025 |
| Change management strategy for preventing post-OS-auto-update vLLM FTS in containers | 7 | 39 | June 2, 2025 |
| Why position 0 is not needed by MTP for speculative decoding | 3 | 12 | June 2, 2025 |
| Computation time remains consistent across chunks in chunked prefill despite linearly growing attention complexity? | 1 | 12 | June 2, 2025 |