| Topic | Replies | Views | Activity |
|---|---|---|---|
| Questions on piecewise torch compile design | 38 | 207 | June 18, 2025 |
| vLLM on new NVIDIA H20-3e occasionally outputs abnormal characters like "📐" with Qwen 2.5 VL 72B | 3 | 9 | June 18, 2025 |
| How to pass vLLM-specific parameters via the OpenAI API from clients | 2 | 19 | June 18, 2025 |
| Maximum batch size with Pipeline Parallelism | 3 | 24 | June 17, 2025 |
| prompt_embeds usage in the vLLM OpenAI completion API | 4 | 24 | June 17, 2025 |
| Is there a detailed introduction to the two W8A8 quantization methods? | 1 | 29 | June 15, 2025 |
| V1 has lower end-to-end performance than V0 (--num-scheduler-steps=8) | 6 | 63 | June 12, 2025 |
| What Triton-related configuration options does vLLM have? | 3 | 26 | June 12, 2025 |
| In what cases does vLLM use Triton? | 6 | 29 | June 12, 2025 |
| vLLM Engine Metrics | 20 | 31 | June 11, 2025 |
| V1 has lower end-to-end performance than V0 (--num-scheduler-steps=8) | 1 | 8 | June 11, 2025 |
| Problem with the PagedAttention.split_kv_cache implementation | 3 | 48 | June 11, 2025 |
| Why did vLLM V1 remove the multi-step feature? | 3 | 84 | June 11, 2025 |
| Can't use Ampere features | 1 | 10 | June 10, 2025 |
| Sequence Parallelism Support - Source Code Location | 0 | 8 | June 10, 2025 |
| The usage field in streaming responses is always None, so token usage cannot be retrieved | 0 | 14 | June 10, 2025 |
| How is LoRA (bf16) fused into a GPTQ 8-bit model? | 9 | 21 | June 10, 2025 |
| Is there an up-to-date example showing how to add a new LLM to vLLM? | 3 | 16 | June 10, 2025 |
| Docker image `vllm/vllm-openai:v0.9.0` doesn't work on 5090 | 3 | 106 | June 10, 2025 |
| `computeCapability not supported` error with LoRA adapter on 5090 | 1 | 24 | June 10, 2025 |
| Something weird about the reading procedure of q_vecs in the paged attention kernel | 3 | 9 | June 9, 2025 |
| Is there any performance comparison between `at::sum_out` and `vllm::moe::moe_sum_kernel`? | 1 | 11 | June 6, 2025 |
| Maximum Beam Width Limitations in vLLM Beam Search | 0 | 19 | June 6, 2025 |
| Setting two LLMs on different GPUs in one offline inference script | 1 | 19 | June 6, 2025 |
| vLLM benchmark host with self-signed certificate | 1 | 16 | June 4, 2025 |
| Add ArgumentParser to FlexibleArgumentParser conversion | 1 | 22 | June 3, 2025 |
| KV Cache quantizing? | 3 | 49 | June 2, 2025 |
| Change management strategy for preventing post-OS-auto-update vLLM FTS in containers | 7 | 39 | June 2, 2025 |
| Why position 0 is not needed by MTP for speculative decoding | 3 | 12 | June 2, 2025 |
| Computation time remains consistent across chunks in chunked prefill despite linearly growing attention complexity? | 1 | 12 | June 2, 2025 |