vLLM Forums

Topic	Replies	Views	Activity
Welcome to vLLM Forums! :wave: General	1	508	March 24, 2025
How does the forward pass in speculative decoding work? General	1	1	June 29, 2025
Speeding up vllm inference for Qwen2.5-VL General	23	887	June 27, 2025
Multimodal inference guideline? General	41	128	June 27, 2025
Does vllm V1 support asynchronous scheduling? V1 Feedback	3	100	June 27, 2025
How to specify batch and seqlen in Ascend-vllm to test performance General	4	23	June 27, 2025
Using guided decoding for JSON General	1	11	June 26, 2025
How to deploy vllm-ascend in AutoDL's 910B instance? Ascend Support	5	16	June 26, 2025
Scheduler in vllm Features	1	9	June 26, 2025
Setting up vLLM in an airgapped environment General	3	17	June 25, 2025
Vllm throughput less on 7B in comparison to 32B General	1	8	June 25, 2025
How does VLLM handle jsons for guided prompting General	9	15	June 25, 2025
In single node deployment environment, how can we make vllm call unified_attention more often to trigger KVCache connector workload General	12	9	June 24, 2025
Inquiry about vllm-ascend capabilities General	3	14	June 24, 2025
In the v1 architecture, why is enginecore split out into a separate process? General	1	22	June 24, 2025
Should vLLM consider prefix caching when chunked prefill is enabled? General	1	13	June 24, 2025
High-Throughput kernel on single-node Benchmarking	1	7	June 23, 2025
The vllm/vllm-openai version 0.9.1 is nearly 30% faster compared to lmsysorg/sglang:v0.4.7.post, but it stops running every two to three hours DeepSeek	0	46	June 23, 2025
What does the v mean in vllm? General	3	30	June 23, 2025
Proper settings for running qwen2.5 72b on 48gb using awq General	1	26	June 21, 2025
Some questions about Data Parallel examples General	7	21	June 21, 2025
Gemma 3 Quantization General	5	64	June 21, 2025
Where does Vllm v1 check for the end token or stop output? General	1	13	June 21, 2025
Build VLLM without installing gcc? General	1	39	June 20, 2025
The following 2 methods give very different results Benchmarking	44	60	June 20, 2025
Free AMD GPU Access for the vllm developer General	2	41	June 20, 2025
V1 Engine child process dies unnoticed; check_health() is a no-op General	5	8	June 19, 2025
How to keep chat history in VLLM? General	1	18	June 19, 2025
How to obtain the logits of LLM General	21	54	June 19, 2025
Can the AsyncEngine be compatible with the backend of external_launcher? RL Integration	1	11	June 18, 2025