| Topic | Replies | Views | Activity |
| --- | --- | --- | --- |
| Welcome to vLLM Forums! :wave: | 1 | 508 | March 24, 2025 |
| How does the forward pass in speculative decoding work? | 1 | 1 | June 29, 2025 |
| Speeding up vLLM inference for Qwen2.5-VL | 23 | 887 | June 27, 2025 |
| Multimodal inference guideline? | 41 | 128 | June 27, 2025 |
| Does vLLM V1 support asynchronous scheduling? | 3 | 100 | June 27, 2025 |
| How to specify batch and seqlen to test performance in Ascend-vllm | 4 | 23 | June 27, 2025 |
| Using guided decoding for JSON | 1 | 11 | June 26, 2025 |
| How to deploy vllm-ascend on AutoDL's 910B instance? | 5 | 16 | June 26, 2025 |
| Scheduler in vLLM | 1 | 9 | June 26, 2025 |
| Setting up vLLM in an air-gapped environment | 3 | 17 | June 25, 2025 |
| vLLM throughput lower on 7B compared to 32B | 1 | 8 | June 25, 2025 |
| How does vLLM handle JSON for guided prompting | 9 | 15 | June 25, 2025 |
| In a single-node deployment, how can we make vLLM call unified_attention more often to trigger the KVCache connector workload? | 12 | 9 | June 24, 2025 |
| Inquiry about vllm-ascend capabilities | 3 | 14 | June 24, 2025 |
| Why is EngineCore split into a separate process in the V1 architecture? | 1 | 22 | June 24, 2025 |
| Should vLLM consider prefix caching when chunked prefill is enabled? | 1 | 13 | June 24, 2025 |
| High-throughput kernel on a single node | 1 | 7 | June 23, 2025 |
| vllm/vllm-openai 0.9.1 is nearly 30% faster than lmsysorg/sglang:v0.4.7.post, but it stops running every two to three hours | 0 | 46 | June 23, 2025 |
| What does the "v" in vLLM mean? | 3 | 30 | June 23, 2025 |
| Proper settings for running Qwen2.5 72B on 48 GB using AWQ | 1 | 26 | June 21, 2025 |
| Some questions about the Data Parallel examples | 7 | 21 | June 21, 2025 |
| Gemma 3 quantization | 5 | 64 | June 21, 2025 |
| Where does vLLM V1 check for the end-of-sequence token or stop generating output? | 1 | 13 | June 21, 2025 |
| Build vLLM without installing gcc? | 1 | 39 | June 20, 2025 |
| Results differ greatly between the following two methods | 44 | 60 | June 20, 2025 |
| Free AMD GPU access for vLLM developers | 2 | 41 | June 20, 2025 |
| V1 Engine child process dies unnoticed; check_health() is a no-op | 5 | 8 | June 19, 2025 |
| How to keep chat history in vLLM? | 1 | 18 | June 19, 2025 |
| How to obtain the logits of an LLM | 21 | 54 | June 19, 2025 |
| Can the AsyncEngine be compatible with the external_launcher backend? | 1 | 11 | June 18, 2025 |