|
Does vllm-ascend support async inference?
|
|
2
|
125
|
October 15, 2025
|
|
Does Ascend 920b support Qwen2.5-VL (Tongyi Qianwen 2.5-VL)?
|
|
2
|
137
|
October 2, 2025
|
|
RTX Pro 6000 Tensor Parallelism CUBLAS_STATUS_ALLOC_FAILED
|
|
3
|
459
|
September 13, 2025
|
|
Is there any plan to organize the cuda-only configuration
|
|
1
|
48
|
August 15, 2025
|
|
Unable to use vLLM 0.10.1-gptoss on GH200 (aarch64) — source for custom wheel not available?
|
|
3
|
570
|
August 15, 2025
|
|
vLLM Benchmarking: Why Is GPUDirect RDMA Not Outperforming Standard RDMA in a Pipeline-Parallel Setup?
|
|
1
|
614
|
August 14, 2025
|
|
Why does quickReduce not need to use system-scope release write operations to update flags?
|
|
0
|
33
|
August 13, 2025
|
|
Can vLLM be built for an old GPU (GT 630M)? It may use CUDA 9.1.85
|
|
1
|
265
|
August 4, 2025
|
|
How to deploy vllm-ascend in AutoDL's 910B instance?
|
|
7
|
474
|
August 2, 2025
|
|
GPU Time Slicing
|
|
0
|
211
|
July 16, 2025
|
|
How to modify the cuda graph capture sizes via vllm plugin
|
|
1
|
427
|
July 1, 2025
|
|
Can’t use ampere features
|
|
1
|
228
|
June 10, 2025
|
|
KV Cache quantizing?
|
|
3
|
1134
|
June 2, 2025
|
|
Does vLLM support inference or service startup for small models on CPU?
|
|
3
|
252
|
May 30, 2025
|
|
Struggling with my dual GPU setup. And getting chat template errors
|
|
2
|
288
|
May 30, 2025
|
|
How to get torch-npu >= 2.5.1.dev20250308
|
|
3
|
449
|
May 28, 2025
|
|
Question about vllm-ascend performance on server with 8*910B3
|
|
5
|
685
|
May 28, 2025
|
|
Why is this not working? I corrected it but still
|
|
1
|
919
|
May 8, 2025
|
|
Can anyone help me? Why is this not working? It used 😭
|
|
1
|
1222
|
May 8, 2025
|
|
Docker explosion this morning after it worked fine for a long while
|
|
6
|
534
|
May 6, 2025
|
|
32GB vs 48GB vRam
|
|
1
|
1242
|
May 3, 2025
|
|
Run on B200/5090 without building from source?
|
|
1
|
284
|
May 1, 2025
|
|
Running Gemma 3 on multi-chip TPU failure
|
|
5
|
571
|
May 1, 2025
|
|
How to set up an AMD GPU as the default in a dual-GPU stack?
|
|
10
|
769
|
April 21, 2025
|
|
Is there any working Colab notebook using vLLM with TPU v5e?
|
|
2
|
425
|
April 16, 2025
|
|
Making best use of varying GPU generations
|
|
2
|
929
|
April 11, 2025
|
|
Improving computing power at home for n00bs
|
|
7
|
195
|
April 2, 2025
|
|
Question about vLLM and vLLM Ascend versioning policy
|
|
4
|
347
|
April 1, 2025
|
|
Jetson orin, CUDA error: no kernel image is available for execution on the device
|
|
0
|
504
|
March 29, 2025
|
|
Can Ascend officially draft a documentation on the vLLM-Ascend adaptation for graph mode?
|
|
6
|
397
|
March 26, 2025
|