Topic | Replies | Views | Activity
How to set up an AMD GPU as the default in a dual-GPU system? | 10 | 82 | April 21, 2025
Problem with the PagedAttention.split_kv_cache implementation | 1 | 32 | April 21, 2025
[Bug]: Wrong LoRA mapping during prompt logprobs computation | 1 | 18 | April 21, 2025
[Spec Decode] Why does the acceptance rate look close to the paper, but the throughput is still not high? | 1 | 49 | April 21, 2025
Why does V1 not support Mamba models? | 1 | 29 | April 21, 2025
Nvidia T4 --cpu-offload-gb error | 5 | 58 | April 19, 2025
Goodput Guided Speculative Decoding | 2 | 74 | April 19, 2025
Model.embed returns "ValueError: too many values to unpack (expected 1)" | 3 | 17 | April 18, 2025
KeyError: 'model.layers.6.mlp.experts.w2_weight_scale' | 0 | 27 | April 18, 2025
Is there any working Colab notebook using vLLM with TPU v5e? | 2 | 52 | April 16, 2025
AMD GPU Support (W7900) | 1 | 21 | April 16, 2025
Add support for GLM-4 series models | 1 | 37 | April 16, 2025
Special character in torch.profile results | 0 | 14 | April 15, 2025
Why does the phi3 implementation in vLLM inherit from llama? | 1 | 16 | April 14, 2025
Is structured output compatible with automatic prefix caching? | 1 | 33 | April 14, 2025
Tool calling using Offline Inference? | 1 | 20 | April 14, 2025
How to pass a custom parameter between `Qwen2ForCausalLM` forward calls | 0 | 30 | April 13, 2025
How to crop kv_caches? | 0 | 23 | April 13, 2025
Can't start Whisper in k8s | 1 | 32 | April 14, 2025
Comparison with omniserve (LServe, QServe) | 0 | 24 | April 14, 2025
Does vLLM V1 support asynchronous scheduling? | 1 | 60 | April 14, 2025
The new V1 way to --cpu-offload-gb | 5 | 132 | April 13, 2025
What happens to stdout/stderr of worker processes in TP runs? | 1 | 21 | April 12, 2025
What is the purpose of multi-process? | 2 | 33 | April 12, 2025
Is it possible to initialize an AsyncLLMEngine inside the LLM object? | 4 | 77 | April 12, 2025
vLLM V1 - Default max CUDA graph size | 1 | 133 | April 12, 2025
Does the latest version support DeepSeek-V3 tool calling? | 0 | 29 | April 12, 2025
Making best use of varying GPU generations | 2 | 55 | April 11, 2025
Irrelevant Responses with Unsloth Fine-tuned Llama 3.1 8B using vLLM | 3 | 39 | April 10, 2025
Conda and setup.py conflicting advice | 6 | 39 | April 10, 2025