Topic | Replies | Views | Activity
Does the latest version support deepseek-v3 tool call | 0 | 31 | April 12, 2025
Making best use of varying GPU generations | 2 | 57 | April 11, 2025
Irrelevant Responses with Unsloth Fine-tuned Llama 3.1 8B using vLLM | 3 | 40 | April 10, 2025
Conda and setup.py conflicting advice | 6 | 41 | April 10, 2025
vLLM output vs Ollama | 8 | 141 | April 10, 2025
Minimum requirements for Disaggregated Prefilling? | 0 | 46 | April 9, 2025
How to get the dev version vllm docker image? | 2 | 33 | April 8, 2025
Text generation doesn't stop | 1 | 61 | April 8, 2025
Is there any roadmap to support prefix caching on dram and disk? | 0 | 25 | April 8, 2025
Performance Issue While Requests Queuing | 3 | 17 | April 8, 2025
What should be /dev/shm size for larger models | 0 | 22 | April 7, 2025
Can Lora adapters be loaded on different GPUs | 1 | 25 | April 7, 2025
Does vLLM support BERT model | 2 | 52 | April 7, 2025
Engine args ~deep-dive? | 1 | 41 | April 7, 2025
Using openai compatible with `beta.chat.completions.parse` can't do tool call and structured output together | 0 | 20 | April 6, 2025
I would like to test KVCache write on SSD storage or shared KVCache storage and benchmark this | 5 | 59 | April 5, 2025
No HIP GPUs are available for VeRL | 4 | 60 | April 4, 2025
Numerical Difference between vLLM logprobs and huggingface logprobs | 7 | 234 | April 4, 2025
How to load the model successfully through multi-card in vLLM? | 5 | 92 | April 3, 2025
How to use vllm server in intranet | 5 | 59 | April 2, 2025
Improving computing power at home for n00bs | 7 | 72 | April 2, 2025
Any plan to support an optimization about computation and communication overlapping | 2 | 17 | April 2, 2025
Question about vLLM and vLLM Ascend versioning policy | 4 | 146 | April 1, 2025
RL Training with vLLM Rollout: How to Mitigate Load Imbalance from Variable Response Lengths | 4 | 177 | April 1, 2025
Do we have regression tests for structured output? Especially speed regression? | 0 | 17 | March 31, 2025
Why remove bonus token of request in draft model? | 0 | 24 | March 30, 2025
Why vLLM cannot fully use GPU in batch processing | 12 | 119 | March 29, 2025
Jetson Orin, CUDA error: no kernel image is available for execution on the device | 0 | 102 | March 29, 2025
How to debug chat API | 0 | 28 | March 28, 2025
Why zero_point is set False in gptq_marlin? | 0 | 14 | March 28, 2025