| Topic | Replies | Views | Activity |
|---|---|---|---|
| What is the purpose of multi process | 2 | 51 | April 12, 2025 |
| Is it possible to initialize an AsyncLLMEngine inside the LLM object? | 4 | 221 | April 12, 2025 |
| Does the latest version support deepseek-v3 tool call | 0 | 79 | April 12, 2025 |
| Making best use of varying GPU generations | 2 | 166 | April 11, 2025 |
| Irrelevant Responses with Unsloth Fine-tuned Llama 3.1 8B using vLLM | 3 | 166 | April 10, 2025 |
| Conda and setup.py conflicting advice | 6 | 128 | April 10, 2025 |
| vLLM output vs Ollama | 8 | 375 | April 10, 2025 |
| Minimum requirements for Disaggregated Prefilling? | 0 | 60 | April 9, 2025 |
| How to get the dev version vLLM docker image? | 2 | 44 | April 8, 2025 |
| Text generation doesn't stop | 1 | 184 | April 8, 2025 |
| Performance Issue While Requests Queuing | 3 | 170 | April 8, 2025 |
| Can LoRA adapters be loaded on different GPUs | 1 | 50 | April 7, 2025 |
| Does vLLM support BERT model | 2 | 79 | April 7, 2025 |
| Engine args ~deep-dive? | 1 | 56 | April 7, 2025 |
| Using openai compatible with `beta.chat.completions.parse` can't do tool call and structured output together | 0 | 48 | April 6, 2025 |
| I would like to test KVCache writes to SSD storage or shared KVCache storage and benchmark this | 5 | 131 | April 5, 2025 |
| No HIP GPUs are available for VeRL | 4 | 277 | April 4, 2025 |
| Numerical Difference between vLLM logprobs and huggingface logprobs | 7 | 3634 | April 4, 2025 |
| How to load the model successfully across multiple GPUs in vLLM? | 5 | 138 | April 3, 2025 |
| How to use the vLLM server on an intranet | 5 | 207 | April 2, 2025 |
| Improving computing power at home for n00bs | 7 | 87 | April 2, 2025 |
| Any plan to support an optimization for computation and communication overlapping | 2 | 33 | April 2, 2025 |
| Question about vLLM and vLLM Ascend versioning policy | 4 | 216 | April 1, 2025 |
| RL Training with vLLM Rollout: How to Mitigate Load Imbalance from Variable Response Lengths | 4 | 422 | April 1, 2025 |
| Do we have regression tests for structured output? Especially speed regression? | 0 | 23 | March 31, 2025 |
| Why remove the bonus token of a request in the draft model? | 0 | 34 | March 30, 2025 |
| Why vLLM cannot fully use the GPU in batch processing | 12 | 474 | March 29, 2025 |
| Jetson Orin, CUDA error: no kernel image is available for execution on the device | 0 | 351 | March 29, 2025 |
| How to debug the chat API | 0 | 123 | March 28, 2025 |
| Why is zero_point set to False in gptq_marlin? | 0 | 18 | March 28, 2025 |