vLLM Forums

Topic	Category	Replies	Views	Activity
Does the latest version support DeepSeek-V3 tool calling?	Model Support	0	31	April 12, 2025
Making best use of varying GPU generations	NVIDIA GPU Support	2	57	April 11, 2025
Irrelevant Responses with Unsloth Fine-tuned Llama 3.1 8B using vLLM	General	3	40	April 10, 2025
Conda and setup.py conflicting advice	General	6	41	April 10, 2025
vLLM output vs Ollama	General	8	141	April 10, 2025
Minimum requirements for Disaggregated Prefilling?	Disaggregated Prefilling	0	46	April 9, 2025
How to get the dev version vLLM Docker image?	General	2	33	April 8, 2025
Text generation doesn't stop	General	1	61	April 8, 2025
Is there any roadmap to support prefix caching on DRAM and disk?	Disaggregated Prefilling	0	25	April 8, 2025
Performance Issue While Requests Queuing	General	3	17	April 8, 2025
What should the /dev/shm size be for larger models?	General	0	22	April 7, 2025
Can LoRA adapters be loaded on different GPUs?	LoRA	1	25	April 7, 2025
Does vLLM support BERT models?	General	2	52	April 7, 2025
Engine args ~deep-dive?	General	1	41	April 7, 2025
Using the OpenAI-compatible API with `beta.chat.completions.parse`, can't do tool calls and structured output together	General	0	20	April 6, 2025
I would like to test KVCache writes to SSD storage or shared KVCache storage and benchmark this	General	5	59	April 5, 2025
No HIP GPUs are available for VeRL	verl	4	60	April 4, 2025
Numerical Difference between vLLM logprobs and Hugging Face logprobs	RL Integration	7	234	April 4, 2025
How to load the model successfully across multiple cards in vLLM?	General	5	92	April 3, 2025
How to use the vLLM server on an intranet	General	5	59	April 2, 2025
Improving computing power at home for n00bs	Hardware Support	7	72	April 2, 2025
Any plan to support computation and communication overlapping as an optimization?	General	2	17	April 2, 2025
Question about vLLM and vLLM Ascend versioning policy	Ascend Support	4	146	April 1, 2025
RL Training with vLLM Rollout: How to Mitigate Load Imbalance from Variable Response Lengths	RL Integration	4	177	April 1, 2025
Do we have regression tests for structured output? Especially speed regression?	Structured Outputs	0	17	March 31, 2025
Why remove the bonus token of a request in the draft model?	Speculative Decoding	0	24	March 30, 2025
Why can't vLLM fully use the GPU in batch processing?	General	12	119	March 29, 2025
Jetson Orin, CUDA error: no kernel image is available for execution on the device	NVIDIA GPU Support	0	102	March 29, 2025
How to debug the chat API	General	0	28	March 28, 2025
Why is zero_point set to False in gptq_marlin?	General	0	14	March 28, 2025