| Topic | Replies | Views | Activity |
|---|---|---|---|
| What is the purpose of multi process | 2 | 51 | April 12, 2025 |
| Is it possible to initialize an AsyncLLMEngine inside the LLM object? | 4 | 221 | April 12, 2025 |
| Does the latest version support deepseek-v3 tool call | 0 | 79 | April 12, 2025 |
| Making best use of varying GPU generations | 2 | 166 | April 11, 2025 |
| Irrelevant Responses with Unsloth Fine-tuned Llama 3.1 8B using vLLM | 3 | 166 | April 10, 2025 |
| Conda and setup.py conflicting advice | 6 | 128 | April 10, 2025 |
| vLLM output vs Ollama | 8 | 375 | April 10, 2025 |
| Minimum requirements for Disaggregated Prefilling? | 0 | 60 | April 9, 2025 |
| How to get the dev version vLLM docker image? | 2 | 44 | April 8, 2025 |
| Text generation doesn't stop | 1 | 184 | April 8, 2025 |
| Performance Issue While Requests Queuing | 3 | 170 | April 8, 2025 |
| Can LoRA adapters be loaded on different GPUs | 1 | 50 | April 7, 2025 |
| Does vLLM support BERT model | 2 | 79 | April 7, 2025 |
| Engine args ~deep-dive? | 1 | 56 | April 7, 2025 |
| Using openai compatible with `beta.chat.completions.parse` can't do tool call and structured output together | 0 | 48 | April 6, 2025 |
| I would like to test KVCache writes to SSD storage or shared KVCache storage and benchmark this | 5 | 131 | April 5, 2025 |
| No HIP GPUs are available for VeRL | 4 | 277 | April 4, 2025 |
| Numerical Difference between vLLM logprobs and huggingface logprobs | 7 | 3634 | April 4, 2025 |
| How to load the model successfully across multiple GPUs in vLLM? | 5 | 138 | April 3, 2025 |
| How to use the vLLM server on an intranet | 5 | 207 | April 2, 2025 |
| Improving computing power at home for n00bs | 7 | 87 | April 2, 2025 |
| Any plan to support an optimization for computation and communication overlapping | 2 | 33 | April 2, 2025 |
| Question about vLLM and vLLM Ascend versioning policy | 4 | 216 | April 1, 2025 |
| RL Training with vLLM Rollout: How to Mitigate Load Imbalance from Variable Response Lengths | 4 | 422 | April 1, 2025 |
| Do we have regression tests for structured output? Especially speed regression? | 0 | 23 | March 31, 2025 |
| Why remove the bonus token of a request in the draft model? | 0 | 34 | March 30, 2025 |
| Why vLLM cannot fully use the GPU in batch processing | 12 | 474 | March 29, 2025 |
| Jetson Orin, CUDA error: no kernel image is available for execution on the device | 0 | 351 | March 29, 2025 |
| How to debug the chat API | 0 | 123 | March 28, 2025 |
| Why is zero_point set to False in gptq_marlin? | 0 | 18 | March 28, 2025 |