| Topic | Replies | Views | Activity |
|---|---|---|---|
| Welcome to vLLM Forums! :wave: | 1 | 386 | March 24, 2025 |
| Questions on piecewise torch compile design | 1 | 1 | May 7, 2025 |
| Docker explosion this morning after it worked fine for a long while | 6 | 6 | May 6, 2025 |
| vLLM cannot connect to existing Ray cluster | 14 | 10 | May 6, 2025 |
| Offline multi-node inference | 7 | 6 | May 6, 2025 |
| What is the deal with the RunLLM bot? | 3 | 22 | May 6, 2025 |
| How can I find the top-down architecture of vLLM? | 1 | 19 | May 5, 2025 |
| 32GB vs 48GB VRAM | 1 | 7 | May 3, 2025 |
| Support for (sparse) key-value caching | 16 | 20 | May 3, 2025 |
| What is the optimal combination of parallelism when deploying DeepSeek-R1 across multiple nodes? | 0 | 9 | May 2, 2025 |
| How to use speculative decoding? | 3 | 17 | May 1, 2025 |
| Run on B200/5090 without building from source? | 1 | 8 | May 1, 2025 |
| Running Gemma 3 on a multi-chip TPU fails | 5 | 98 | May 1, 2025 |
| Go Deeper / Go Dumber by vLLM AI bot | 5 | 54 | May 1, 2025 |
| Flash Attention 3 FP8 Support | 1 | 20 | April 30, 2025 |
| Grammar CPU-bound performance | 9 | 101 | April 29, 2025 |
| Setting up VllmConfig for a custom gpt2 | 2 | 18 | April 29, 2025 |
| I published a performance test result of vLLM vs SGLang; can someone help me interpret it? | 3 | 45 | April 29, 2025 |
| Which arguments affect GPU memory | 1 | 16 | April 29, 2025 |
| Is Batch Inference for Multimodal Models Truly Batch Inference? | 3 | 20 | April 29, 2025 |
| How to get the `http_*` metrics that this doc suggests are available? | 1 | 16 | April 28, 2025 |
| Extract class log probabilities from an LLM classifier using async vLLM and AWS LMI containers | 1 | 8 | April 28, 2025 |
| Does vLLM support multiple model_executor instances? | 1 | 8 | April 28, 2025 |
| Question about V1 code (v0.8.2) | 3 | 15 | April 28, 2025 |
| Failed to run distributed inference with vLLM 0.8.2 | 6 | 47 | April 27, 2025 |
| Integrating async-llm into OpenRLHF causes a Ray error when tensor_parallel_size > 1 | 1 | 17 | April 27, 2025 |
| Spec decode with EAGLE gets a very low draft acceptance rate | 1 | 36 | April 25, 2025 |
| CUDA failure: 'out of memory' | 1 | 39 | April 24, 2025 |
| Why vLLM uses a lot of CPU memory | 1 | 87 | April 21, 2025 |
| Enabling a LoRA adapter with vLLM is not working | 4 | 55 | April 21, 2025 |