Topic	Category	Replies	Views	Activity
Why is zero_point set to False in gptq_marlin?	General	0	14	March 28, 2025
Using nsys profile to profile vLLM	General	0	22	March 27, 2025
Why is the prefix cache hit rate constantly increasing?	KV-Cache	3	77	March 27, 2025
Async version of LLM.chat()?	General	0	36	March 26, 2025
Multiple tools with Mistral Large 2411	Tool Calling	4	66	March 26, 2025
Difference in Log Probabilities Between vLLM and HF Model in Identical Environment	General	6	133	March 26, 2025
Can Ascend officially draft documentation on the vLLM-Ascend adaptation for graph mode?	Ascend Support	6	158	March 26, 2025
Pipeline Parallelism Support - Source Code Location	Features	1	53	March 25, 2025
GGUF quantized models inference support	Quantization	0	85	March 25, 2025
Does vLLM V1 support Speculative Decoding now?	V1 Feedback	4	134	March 25, 2025
V1 Chunked Prefill Scheduling Policy: how would prefill be scheduled?	Scheduling	8	143	March 25, 2025
Prefix Cache control	KV-Cache	1	80	March 24, 2025
vLLM Load balancing	General	1	67	March 24, 2025
Will CPU Offload be supported in V1?	General	3	290	March 24, 2025
Avoiding hash collisions in prefix cache	KV-Cache	7	93	March 24, 2025
Omitting the vision stack for gemma3?	General	1	173	March 23, 2025
[Developer Question] How are the position indices updated?	General	4	20	March 22, 2025
How to get torch-npu >= 2.5.1.dev20250308?	Ascend Support	1	112	March 21, 2025
How to build a vLLM Python wheel that can be used by other GPU types?	General	2	62	March 21, 2025
About clean_up in the example on the vllm-ascend official website	Ascend Support	2	133	March 20, 2025
How to start embedding models	General	7	216	March 20, 2025