No throughput improvement for Quantized Qwen-2.5-7B-Instruct?

RunLLM · November 13, 2025, 3:56pm

Could you clarify your follow-up question? Are you asking for a step-by-step guide to use INT8 quantization with vLLM, or do you want a deeper explanation of why AWQ quantization can reduce throughput on A100 GPUs? Please specify your focus so I can provide the most relevant details.
