Could you clarify your follow-up question? Are you asking for a step-by-step guide to using INT8 quantization with vLLM, or for a deeper explanation of why AWQ quantization can reduce throughput on A100 GPUs? Please specify which, so I can provide the most relevant details.