Asking for 6-bit Quantization

Hello everybody in vLLM,

I'd like to ask you to implement support for 6-bit quantization.

Sometimes our GPU cannot handle even the FP8 or AWQ 8-bit version of a model, yet we don't want to fall back to AWQ 4-bit because it reduces accuracy. Supporting 6-bit weights would give us a better accuracy/memory trade-off.
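To make the trade-off concrete, here is a rough back-of-the-envelope calculation of weight memory at different bit-widths, assuming a hypothetical 7B-parameter model and ignoring activations, KV cache, and per-group scale/zero-point overhead:

```python
# Rough weight-memory footprint at different bit-widths.
# Illustrative arithmetic only; a 7B model is assumed as an example.
def weight_gib(num_params: float, bits: int) -> float:
    """Bytes of weight storage converted to GiB."""
    return num_params * bits / 8 / 1024**3

params = 7e9  # hypothetical 7B-parameter model
for bits in (16, 8, 6, 4):
    print(f"{bits:2d}-bit: {weight_gib(params, bits):.1f} GiB")
# 16-bit: 13.0 GiB, 8-bit: 6.5 GiB, 6-bit: 4.9 GiB, 4-bit: 3.3 GiB
```

A 6-bit format sits between 8-bit (too large for some GPUs) and 4-bit (too lossy), which is the gap this request is about.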


vLLM does not currently appear to support 6-bit quantization. The official documentation and issue tracker only mention INT4 (4-bit), INT8 (8-bit), FP8, and related methods such as AWQ, GPTQ, and SmoothQuant; 6-bit is not listed among the supported options, and there is no explicit mention of plans for it in the available sources. Quantized checkpoints at the supported bit-widths are widely available via toolchains and formats such as GPTQ, AWQ, and FP8/INT8 collections on Hugging Face, or can be produced with the llm-compressor library. See Quantization - vLLM and Supported Hardware.

Though I’m not entirely certain, implementing 6-bit quantization would likely require new CUDA kernels and changes to the quantization toolchain, since the current methods and libraries (GPTQ, AWQ, SmoothQuant, FP8, INT8) do not provide 6-bit support out of the box. If you need better accuracy than 4-bit but a smaller footprint than 8-bit, you may need to request this feature from the vLLM team or run a custom quantization pipeline outside of vLLM for now.
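One reason 6-bit needs dedicated kernels is that 6-bit values don't align to byte boundaries: four 6-bit weights must be packed into three bytes. A minimal CPU-side sketch of that packing (hypothetical helper names, not vLLM code; real kernels would do the equivalent on-GPU):

```python
import numpy as np

def pack_6bit(vals: np.ndarray) -> np.ndarray:
    """Pack uint8 values in [0, 63], in groups of 4, into 3 bytes each."""
    assert vals.size % 4 == 0
    v = vals.reshape(-1, 4).astype(np.uint32)
    # Assemble a 24-bit word from four 6-bit fields.
    word = v[:, 0] | (v[:, 1] << 6) | (v[:, 2] << 12) | (v[:, 3] << 18)
    out = np.empty((v.shape[0], 3), dtype=np.uint8)
    out[:, 0] = word & 0xFF          # little-endian byte split
    out[:, 1] = (word >> 8) & 0xFF
    out[:, 2] = (word >> 16) & 0xFF
    return out.reshape(-1)

def unpack_6bit(packed: np.ndarray) -> np.ndarray:
    """Inverse of pack_6bit: recover four 6-bit values per 3 bytes."""
    b = packed.reshape(-1, 3).astype(np.uint32)
    word = b[:, 0] | (b[:, 1] << 8) | (b[:, 2] << 16)
    out = np.empty((b.shape[0], 4), dtype=np.uint8)
    for i in range(4):
        out[:, i] = (word >> (6 * i)) & 0x3F
    return out.reshape(-1)

vals = np.array([0, 63, 17, 42, 1, 2, 3, 4], dtype=np.uint8)
assert np.array_equal(unpack_6bit(pack_6bit(vals)), vals)
```

The awkward 4-values-per-3-bytes layout, and the shifting needed to unpack it inside a matmul kernel, is a large part of why 6-bit support is not a trivial addition to existing 4/8-bit code paths.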

Would you like more detail on how to request this feature or on the current quantization workflow in vLLM?

