Please consider adding support for 6-bit quantization.
Sometimes our GPU cannot handle even FP8 or AWQ 8-bit versions of a model, and we would rather not fall back to AWQ 4-bit because of the accuracy loss. Supporting 6-bit weights would give a useful middle ground between memory footprint and accuracy.
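The accuracy trade-off behind this request can be illustrated with a toy sketch (not vLLM code, just plain Python): uniformly quantize random weights at 4, 6, and 8 bits and compare the round-trip error. The helper names here are hypothetical.

```python
import random

def quantize_dequantize(values, bits):
    # Symmetric uniform quantization to `bits` bits, then back to float.
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax
    return [round(v / scale) * scale for v in values]

def mean_abs_error(values, bits):
    deq = quantize_dequantize(values, bits)
    return sum(abs(a - b) for a, b in zip(values, deq)) / len(values)

random.seed(0)
weights = [random.gauss(0, 1) for _ in range(10_000)]
errors = {bits: mean_abs_error(weights, bits) for bits in (4, 6, 8)}

# Error shrinks as bit-width grows: 8-bit < 6-bit < 4-bit,
# so 6-bit sits between the two in both accuracy and memory.
assert errors[8] < errors[6] < errors[4]
```

Each extra bit roughly halves the quantization step, so 6-bit recovers much of the accuracy lost at 4-bit while using 25% less weight memory than 8-bit.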
vLLM does not appear to support 6-bit quantization at the moment. The official documentation and issue tracker mention INT4, INT8, FP8, and related methods such as AWQ, GPTQ, and SmoothQuant, but 6-bit is not listed among the supported options, and there is no mention of plans to add it. Requests for other bit-widths (4 or 8) are more common and are served by existing toolchains and formats: GPTQ, AWQ, and FP8/INT8 checkpoints from Hugging Face, or models produced with the llm-compressor library. See Quantization - vLLM and Supported Hardware.
Though I'm not entirely certain, implementing 6-bit quantization would likely require new CUDA kernels and changes to the quantization toolchain, since the current methods and libraries (GPTQ, AWQ, SmoothQuant, FP8, INT8) do not provide 6-bit support out of the box. If you need better accuracy than 4-bit with less memory than 8-bit, you may need to request this feature from the vLLM team or run a custom quantization pipeline outside vLLM for now.
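Part of why 6-bit needs dedicated kernels is that 6-bit values do not align to byte boundaries: four values span three bytes, so loads and stores need bit-level shuffling that INT4/INT8 kernels avoid. A minimal pure-Python sketch of that packing (the function names are my own, not from any library):

```python
def pack6(values):
    # Pack 6-bit unsigned ints (0..63) into bytes: 4 values -> 3 bytes.
    assert len(values) % 4 == 0
    out = bytearray()
    for i in range(0, len(values), 4):
        a, b, c, d = values[i:i + 4]
        out.append((a << 2) | (b >> 4))            # a[5:0] | b[5:4]
        out.append(((b & 0xF) << 4) | (c >> 2))    # b[3:0] | c[5:2]
        out.append(((c & 0x3) << 6) | d)           # c[1:0] | d[5:0]
    return bytes(out)

def unpack6(data):
    # Inverse of pack6: 3 bytes -> 4 six-bit values.
    out = []
    for i in range(0, len(data), 3):
        x, y, z = data[i:i + 3]
        out.append(x >> 2)
        out.append(((x & 0x3) << 4) | (y >> 4))
        out.append(((y & 0xF) << 2) | (z >> 6))
        out.append(z & 0x3F)
    return out

vals = [0, 1, 33, 63, 17, 42, 5, 60]
assert unpack6(pack6(vals)) == vals
```

A GPU kernel would have to do this shifting and masking on the fly inside the matmul, which is why existing 4/8-bit kernels cannot simply be reused.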
Would you like more detail on how to request this feature or on the current quantization workflow in vLLM?