Features Quantization
| Topic | Replies | Views | Activity |
|---|---|---|---|
| About the Quantization category | 0 | 32 | March 20, 2025 |
| Is there a hook/flag to capture activation statistics during inference for use with llm-compressor AWQ? | 3 | 45 | June 4, 2026 |
| Can GPTQModel quantize GLM-5 from FP16 to INT8? | 9 | 136 | April 24, 2026 |
| GLM 5 / Kimi k2.5 on 4 x RTX 6000 Pro | 1 | 263 | March 22, 2026 |
| Why I feel cuda-kernel marlin run not fast? | 5 | 222 | January 9, 2026 |
| Asking 6-bit Quantization | 1 | 212 | November 11, 2025 |
| A bit of frustration with Quantization | 5 | 759 | October 14, 2025 |
| Support for Deploying 4-bit Fine-Tuned Model with LoRA on vLLM | 13 | 1048 | July 30, 2025 |
| MoE quantization | 9 | 1352 | July 2, 2025 |
| Is there a detailed introduction to the two W8A8 quantization methods? | 1 | 225 | June 15, 2025 |
| GGUF quantized models Inference support | 0 | 310 | March 25, 2025 |