Hello
I want to run some model evaluations on a GGUF-quantized Llama 3 based model, using vLLM to load the model. The latest documentation says that GGUF support in vLLM is currently highly experimental and under-optimized. What does this mean exactly? Does it mean the generation quality itself could degrade, or does it mean that inference optimizations for latency and throughput are not yet supported?
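For context, here is roughly how I plan to load the model (a minimal sketch; the GGUF file name and tokenizer repo below are placeholders for my setup):

```python
from vllm import LLM, SamplingParams

# Point `model` at the local GGUF file and `tokenizer` at the original
# (unquantized) HF repo so tokenization matches the base model.
# The file name and repo id here are placeholders for my setup.
llm = LLM(
    model="./Meta-Llama-3-8B-Instruct.Q4_K_M.gguf",
    tokenizer="meta-llama/Meta-Llama-3-8B-Instruct",
)

sampling_params = SamplingParams(temperature=0.0, max_tokens=256)
outputs = llm.generate(["Explain KV caching in one paragraph."], sampling_params)
print(outputs[0].outputs[0].text)
```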
Thanks.