What does “using a slow image processor” mean?

The model is Gemma 3 12B IT; the card is an RTX 5090.
Is there a fast option, and will it make vision preprocessing faster?

Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You’ll still be able to use a slow processor with use_fast=False.

“Using a slow image processor” means the model is using the original (Python) image processor from HuggingFace Transformers, which is slower than the “fast” (Rust-backed) version. For Gemma3, the warning indicates that use_fast=True will soon be the default, and the fast processor will be used automatically if available, making vision preprocessing faster with only minor output differences. You can explicitly enable the fast processor by setting use_fast=True when loading the processor, if your Transformers version supports it and the model has a fast processor implemented.
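The selection the warning describes can be sketched as follows. This is a simplified, hypothetical model of the behavior, not Transformers' actual internals; `pick_image_processor` is an illustrative name:

```python
from typing import Optional

def pick_image_processor(use_fast: Optional[bool], fast_available: bool) -> str:
    """Simplified sketch (not Transformers' actual code) of the choice the
    warning describes: an unset use_fast currently falls back to the slow
    processor; v4.52 is slated to flip the default to fast."""
    if use_fast is None:
        # use_fast unset: today this means the slow (Python/PIL) processor,
        # plus the "Using a slow image processor" warning.
        return "slow"
    if use_fast and fast_available:
        return "fast"
    return "slow"

# Explicit opt-in, analogous to
# AutoImageProcessor.from_pretrained(model_id, use_fast=True):
print(pick_image_processor(True, fast_available=True))  # fast
```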

For Qwen2-VL, switching to the fast processor can significantly speed up image preprocessing, but for Gemma3, the documentation and code do not explicitly confirm the presence of a fast processor—so the effect depends on HuggingFace’s implementation for Gemma3. The warning is generic and may appear even if only the slow processor is available for your model. For more details, see the discussion in this PR.

Would you like a step-by-step guide on how to check or enable the fast processor for your setup?


How exactly do I apply this in a vLLM Docker container:

use_fast=True when loading the processor, if your Transformers version supports it and the model has a fast processor implemented.

It seems likely that, in a vLLM Docker container, you cannot directly set use_fast=True for the image processor via a vLLM CLI argument or environment variable. The use_fast parameter is set when loading the processor in Python code, but vLLM’s Docker/OpenAI API workflow does not expose this option directly. The warning you see is inherited from HuggingFace Transformers, and unless vLLM adds a CLI flag or config for this, you would need to modify the model or processor files, or build a custom Docker image with patched code. This limitation is not explicitly documented for Gemma3, but is discussed for Qwen2-VL in this PR.
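If you do end up patching model files, one low-effort sketch (under the assumption that Transformers resolves the processor class from the `image_processor_type` field of `preprocessor_config.json`, and that a `...Fast` variant actually exists for your model) is to rewrite that field in a locally downloaded model directory before mounting it into the container:

```python
import json
from pathlib import Path

def prefer_fast_processor(config: dict) -> dict:
    """Point image_processor_type at the fast class, if it isn't already.
    The resulting class name (e.g. Gemma3ImageProcessorFast) is an
    assumption; verify it exists in your Transformers version."""
    cfg = dict(config)
    proc_type = cfg.get("image_processor_type", "")
    if proc_type and not proc_type.endswith("Fast"):
        cfg["image_processor_type"] = proc_type + "Fast"
    return cfg

def patch_file(path: str) -> None:
    # Rewrite preprocessor_config.json in place in a local model dir.
    p = Path(path)
    p.write_text(json.dumps(prefer_fast_processor(json.loads(p.read_text())), indent=2))

print(prefer_fast_processor({"image_processor_type": "Gemma3ImageProcessor"}))
```

Whether the resulting class is then picked up by vLLM's multimodal pipeline is untested; treat this as an experiment, not a supported configuration.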

Would you like more detail on how to patch the processor or build a custom Docker image?


The bot is wrong: the fast processors are Torch-based, not Rust-based.

You might be able to set it using --hf-overrides. @RunLLM what do you think?

You are correct: for vision models like Qwen2-VL and Gemma3, the “fast” image processors are PyTorch-based, not Rust-based (unlike fast tokenizers) (see PR discussion). You may be able to set use_fast=True for the processor by using the --hf-overrides CLI argument in vLLM, which allows you to override HuggingFace config fields at runtime, though this is typically for model config fields and not always guaranteed to affect processor instantiation.
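If you want to try it, a hedged sketch of the invocation (the JSON payload and the model name are assumptions: --hf-overrides is documented for overriding HuggingFace config fields, and a use_fast key may simply be ignored at processor instantiation):

```shell
docker run --gpus all -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model google/gemma-3-12b-it \
  --hf-overrides '{"use_fast": true}'
```

If the warning still appears in the container logs, the override did not reach the processor, and patching the model files remains the fallback.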

Would you like step-by-step instructions for using --hf-overrides with vLLM Docker?
