What does “using a slow image processor” mean?

The model is Gemma 3 12B IT; the card is an RTX 5090.
Is there a fast option, and will it make vision preprocessing faster?

Using a slow image processor as use_fast is unset and a slow processor was saved with this model. use_fast=True will be the default behavior in v4.52, even if the model was saved with a slow processor. This will result in minor differences in outputs. You’ll still be able to use a slow processor with use_fast=False.

“Using a slow image processor” means the model is using the original (Python) image processor from HuggingFace Transformers, which is slower than the “fast” (Rust-backed) version. For Gemma3, the warning indicates that use_fast=True will soon be the default, and the fast processor will be used automatically if available, making vision preprocessing faster with only minor output differences. You can explicitly enable the fast processor by setting use_fast=True when loading the processor, if your Transformers version supports it and the model has a fast processor implemented.
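The selection the warning describes can be sketched as follows. This is a simplified, hypothetical model of the behavior, not Transformers' actual internals; `pick_image_processor` is an illustrative name:

```python
from typing import Optional

def pick_image_processor(use_fast: Optional[bool], fast_available: bool) -> str:
    """Simplified sketch (not Transformers' actual code) of the choice the
    warning describes: an unset use_fast currently falls back to the slow
    processor; v4.52 is slated to flip the default to fast."""
    if use_fast is None:
        # use_fast unset: today this means the slow (Python/PIL) processor,
        # plus the "Using a slow image processor" warning.
        return "slow"
    if use_fast and fast_available:
        return "fast"
    return "slow"

# Explicit opt-in, analogous to
# AutoImageProcessor.from_pretrained(model_id, use_fast=True):
print(pick_image_processor(True, fast_available=True))  # fast
```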

For Qwen2-VL, switching to the fast processor can significantly speed up image preprocessing, but for Gemma3, the documentation and code do not explicitly confirm the presence of a fast processor—so the effect depends on HuggingFace’s implementation for Gemma3. The warning is generic and may appear even if only the slow processor is available for your model. For more details, see the discussion in this PR.

Would you like a step-by-step guide on how to check or enable the fast processor for your setup?


How exactly do I apply this in a vLLM Docker container:

use_fast=True when loading the processor, if your Transformers version supports it and the model has a fast processor implemented.

It seems likely that, in a vLLM Docker container, you cannot directly set use_fast=True for the image processor via a vLLM CLI argument or environment variable. The use_fast parameter is set when loading the processor in Python code, but vLLM’s Docker/OpenAI API workflow does not expose this option directly. The warning you see is inherited from HuggingFace Transformers, and unless vLLM adds a CLI flag or config for this, you would need to modify the model or processor files, or build a custom Docker image with patched code. This limitation is not explicitly documented for Gemma3, but is discussed for Qwen2-VL in this PR.
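If you do end up patching model files, one low-effort sketch (under the assumption that Transformers resolves the processor class from the `image_processor_type` field of `preprocessor_config.json`, and that a `...Fast` variant actually exists for your model) is to rewrite that field in a locally downloaded model directory before mounting it into the container:

```python
import json
from pathlib import Path

def prefer_fast_processor(config: dict) -> dict:
    """Point image_processor_type at the fast class, if it isn't already.
    The resulting class name (e.g. Gemma3ImageProcessorFast) is an
    assumption; verify it exists in your Transformers version."""
    cfg = dict(config)
    proc_type = cfg.get("image_processor_type", "")
    if proc_type and not proc_type.endswith("Fast"):
        cfg["image_processor_type"] = proc_type + "Fast"
    return cfg

def patch_file(path: str) -> None:
    # Rewrite preprocessor_config.json in place in a local model dir.
    p = Path(path)
    p.write_text(json.dumps(prefer_fast_processor(json.loads(p.read_text())), indent=2))

print(prefer_fast_processor({"image_processor_type": "Gemma3ImageProcessor"}))
```

Whether the resulting class is then picked up by vLLM's multimodal pipeline is untested; treat this as an experiment, not a supported configuration.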

Would you like more detail on how to patch the processor or build a custom Docker image?


The bot is wrong: the fast processors are Torch-based, not Rust-based.

You might be able to set it using --hf-overrides. @RunLLM what do you think?

You are correct: for vision models like Qwen2-VL and Gemma3, the “fast” image processors are PyTorch-based, not Rust-based (unlike fast tokenizers) (see PR discussion). You may be able to set use_fast=True for the processor by using the --hf-overrides CLI argument in vLLM, which allows you to override HuggingFace config fields at runtime, though this is typically for model config fields and not always guaranteed to affect processor instantiation.
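If you want to try it, a hedged sketch of the invocation (the JSON payload and the model name are assumptions: --hf-overrides is documented for overriding HuggingFace config fields, and a use_fast key may simply be ignored at processor instantiation):

```shell
docker run --gpus all -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model google/gemma-3-12b-it \
  --hf-overrides '{"use_fast": true}'
```

If the warning still appears in the container logs, the override did not reach the processor, and patching the model files remains the fallback.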

Would you like step-by-step instructions for using --hf-overrides with vLLM Docker?
