True eager backend

On vLLM v0.11.1 I ran something like:
VLLM_DISABLE_COMPILE_CACHE=1 VLLM_LOG_LEVEL=debug VLLM_TORCH_PROFILER_DIR=./prof/0111-eager-eager CUDA_VISIBLE_DEVICES=1 vllm serve /sse_ard/dgh/models/Qwen3-4B --served-model-name Qwen3-4B --host 0.0.0.0 --port 1110 -O.backend=eager --enforce-eager

And the profiler summary looks like this:

[profiler summary screenshot showing “execute_new_X_cached_Y” entries]
Am I right in assuming that these “execute_new_X_cached_Y” entries are products of torch.compile? There’s no way to turn them off, even with the insane command that I supply above.

So my questions are:

  • what are these “execute_new_X_cached_Y” entries?
  • how do I truly turn them off? I never encountered these in v0.11.0.

The “execute_new_X_cached_Y” entries in your profiler summary are generated by TorchInductor, which is part of torch.compile. They represent compiled graph executions, and their presence indicates that torch.compile is still active. According to the vLLM documentation, to fully disable torch.compile and CUDAGraphs you must use both --enforce-eager and set the compilation config mode to NONE (e.g., -cc.mode=0). The -O.backend=eager flag alone does not guarantee torch.compile is off.

If you still see these entries after using --enforce-eager and -cc.mode=0, it may be a bug or a change in v0.11.1, as previous versions (like v0.11.0) did not show this behavior. For complete disabling, ensure both flags are set as described above. Would you like a step-by-step breakdown of the correct command and config to ensure torch.compile is fully disabled?
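For context: even `torch.compile(backend="eager")` still traces functions with Dynamo and executes them through generated wrapper frames; only Inductor codegen is skipped. That is one way compiled-looking entries can appear in a profile despite an “eager” backend. A minimal sketch in plain PyTorch, independent of vLLM:

```python
import torch

def f(x):
    return x * 2 + 1

# backend="eager" skips Inductor codegen, but Dynamo still traces f and
# dispatches through a generated wrapper frame rather than plain Python.
g = torch.compile(f, backend="eager")

x = torch.ones(4)
print(torch.equal(f(x), g(x)))  # numerics are identical either way
```

So an “eager backend” is not the same thing as no torch.compile at all, which matters when interpreting profiler entries.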

Sources:

Hint: Mention @RunLLM in the post for followups.

VLLM_DISABLE_COMPILE_CACHE=1 VLLM_LOG_LEVEL=debug VLLM_TORCH_PROFILER_DIR=./prof/0111-eager-eager CUDA_VISIBLE_DEVICES=1 vllm serve /sse_ard/dgh/models/Qwen3-4B --served-model-name Qwen3-4B --host 0.0.0.0 --port 1110 -O.backend=eager -O.mode=0 --enforce-eager

So I did something like this. No luck.
Actually, you don’t even see the usual startup log here (which adapter is used, etc.).

@RunLLM


Even with -O.backend=eager -O.mode=0 --enforce-eager, if you still see “execute_new_X_cached_Y” entries in your profiler, torch.compile (and thus TorchInductor) is not fully disabled. According to the vLLM debugging docs, both --enforce-eager and -O.mode=0 (equivalently -cc.mode=0) should turn off torch.compile and CUDAGraphs, and the backend should be set to “eager” for full eager execution. The code in your image confirms that if backend="eager", EagerAdaptor should be used, not Inductor.

If you still see Inductor artifacts, this may be a bug in v0.11.1, as this behavior is not expected per the official documentation and code. This was not present in v0.11.0, so it could be a regression. Would you like a step-by-step troubleshooting checklist to further isolate the issue?
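One way to check from a standalone Python process whether Dynamo has actually compiled anything (independent of which adaptor vLLM selected) is Dynamo's internal frame counters. A rough sketch against plain PyTorch; note that `torch._dynamo.utils.counters` is an internal API, so verify it on your torch version:

```python
import torch
from torch._dynamo.utils import counters  # internal API; location may shift

torch._dynamo.reset()   # drop any previously compiled frames
counters.clear()

def f(x):
    return torch.relu(x) + 1

g = torch.compile(f, backend="eager")
g(torch.randn(3))

# "frames" is only populated when Dynamo actually traces something;
# in a genuinely compile-free process it would stay empty.
print(dict(counters["frames"]))
```

Running the equivalent check inside a vLLM worker would tell you whether the profiler entries really come from Dynamo-compiled frames or from something else.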


Is there an env var I can set for torch to disable it?
@RunLLM

There is no official environment variable in vLLM or PyTorch that fully disables torch.compile; disabling must be done via vLLM’s CLI flags/configs (e.g., --enforce-eager and -cc.mode=0). Setting VLLM_DISABLE_COMPILE_CACHE=1 only disables the compile cache, not torch.compile itself, and there is no documented env var like TORCH_COMPILE_DISABLE or similar for this purpose in the provided sources.

If torch.compile is still active despite these flags, it is likely a bug or regression in vLLM, not a missing environment variable. Would you like more detail on how vLLM parses these flags internally?
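For what it's worth, recent PyTorch releases do read a `TORCH_COMPILE_DISABLE` environment variable: it maps to `torch._dynamo.config.disable` and turns `torch.compile` into a pass-through. Whether vLLM's own compilation paths honor it is a separate question. A quick sketch to check on your install (verify against your torch version):

```python
import os
# Must be set before torch is imported; Dynamo reads it at import time.
os.environ["TORCH_COMPILE_DISABLE"] = "1"

import torch

def f(x):
    return x + 1

g = torch.compile(f)  # with the flag set, this is effectively a no-op wrapper

print(bool(torch._dynamo.config.disable))  # truthy if the env var was honored
print(torch.equal(g(torch.ones(2)), torch.full((2,), 2.0)))
```

If your torch honors the flag, exporting it before launching `vllm serve` is worth a try, though it only gates Dynamo itself, not any vLLM-level compilation machinery.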
