True eager backend

On vLLM v0.11.1 I ran something like:
VLLM_DISABLE_COMPILE_CACHE=1 VLLM_LOG_LEVEL=debug VLLM_TORCH_PROFILER_DIR=./prof/0111-eager-eager CUDA_VISIBLE_DEVICES=1 vllm serve /sse_ard/dgh/models/Qwen3-4B --served-model-name Qwen3-4B --host 0.0.0.0 --port 1110 -O.backend=eager --enforce-eager

And the profiler summary looks like this:

[profiler summary screenshot showing “execute_new_X_cached_Y” entries]
Am I right in assuming that these “execute_new_X_cached_Y” entries are products of torch.compile? There’s no way to turn them off, even with the insane command that I supply above.

So my questions are:

  • what are these “execute_new_X_cached_Y” entries?
  • how do I truly turn them off? I never encountered these in v0.11.0.

The “execute_new_X_cached_Y” entries in your profiler summary are generated by TorchInductor, which is part of torch.compile. They represent compiled graph executions, and their presence indicates that torch.compile is still active. According to the vLLM documentation, to fully disable torch.compile and CUDAGraphs you must use both --enforce-eager and set the compilation config mode to NONE (e.g., -cc.mode=0). The -O.backend=eager flag alone does not guarantee torch.compile is off.

If you still see these entries after using --enforce-eager and -cc.mode=0, it may be a bug or a change in v0.11.1, as previous versions (like v0.11.0) did not show this behavior. For complete disabling, ensure both flags are set as described above. Would you like a step-by-step breakdown of the correct command and config to ensure torch.compile is fully disabled?
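For context: even `torch.compile(backend="eager")` still traces functions with Dynamo and executes them through generated wrapper frames; only Inductor codegen is skipped. That is one way compiled-looking entries can appear in a profile despite an “eager” backend. A minimal sketch in plain PyTorch, independent of vLLM:

```python
import torch

def f(x):
    return x * 2 + 1

# backend="eager" skips Inductor codegen, but Dynamo still traces f and
# dispatches through a generated wrapper frame rather than plain Python.
g = torch.compile(f, backend="eager")

x = torch.ones(4)
print(torch.equal(f(x), g(x)))  # numerics are identical either way
```

So an “eager backend” is not the same thing as no torch.compile at all, which matters when interpreting profiler entries.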

Sources:

Hint: Mention @RunLLM in the post for followups.

VLLM_DISABLE_COMPILE_CACHE=1 VLLM_LOG_LEVEL=debug VLLM_TORCH_PROFILER_DIR=./prof/0111-eager-eager CUDA_VISIBLE_DEVICES=1 vllm serve /sse_ard/dgh/models/Qwen3-4B --served-model-name Qwen3-4B --host 0.0.0.0 --port 1110 -O.backend=eager -O.mode=0 --enforce-eager

So I did something like this. No luck.
Actually, you don’t even see the usual startup log here (which adapter is used, etc.).

@RunLLM


Even with -O.backend=eager -O.mode=0 --enforce-eager, if you still see “execute_new_X_cached_Y” entries in your profiler, torch.compile (and thus TorchInductor) is not fully disabled. According to the vLLM debugging docs, both --enforce-eager and -O.mode=0 (equivalently -cc.mode=0) should turn off torch.compile and CUDAGraphs, and the backend should be set to “eager” for full eager execution. The code in your image confirms that if backend="eager", EagerAdaptor should be used, not Inductor.

If you still see Inductor artifacts, this may be a bug in v0.11.1, as this behavior is not expected per the official documentation and code. This was not present in v0.11.0, so it could be a regression. Would you like a step-by-step troubleshooting checklist to further isolate the issue?
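One way to check from a standalone Python process whether Dynamo has actually compiled anything (independent of which adaptor vLLM selected) is Dynamo's internal frame counters. A rough sketch against plain PyTorch; note that `torch._dynamo.utils.counters` is an internal API, so verify it on your torch version:

```python
import torch
from torch._dynamo.utils import counters  # internal API; location may shift

torch._dynamo.reset()   # drop any previously compiled frames
counters.clear()

def f(x):
    return torch.relu(x) + 1

g = torch.compile(f, backend="eager")
g(torch.randn(3))

# "frames" is only populated when Dynamo actually traces something;
# in a genuinely compile-free process it would stay empty.
print(dict(counters["frames"]))
```

Running the equivalent check inside a vLLM worker would tell you whether the profiler entries really come from Dynamo-compiled frames or from something else.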


Is there an env var I can set for torch to disable it?
@RunLLM

There is no official environment variable in vLLM or PyTorch that fully disables torch.compile; disabling must be done via vLLM’s CLI flags/configs (e.g., --enforce-eager and -cc.mode=0). Setting VLLM_DISABLE_COMPILE_CACHE=1 only disables the compile cache, not torch.compile itself, and there is no documented env var like TORCH_COMPILE_DISABLE or similar for this purpose in the provided sources.

If torch.compile is still active despite these flags, it is likely a bug or regression in vLLM, not a missing environment variable. Would you like more detail on how vLLM parses these flags internally?
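For what it's worth, recent PyTorch releases do read a `TORCH_COMPILE_DISABLE` environment variable: it maps to `torch._dynamo.config.disable` and turns `torch.compile` into a pass-through. Whether vLLM's own compilation paths honor it is a separate question. A quick sketch to check on your install (verify against your torch version):

```python
import os
# Must be set before torch is imported; Dynamo reads it at import time.
os.environ["TORCH_COMPILE_DISABLE"] = "1"

import torch

def f(x):
    return x + 1

g = torch.compile(f)  # with the flag set, this is effectively a no-op wrapper

print(bool(torch._dynamo.config.disable))  # truthy if the env var was honored
print(torch.equal(g(torch.ones(2)), torch.full((2,), 2.0)))
```

If your torch honors the flag, exporting it before launching `vllm serve` is worth a try, though it only gates Dynamo itself, not any vLLM-level compilation machinery.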
