Am I right in assuming that these “execute_new_X_cached_Y” entries are a product of torch.compile? There’s no way to turn them off, even with the command I supplied above.
So my questions are:
what are these “execute_new_X_cached_Y” entries?
how do I truly turn them off? I never encountered these in v0.11.0.
The “execute_new_X_cached_Y” entries in your profiler summary are generated by TorchInductor, which is part of torch.compile. These represent compiled graph executions, and their presence indicates that torch.compile is still active. According to vLLM documentation, to fully disable torch.compile and CUDAGraphs, you must use both --enforce-eager and set the compilation config mode to NONE (e.g., -cc.mode=0). The --backend=eager flag alone does not guarantee torch.compile is off.
If you still see these entries after using --enforce-eager and -cc.mode=0, it may be a bug or a change in v0.11.1, as previous versions (like v0.11.0) did not show this behavior. For complete disabling, ensure both flags are set as described above. Would you like a step-by-step breakdown of the correct command and config to ensure torch.compile is fully disabled?
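Putting those flags together, the invocation should look roughly like the sketch below. The model name is a placeholder, and the flag spellings (`--enforce-eager`, `-O.mode=0`) follow the docs cited above; this is an illustration, not a verified-working command for v0.11.1:

```shell
# Sketch: disable both CUDAGraphs and torch.compile per the vLLM docs.
# <model> is a placeholder for your model path or Hugging Face ID.
vllm serve <model> \
  --enforce-eager \   # disables CUDAGraph capture
  -O.mode=0           # compilation mode NONE, i.e. no torch.compile
```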
Even with -O.backend=eager -O.mode=0 --enforce-eager, if you still see “execute_new_X_cached_Y” in your profiler, torch.compile (and thus TorchInductor) is not fully disabled. According to vLLM debug docs, both --enforce-eager and -cc.mode=0 (or -O.mode=0) should turn off torch.compile and CUDAGraphs, and the backend should be set to “eager” for full eager execution. The code in your image confirms that if backend="eager", EagerAdaptor should be used, not Inductor.
If you still see Inductor artifacts, this may be a bug in v0.11.1, as this behavior is not expected per the official documentation and code. This was not present in v0.11.0, so it could be a regression. Would you like a step-by-step troubleshooting checklist to further isolate the issue?
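To make the expected dispatch concrete, here is a minimal sketch of the backend selection described above. The function name and return strings are illustrative; only the eager-vs-Inductor distinction comes from the code snippet discussed in this thread, not from vLLM's actual internals:

```python
# Illustrative sketch of the compile-backend dispatch discussed above.
# select_compile_adaptor is a hypothetical name, not a vLLM function.
def select_compile_adaptor(backend: str) -> str:
    """Return which adaptor a given compilation backend should map to."""
    if backend == "eager":
        return "EagerAdaptor"    # plain eager execution, no Inductor codegen
    return "InductorAdaptor"     # TorchInductor compiles the graph

# With -O.backend=eager, eager execution is expected:
print(select_compile_adaptor("eager"))  # → EagerAdaptor
```

If `-O.backend=eager` is set and the profiler still shows Inductor-generated `execute_new_X_cached_Y` entries, the runtime is not following this dispatch, which supports the regression hypothesis.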
There is no official environment variable in vLLM that fully disables torch.compile; disabling must be done via vLLM’s CLI flags/configs (e.g., --enforce-eager and -cc.mode=0). Setting VLLM_DISABLE_COMPILE_CACHE=1 only disables the compile cache, not torch.compile itself, and no env var such as TORCH_COMPILE_DISABLE is documented for this purpose in the provided vLLM sources.
If torch.compile is still active despite these flags, it is likely a bug or regression in vLLM, not a missing environment variable. Would you like more detail on how vLLM parses these flags internally?
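To make the cache-vs-compile distinction concrete, here is a sketch contrasting the two (again, `<model>` is a placeholder and flag spellings follow the docs cited earlier):

```shell
# Disables only the compile artifact cache; torch.compile itself still runs.
VLLM_DISABLE_COMPILE_CACHE=1 vllm serve <model>

# Actually disabling torch.compile goes through the CLI flags instead:
vllm serve <model> --enforce-eager -O.mode=0
```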