I have some custom Triton ops that perform well in enforce-eager mode. However, when I enable CUDA graph mode, those ops get split into several pieces by the compiler, which hurts performance. How can I keep those Triton ops intact while still enabling graph optimization for the other ops?
To prevent your custom Triton ops from being fused or replaced under CUDA Graph (cudagraph) mode, use the `custom_ops` and `splitting_ops` options in vLLM's `CompilationConfig`. Set `custom_ops` to enable your specific ops (e.g., `custom_ops=["none", "+my_triton_op"]`), and add your custom op names to `splitting_ops` so that vLLM splits the graph at those ops. The split ops stay outside the captured cudagraphs and execute in their original form, while the rest of the model still benefits from cudagraph optimization. See the documentation for details on these fields and usage examples in the config API (docs: `custom_ops`, `splitting_ops`).
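As a rough sketch, the configuration could look like the following. The op name `my_triton_op` and the model name are placeholders, not values from this thread; the exact string you pass must match the name your op is registered under in vLLM:

```python
from vllm import LLM
from vllm.config import CompilationConfig

# Hypothetical op name -- replace "my_triton_op" with the name your
# custom op is actually registered under.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    compilation_config=CompilationConfig(
        # "none" disables all custom ops by default, then
        # "+my_triton_op" re-enables just your op.
        custom_ops=["none", "+my_triton_op"],
        # Split the compiled graph at this op so it stays outside
        # the captured cudagraphs and runs in its original form.
        splitting_ops=["my_triton_op"],
    ),
)
```

The same settings can also be passed on the command line via `--compilation-config` as a JSON string when launching `vllm serve`.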
Would you like a step-by-step example or code snippet for this configuration?