Basically: graph mode provides two core acceleration capabilities, kernel fusion and reduced framework overhead, and both will be provided through torch.compile.
As for reduce-overhead, the equivalent of cudagraph functionality is named aclgraph and will be officially released by Q2 at the latest.
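For reference, here is a rough sketch of how the reduce-overhead path is requested in stock PyTorch today. `mode="reduce-overhead"` is a documented `torch.compile` mode (it uses CUDA graphs on GPU). Whether aclgraph will be reached through this same mode string on Ascend is only an assumption here, and the helper names `select_mode` and `compile_step` are made up for illustration.

```python
from typing import Callable


def select_mode(reduce_overhead: bool) -> str:
    """Pick a torch.compile mode string (both values are documented modes)."""
    return "reduce-overhead" if reduce_overhead else "default"


def compile_step(fn: Callable, reduce_overhead: bool = True) -> Callable:
    """Wrap fn with torch.compile, or return it unchanged if that is not possible.

    Hypothetical helper: falls back to eager execution when torch is not
    installed or torch.compile is not supported on this platform.
    """
    try:
        import torch
        # Compilation itself is lazy; it happens on the first call of the result.
        return torch.compile(fn, mode=select_mode(reduce_overhead))
    except (ImportError, RuntimeError):
        return fn


def add_one(x):
    return x + 1


# Returns a callable either way; graph capture only kicks in on first use.
step = compile_step(add_one, reduce_overhead=True)
```

The eager fallback is just there so the snippet stays usable on machines without a working compile stack; a real deployment would probably want to fail loudly instead.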
Regarding automatic fusion, multiple teams are trying different implementation approaches, and we will introduce and integrate them once they mature.
Torchair, as the graph mode bridge between torch and Ascend, will offer different user experiences in the future through various config_mode options.
If the automatic fusion capability based on inductor matures in the future, we will also consider directly providing an inductor-npu backend.