Thank you for your detailed exploration — this is an incredibly helpful report.
I think we can check that F.torch_log_softmax is not a performance footgun
and make this switch if you find it more stable. I will dig in a bit this afternoon.