Hello,
I am trying to deploy a DeepSeek-R1 fp8 model in a multi-node environment.
I have four 8xH100 nodes (32 GPUs in total) connected by InfiniBand. Is there a known or recommended combination of parallelism strategies in vLLM for this setup?
As far as I know, tensor parallelism (TP), pipeline parallelism (PP), expert parallelism (EP), and data parallelism (DP) are all possible.
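
To make the question concrete, here is a rough sketch of the search space I have in mind. It only enumerates (TP, PP, DP) sizes whose product is 32 and says nothing about which one is fastest. The function name `candidate_layouts` is my own, and two things are assumptions on my part rather than vLLM rules: that TP should stay within a single node (TP <= 8), and that EP reuses the same GPUs instead of adding another factor to the GPU count.

```python
from itertools import product

NODES = 4
GPUS_PER_NODE = 8
TOTAL_GPUS = NODES * GPUS_PER_NODE  # 32 GPUs across the cluster


def candidate_layouts(total_gpus, gpus_per_node):
    """Return (tp, pp, dp) triples whose product equals total_gpus.

    Assumption: TP is kept inside one node, so tp <= gpus_per_node.
    EP is not a separate factor here (assumed to reuse the same GPUs).
    """
    # Only divisors of the GPU count can appear as a parallel size.
    sizes = [d for d in range(1, total_gpus + 1) if total_gpus % d == 0]
    layouts = []
    for tp, pp, dp in product(sizes, repeat=3):
        if tp * pp * dp == total_gpus and tp <= gpus_per_node:
            layouts.append((tp, pp, dp))
    return layouts


if __name__ == "__main__":
    for tp, pp, dp in candidate_layouts(TOTAL_GPUS, GPUS_PER_NODE):
        print(f"TP={tp} PP={pp} DP={dp}")
```

For example, TP=8 with PP=4 and DP=1 is one of the layouts this lists, but I do not know whether it is a good choice for this model.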
Thanks.