Running Qwen/Qwen3-Coder-30B-A3B-Instruct on an Nvidia GH200, I get the following warning:
WARNING 10-10 07:30:42 [fused_moe.py:798] Using default MoE config. Performance might be sub-optimal! Config file not found at ['/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=128,N=768,device_name=NVIDIA_GH200_144G_HBM3e.json']
It seems likely that, as far as vLLM's MoE config is concerned, the only difference between the GH200 and the H200 is the device name string, not the underlying hardware performance or kernel tuning requirements. According to a recent vLLM pull request, the H200 and H200 NVL (both very similar to the GH200) can share the same MoE config; the config files are keyed by device name for lookup convenience, not because each device needs different tuning parameters. The warning appears because the config loader searches for a file matching the exact device name (NVIDIA_GH200_144G_HBM3e), does not find one, and does not fall back to the H200 config, even though those settings are likely compatible. Nothing in the available sources suggests the GH200 requires a different config from the H200; the duplicate files exist mainly for file lookup, not for performance reasons.
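To see why the lookup misses, here is a minimal sketch that prints the device-name string used to build the config filename. It assumes the key is derived from the CUDA device name reported by torch, with spaces replaced by underscores to match the filename in the warning; that derivation is an assumption, not a quote of vLLM's internals.

```python
# Minimal sketch: print the device-name string that presumably becomes the
# MoE config lookup key (spaces -> underscores, matching the warning above).
import torch

device_name = torch.cuda.get_device_name(0).replace(" ", "_")
print(device_name)
# On a GH200 this prints something like "NVIDIA_GH200_144G_HBM3e", for which
# no E=128,N=768,... config file ships with vLLM, hence the warning.
```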
If you want to avoid the warning and get optimal performance, you could copy or symlink the H200 config file to the GH200 device name, or adjust the config-loading logic to fall back to the H200 config when no GH200-specific file is found, as suggested in the vLLM repo discussions; a sketch of the symlink approach follows below. Would you like more detail on how to do this or on the config file structure itself?
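A minimal sketch of the symlink approach, using the configs directory from the warning above. The exact H200 filename (E=128,N=768,device_name=NVIDIA_H200.json) is an assumption; check which H200 config actually exists in your install before linking.

```python
# Sketch: make the GH200 device name resolve to the existing H200 MoE config.
# Paths and the H200 filename are assumptions based on the warning message.
from pathlib import Path

configs = Path(
    "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/configs"
)
h200_cfg = configs / "E=128,N=768,device_name=NVIDIA_H200.json"  # assumed to exist
gh200_cfg = configs / "E=128,N=768,device_name=NVIDIA_GH200_144G_HBM3e.json"  # name from the warning

if h200_cfg.exists() and not gh200_cfg.exists():
    # A plain copy works too if symlinks are awkward in your environment.
    gh200_cfg.symlink_to(h200_cfg)
    print(f"Linked {gh200_cfg.name} -> {h200_cfg.name}")
else:
    print("H200 config missing or GH200 config already present; nothing to do.")
```

Note that this lives inside the installed package, so it needs to be reapplied after upgrading vLLM; the longer-term fix is the fallback logic discussed in the vLLM repo.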