Is --enable-expert-parallel supported for gpt-oss models on B200/GB200?

When running gpt-oss-120b with --enable-expert-parallel on B200/GB200, I encountered the following error: RuntimeError: Boolean value of Tensor with more than one value is ambiguous. Could you clarify whether expert parallelism is supported for gpt-oss models on B200/GB200?

When running without --enable-expert-parallel, everything works fine.

Expert parallelism (--enable-expert-parallel) for gpt-oss-120b on B200/GB200 is not fully supported in the current mainline vLLM release. While there is ongoing development to add support (see this PR), users report that enabling expert parallelism leads to errors such as “Boolean value of Tensor with more than one value is ambiguous” and other initialization failures. Running without --enable-expert-parallel works as expected on B200/GB200, and this is the recommended configuration for now.
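For context on the error message itself, here is a minimal PyTorch sketch of how that RuntimeError is typically produced. It does not identify the vLLM code path that fails with --enable-expert-parallel; the tensor `mask` and the helper `ambiguous_truth_value_message` are hypothetical names used only for illustration.

```python
import torch


def ambiguous_truth_value_message() -> str:
    """Use a multi-element tensor as a condition and return the error text."""
    mask = torch.tensor([True, False])  # hypothetical 2-element tensor
    try:
        if mask:  # implicit bool() on a tensor with more than one element
            pass
    except RuntimeError as exc:
        return str(exc)
    return ""


message = ambiguous_truth_value_message()
print(message)

# An explicit reduction removes the ambiguity by stating the intent.
mask = torch.tensor([True, False])
any_set = bool(mask.any())  # True if at least one element is True
all_set = bool(mask.all())  # True only if every element is True
```

An error of this kind usually means some code expected a scalar or single-element tensor and received a larger one. That is consistent with a configuration path that has not been fully exercised, though the actual cause here would need to be confirmed from the full traceback.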

For the latest status and experimental support, you may monitor or test the referenced PR and related discussions, but as of now, stable expert parallelism for gpt-oss on B200/GB200 is not available in official vLLM releases. This limitation is not explicitly documented in the main vLLM recipes, but is confirmed by user reports and ongoing development activity in the repository.
Would you like more detail on the technical reasons or workarounds?
