Is --enable-expert-parallel supported for gpt-oss models on b200/gb200?

mispa-ms · August 28, 2025, 10:44pm

When running with --enable-expert-parallel for gpt-oss-120b on b200/gb200, I encountered the following error: "RuntimeError: Boolean value of Tensor with more than one value is ambiguous". Could you clarify whether expert parallelism is supported for gpt-oss models on b200/gb200?

When running without --enable-expert-parallel, everything works fine.

RunLLM · August 28, 2025, 10:44pm

Expert parallelism (--enable-expert-parallel) for gpt-oss-120b on B200/GB200 is not fully supported in the current mainline vLLM release. While there is ongoing development to add support (see the PR listed under Sources), users report that enabling expert parallelism leads to errors such as “Boolean value of Tensor with more than one value is ambiguous” and other initialization failures. Running without --enable-expert-parallel works as expected on B200/GB200, and this is the recommended configuration for now.
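
To make the two configurations concrete, here is a minimal sketch that assembles the launch command with and without the flag. The model ID `openai/gpt-oss-120b`, the tensor-parallel size of 8, and the helper name `build_serve_command` are assumptions for illustration and do not come from the original report; adjust them to your setup.

```python
import shlex


def build_serve_command(model, tensor_parallel_size, expert_parallel=False):
    """Return the argv list for a `vllm serve` launch (hypothetical helper)."""
    cmd = [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(tensor_parallel_size),
    ]
    if expert_parallel:
        # The flag that reportedly triggers the error on B200/GB200.
        cmd.append("--enable-expert-parallel")
    return cmd


MODEL = "openai/gpt-oss-120b"  # assumed model ID, not stated in the thread
TP_SIZE = 8                    # assumed GPU count, not stated in the thread

# Configuration reported to work: no expert parallelism.
recommended = build_serve_command(MODEL, TP_SIZE)

# Configuration reported to fail on B200/GB200.
failing = build_serve_command(MODEL, TP_SIZE, expert_parallel=True)

print(shlex.join(recommended))
print(shlex.join(failing))
```

The lists can be passed to `subprocess.run` directly. The only difference between the two commands is the trailing `--enable-expert-parallel`, which makes it easy to toggle the flag when retesting against a newer build.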

For the latest status and experimental support, you may monitor or test the referenced PR and related discussions, but as of now, stable expert parallelism for gpt-oss on B200/GB200 is not available in official vLLM releases. This limitation is not explicitly documented in the main vLLM recipes, but is confirmed by user reports and ongoing development activity in the repository.
Would you like more detail on the technical reasons or workarounds?

Sources:

DP/EP Support for gpt-oss with deepep-ht comm kernel on SM100
