[Bug] Segfault in PythonSymNodeImpl and Deadlock on RTX 5090 (Blackwell) with vLLM 0.11.2

Environment:

  • GPU: NVIDIA GeForce RTX 5090 (Blackwell / sm_120)

  • VRAM: 32 GB

  • Driver Version: 580.95.05

  • CUDA Version: 13.0 (Runtime 12.9)

  • Python Version: 3.12

  • vLLM Version: 0.11.2

  • Model: Qwen3-8B (merged with LoRA) quantized in 8-bit via bitsandbytes.

I am encountering critical stability issues when running vLLM 0.11.2 on the new RTX 5090 architecture. Under load (a stress test with 30 concurrent users), the server either blocks until I force a reboot (kernel panic) or deadlocks (GPU-Util drops to 0% while VRAM remains full).
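For anyone trying to reproduce the load pattern, here is a minimal sketch of a 30-way concurrent client using only the Python standard library. It is not the stress tool used for this report. The URL assumes port 8001 from the docker commands in this post, and the model name `/model` assumes the server exposes the `--model` value as its served name. `send_completion` and `run_load` are hypothetical helper names.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumptions: port 8001 and the "/model" name mirror the docker command in this post.
BASE_URL = "http://localhost:8001/v1/completions"
MODEL_NAME = "/model"


def send_completion(prompt: str, timeout: float = 60.0) -> int:
    """POST one completion request and return the HTTP status code."""
    payload = json.dumps(
        {"model": MODEL_NAME, "prompt": prompt, "max_tokens": 64}
    ).encode("utf-8")
    request = urllib.request.Request(
        BASE_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


def run_load(send, prompts, concurrency=30):
    """Call `send` on every prompt from a thread pool; never raise, always report."""

    def safe_send(prompt):
        try:
            return ("ok", send(prompt))
        except Exception as exc:  # timeouts and connection resets end up here
            return ("error", repr(exc))

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # pool.map preserves the order of `prompts` in the returned list.
        return list(pool.map(safe_send, prompts))
```

Calling `run_load(send_completion, [f"Request {i}: say hello." for i in range(300)])` against a running server returns one `("ok", status)` or `("error", message)` tuple per prompt, so a hang shows up as a run of timeout errors rather than a stuck client.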

Error 1: Segmentation Fault during initialization
When launching vLLM with default settings (V1 Engine enabled), I get a segfault during the model warmup/CUDA graph capture phase:

!!!!!!! Segfault encountered !!!!!!!
File "<unknown>", line 0, in c10::intrusive_ptr<torch::impl::PythonSymNodeImpl, c10::detail::intrusive_target_default_null_type<torch::impl::PythonSymNodeImpl> >
c10::intrusive_ptr<torch::impl::PythonSymNodeImpl, c10::detail::intrusive_target_default_null_type<torch::impl::PythonSymNodeImpl> >::make<pybind11::object&>(pybind11::object&)
RuntimeError: Engine core initialization failed. Failed core proc(s): {'EngineCore_DP0': -11}
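As a side note on reading the `{'EngineCore_DP0': -11}` part of that message: Python's multiprocessing reports a negative exit code when a child process is killed by a signal, and the absolute value is the signal number. A small sketch of that decoding follows (`describe_exit_code` is a hypothetical helper, not part of vLLM):

```python
import signal


def describe_exit_code(exit_code: int) -> str:
    """Translate a multiprocessing-style exit code into a readable label."""
    if exit_code < 0:
        # A negative exit code means the child was terminated by signal -exit_code.
        try:
            return signal.Signals(-exit_code).name
        except ValueError:
            return f"unknown signal {-exit_code}"
    return f"exited with status {exit_code}"


print(describe_exit_code(-11))  # SIGSEGV
```

So `-11` is consistent with the segfault banner above it: the engine core process died from SIGSEGV rather than raising a Python exception. Setting the standard `PYTHONFAULTHANDLER=1` environment variable in the container may additionally print the Python-level stack at the moment of the crash, though whether that adds anything beyond the existing banner here is untested.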

Error 2: SystemError in Shared Memory Broadcast
If the engine manages to start, it deadlocks after processing ~50 requests with the following error:

(EngineCore_DP0 pid=186) ERROR [core.py:844] EngineCore encountered a fatal error.
SystemError: attempting to create PyCFunction with class but no METH_METHOD flag
File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/device_communicators/shm_broadcast.py", line 229, in get_metadata
with self.shared_memory.buf[start:end] as buf:
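The deadlock symptom described earlier (GPU-Util at 0% while VRAM stays full) can be watched for from outside the container. The following is a rough sketch that polls `nvidia-smi` and flags that pattern. The 0.5 memory threshold and the helper names are arbitrary assumptions, and an idle but healthy server shows the same readings, so the check only means something while requests are in flight.

```python
import subprocess

# nvidia-smi query that prints one CSV line per GPU, e.g. "0, 30512, 32607".
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_line(line: str) -> tuple[int, int, int]:
    """Parse 'util, used_mib, total_mib' into three integers."""
    util, used, total = (int(field.strip()) for field in line.split(","))
    return util, used, total


def looks_stalled(samples, min_mem_fraction=0.5):
    """True if every sample shows 0% utilization while memory stays allocated."""
    return bool(samples) and all(
        util == 0 and used / total >= min_mem_fraction
        for util, used, total in samples
    )


def sample_gpu() -> tuple[int, int, int]:
    """Take one reading from the first GPU reported by nvidia-smi."""
    output = subprocess.run(
        QUERY, capture_output=True, text=True, check=True
    ).stdout
    return parse_gpu_line(output.splitlines()[0])
```

Collecting a handful of `sample_gpu()` readings a few seconds apart during the stress test and passing them to `looks_stalled` gives a simple signal for when to capture logs before restarting the container.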

  • Is there a known incompatibility between sm_120 and the current PythonSymNodeImpl logic in PyTorch/vLLM?

  • Does bitsandbytes 8-bit quantization require specific kernels for Blackwell that might be missing in the current vLLM build?

  • Why is shm_broadcast.py triggering a SystemError specifically on Python 3.12 with this hardware?

    Here is my docker command for vLLM (model merged with the LoRA adapter, with int8 quantization):
    sudo docker run -d \
    --name vllm \
    --restart always \
    --network network \
    --runtime nvidia \
    --gpus all \
    --memory=32g \
    --shm-size=16g \
    --ipc=host \
    -e VLLM_USE_V1=0 \
    -e VLLM_WORKER_MULTIPROC_METHOD=spawn \
    -e TOKENIZERS_PARALLELISM=false \
    -v /home/qwen3-8b-v4-int8-final_1:/model \
    -p 8001:8001 \
    vllm/vllm-openai:latest \
    --model /model \
    --host 0.0.0.0 \
    --port 8001 \
    --load-format bitsandbytes \
    --quantization bitsandbytes \
    --dtype bfloat16 \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.80 \
    --max-num-seqs 8 \
    --enforce-eager \
    --disable-custom-all-reduce \
    --distributed-executor-backend mp \
    --max-num-batched-tokens 4096 \
    --disable-frontend-multiprocessing
    And here is the docker command for the model merged with the LoRA adapter, without quantization:

  sudo docker run -d \
  --name vllm \
  --restart always \
  --network network \
  --runtime nvidia \
  --gpus all \
  --ipc=host \
  -v /home/qwen3-8b-v4-merged-final_1:/model \
  -p 8001:8001 \
  vllm/vllm-openai:latest \
  --model /model \
  --host 0.0.0.0 \
  --port 8001 \
  --dtype bfloat16 \
  --max-model-len 8192 \
  --gpu-memory-utilization 0.90 \
  --max-num-seqs 8 \
  --enforce-eager

Here are the most recent logs (from the run with the merged model without quantization):


Model loading took 15.2683 GiB memory and 3.691282 seconds
!!!!!!! Segfault encountered !!!!!!!
File "<unknown>", line 0, in c10::intrusive_ptr<torch::impl::PythonSymNodeImpl, c10::detail::intrusive_target_default_null_type<torch::impl::PythonSymNodeImpl> > c10::intrusive_ptr<torch::impl::PythonSymNodeImpl, c10::detail::intrusive_target_default_null_type<torch::impl::PythonSymNodeImpl> >::make<pybind11::object&>(pybind11::object&)

(APIServer pid=1) Traceback (most recent call last):
(APIServer pid=1) File "/usr/local/bin/vllm", line 10, in <module>
(APIServer pid=1) sys.exit(main())
(APIServer pid=1) ^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=1) args.dispatch_function(args)
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/cli/serve.py", line 60, in cmd
(APIServer pid=1) uvloop.run(run_server(args))
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 96, in run
(APIServer pid=1) return __asyncio.run(
(APIServer pid=1) ^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/asyncio/runners.py", line 195, in run
(APIServer pid=1) return runner.run(main)
(APIServer pid=1) ^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
(APIServer pid=1) return self._loop.run_until_complete(task)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/uvloop/__init__.py", line 48, in wrapper
(APIServer pid=1) return await main
(APIServer pid=1) ^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 2024, in run_server
(APIServer pid=1) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 2043, in run_server_worker
(APIServer pid=1) async with build_async_engine_client(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1) return await anext(self.gen)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 195, in build_async_engine_client
(APIServer pid=1) async with build_async_engine_client_from_engine_args(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/contextlib.py", line 210, in __aenter__
(APIServer pid=1) return await anext(self.gen)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/entrypoints/openai/api_server.py", line 236, in build_async_engine_client_from_engine_args
(APIServer pid=1) async_llm = AsyncLLM.from_vllm_config(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/utils/func_utils.py", line 116, in inner
(APIServer pid=1) return fn(*args, **kwargs)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 203, in from_vllm_config
(APIServer pid=1) return cls(
(APIServer pid=1) ^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/async_llm.py", line 133, in __init__
(APIServer pid=1) self.engine_core = EngineCoreClient.make_async_mp_client(
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 121, in make_async_mp_client
(APIServer pid=1) return AsyncMPClient(*client_args)
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 808, in __init__
(APIServer pid=1) super().__init__(
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/core_client.py", line 469, in __init__
(APIServer pid=1) with launch_core_engines(vllm_config, executor_class, log_stats) as (
(APIServer pid=1) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(APIServer pid=1) File "/usr/lib/python3.12/contextlib.py", line 144, in __exit__
(APIServer pid=1) next(self.gen)
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 907, in launch_core_engines
(APIServer pid=1) wait_for_engine_startup(
(APIServer pid=1) File "/usr/local/lib/python3.12/dist-packages/vllm/v1/engine/utils.py", line 964, in wait_for_engine_startup
(APIServer pid=1) raise RuntimeError(
(APIServer pid=1) RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
/usr/lib/python3.12/multiprocessing/resource_tracker.py:279: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '

There are known critical compatibility and stability issues when running vLLM (including v0.11.2) on NVIDIA RTX 5090 (Blackwell, sm_120) with CUDA 13.x and Python 3.12. These include segmentation faults during CUDA graph capture/model warmup and shared memory (shm_broadcast) errors, especially under load or with bitsandbytes quantization. The root causes are: (1) vLLM and its dependencies (PyTorch, Triton, bitsandbytes) do not yet officially support Blackwell (sm_120) GPUs, and (2) Python 3.12 introduces additional incompatibilities in multiprocessing and shared memory handling on this hardware. There is no evidence in the data sources that bitsandbytes 8-bit quantization is supported or stable on Blackwell, nor that the PythonSymNodeImpl segfault is resolved for this architecture.

Workarounds are limited. Some users report partial success by building vLLM and all dependencies (including PyTorch and Triton) from source with explicit CUDA arch flags for sm_120, but even then, segmentation faults and deadlocks persist, especially with quantized models or under high concurrency. There is no official fix or recommended configuration for Blackwell GPUs as of vLLM 0.11.2. You may need to wait for upstream support in vLLM, PyTorch, and bitsandbytes for sm_120, or use supported hardware/software combinations for production workloads. For more, see the detailed discussions and error logs in vLLM issue #30493, vLLM issue #23231, and vLLM issue #24383.
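One experimental check related to the "explicit CUDA arch flags" point is to see whether the PyTorch build inside the container lists the GPU's exact architecture. The sketch below uses `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list`. The helper name `exact_arch_listed` is hypothetical, and the check is deliberately simplified: it only looks for an exact `sm_XY` entry and ignores PTX forward compatibility through `compute_XY` entries, so a negative result is a hint rather than proof.

```python
def exact_arch_listed(capability: tuple[int, int], arch_list: list[str]) -> bool:
    """True if the build's arch list has an exact sm_XY entry for this device."""
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        if not torch.cuda.is_available():
            print("No CUDA device is visible to PyTorch")
        else:
            capability = torch.cuda.get_device_capability(0)
            arch_list = torch.cuda.get_arch_list()
            print(f"Device capability: {capability}")
            print(f"Architectures compiled into this build: {arch_list}")
            print(f"Exact match present: {exact_arch_listed(capability, arch_list)}")
```

Running this inside the `vllm/vllm-openai` container (for example via `docker exec`) would show whether `sm_120` is among the compiled architectures for the RTX 5090. It says nothing about the vLLM, Triton, or bitsandbytes kernels, which would need their own checks.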

Would you like a step-by-step breakdown of the technical reasons and possible (experimental) troubleshooting steps?
