Persistent segfaults/SIGSEGV

We have been seeing intermittent SIGSEGV crashes with both a custom model implementation and Qwen3 on vllm-0.10.0rc2. Has anyone figured out the root cause, or is this addressed in later versions?

There are many related GitHub issues, none of which seem to be resolved:

Trace
(RewardLoopWorker pid=3555498) WARNING:2026-01-19 22:51:41,021:We did not manage to extract a prediction in the correct format. Gold: ['\\boxed{B}'], Pred: ['']
(RewardLoopWorker pid=3555498) WARNING:2026-01-19 22:51:41,022:We did not manage to extract a prediction in the correct format. Gold: ['\\boxed{B. 8}'], Pred: ['']
(RewardLoopWorker pid=3555498) WARNING:2026-01-19 22:51:41,024:We did not manage to extract a prediction in the correct format. Gold: ['\\boxed{8}'], Pred: ['']
(WorkerDict pid=3546913) *** SIGSEGV received at time=1768863357 on cpu 79 ***
(WorkerDict pid=3546913) PC: @                0x4  (unknown)  (unknown)
(WorkerDict pid=3546913)     @     0x7ffff7cd2520       4000  (unknown)
(WorkerDict pid=3546913)     @     0x7fce4b9d3155        352  (unknown)
(WorkerDict pid=3546913)     @     0x7fce4b9dde19       1344  (unknown)
(WorkerDict pid=3546913)     @     0x7fce4bac1c6c        224  (unknown)
(WorkerDict pid=3546913)     @     0x7fffdb82094b         64  (unknown)
(WorkerDict pid=3546913)     @     0x7fffdb87c01b        224  cudaGraphLaunch
(WorkerDict pid=3546913)     @     0x7fceef0ca81a  (unknown)  at::cuda::CUDAGraph::replay()
(WorkerDict pid=3546913)     @     0x7fbb7a668130  (unknown)  (unknown)
(WorkerDict pid=3546913)     @                0x4  (unknown)  (unknown)
(WorkerDict pid=3546913) [2026-01-19 22:55:57,644 E 3546913 3548010] logging.cc:474: *** SIGSEGV received at time=1768863357 on cpu 79 ***
(WorkerDict pid=3546913) [2026-01-19 22:55:57,651 E 3546913 3548010] logging.cc:474: PC: @                0x4  (unknown)  (unknown)
(WorkerDict pid=3546913) [2026-01-19 22:55:57,655 E 3546913 3548010] logging.cc:474:     @     0x7ffff7cd2520       4000  (unknown)
(WorkerDict pid=3546913) [2026-01-19 22:55:57,655 E 3546913 3548010] logging.cc:474:     @     0x7fce4b9d3155        352  (unknown)
(WorkerDict pid=3546913) [2026-01-19 22:55:57,655 E 3546913 3548010] logging.cc:474:     @     0x7fce4b9dde19       1344  (unknown)
(WorkerDict pid=3546913) [2026-01-19 22:55:57,655 E 3546913 3548010] logging.cc:474:     @     0x7fce4bac1c6c        224  (unknown)
(WorkerDict pid=3546913) [2026-01-19 22:55:57,655 E 3546913 3548010] logging.cc:474:     @     0x7fffdb82094b         64  (unknown)
(WorkerDict pid=3546913) [2026-01-19 22:55:57,655 E 3546913 3548010] logging.cc:474:     @     0x7fffdb87c01b        224  cudaGraphLaunch
(WorkerDict pid=3546913) [2026-01-19 22:55:57,655 E 3546913 3548010] logging.cc:474:     @     0x7fceef0ca81a  (unknown)  at::cuda::CUDAGraph::replay()
(WorkerDict pid=3546913) [2026-01-19 22:55:57,659 E 3546913 3548010] logging.cc:474:     @     0x7fbb7a668130  (unknown)  (unknown)
(WorkerDict pid=3546913) [2026-01-19 22:55:57,666 E 3546913 3548010] logging.cc:474:     @                0x4  (unknown)  (unknown)
(WorkerDict pid=3546913) Fatal Python error: Segmentation fault
(WorkerDict pid=3546913) 
(WorkerDict pid=3546913) Stack (most recent call first):
(WorkerDict pid=3546913)   File "/workspace/verl_trainer/.venv/lib/python3.12/site-packages/torch/cuda/graphs.py", line 117 in replay
(WorkerDict pid=3546913)   File "/workspace/Zvllm/vllm/compilation/cuda_graph.py", line 198 in __call__
(WorkerDict pid=3546913)   File "/workspace/Zvllm/vllm/v1/worker/gpu_model_runner.py", line 2298 in execute_model
(WorkerDict pid=3546913)   File "/workspace/verl_trainer/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120 in decorate_context
(WorkerDict pid=3546913)   File "/workspace/Zvllm/vllm/v1/worker/gpu_worker.py", line 447 in execute_model
(WorkerDict pid=3546913)   File "/workspace/verl_trainer/.venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120 in decorate_context
(WorkerDict pid=3546913)   File "/workspace/Zvllm/vllm/utils/__init__.py", line 3122 in run_method
(WorkerDict pid=3546913)   File "/workspace/Zvllm/vllm/worker/worker_base.py", line 267 in execute_method
(WorkerDict pid=3546913)   File "/tmp/ray/session_2026-01-19_10-36-59_738292_3529560/runtime_resources/working_dir_files/_ray_pkg_385e62c0114024ab/verl/workers/rollout/vllm_rollout/vllm_rollout.py", line 224 in _execute_method
(WorkerDict pid=3546913)   File "/tmp/ray/session_2026-01-19_10-36-59_738292_3529560/runtime_resources/working_dir_files/_ray_pkg_385e62c0114024ab/verl/workers/rollout/vllm_rollout/vllm_rollout.py", line 177 in _loop_forever
(WorkerDict pid=3546913)   File "/workspace/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py", line 1012 in run
(WorkerDict pid=3546913)   File "/workspace/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py", line 1075 in _bootstrap_inner
(WorkerDict pid=3546913)   File "/workspace/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/threading.py", line 1032 in _bootstrap
(WorkerDict pid=3546913) 
(WorkerDict pid=3546913) Extension modules: msgpack._cmsgpack, psutil._psutil_linux, google._upb._message, charset_normalizer.md, requests.packages.charset_normalizer.md, requests.packages.chardet.md, yaml._yaml, uvloop.loop, ray._raylet, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, torch._C, torch._C._dynamo.autograd_compiler, torch._C._dynamo.eval_frame, torch._C._dynamo.guards, torch._C._dynamo.utils, torch._C._fft, torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, regex._regex, markupsafe._speedups, PIL._imaging, scipy._lib._ccallback_c, scipy.linalg._fblas, scipy.linalg._flapack, _cyutility, scipy._cyutility, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_schur_sqrtm, scipy.linalg._matfuncs_expm, scipy.linalg._linalg_pythran, scipy.linalg.cython_blas, scipy.linalg._decomp_update, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.optimize._group_columns, scipy._lib.messagestream, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._slsqplib, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy._lib._uarray._uarray, scipy.special._ufuncs_cxx, scipy.special._ellip_harm_2, scipy.special._special_ufuncs, scipy.special._gufuncs, scipy.special._ufuncs, scipy.special._specfun, 
scipy.special._comb, scipy.linalg._decomp_interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.spatial._ckdtree, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._hausdorff, scipy.spatial._distance_wrap, scipy.spatial.transform._rotation, scipy.spatial.transform._rigid_transform, scipy.optimize._direct, PIL._imagingft, av._core, av.logging, av.bytesource, av.buffer, av.audio.format, av.error, av.dictionary, av.container.pyio, av.option, av.descriptor, av.format, av.utils, av.stream, av.container.streams, av.sidedata.motionvectors, av.sidedata.sidedata, av.opaque, av.packet, av.container.input, av.container.output, av.container.core, av.codec.context, av.video.format, av.video.reformatter, av.plane, av.video.plane, av.video.frame, av.video.stream, av.codec.hwaccel, av.codec.codec, av.frame, av.audio.layout, av.audio.plane, av.audio.frame, av.audio.stream, av.filter.link, av.filter.context, av.filter.graph, av.filter.filter, av.filter.loudnorm, av.audio.resampler, av.audio.codeccontext, av.audio.fifo, av.bitstream, av.video.codeccontext, _cffi_backend, _cbor2, setproctitle._setproctitle, zmq.backend.cython._zmq, multidict._multidict, yarl._quoting_c, propcache._helpers_c, aiohttp._http_writer, aiohttp._http_parser, aiohttp._websocket.mask, aiohttp._websocket.reader_c, frozenlist._frozenlist, pyarrow.lib, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, 
pyarrow._compute, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, pyarrow._parquet, pyarrow._fs, pyarrow._azurefs, pyarrow._hdfs, pyarrow._gcsfs, pyarrow._s3fs, xxhash._xxhash, pyarrow._acero, pyarrow._csv, pyarrow._json, pyarrow._substrait, pyarrow._dataset, pyarrow._dataset_orc, pyarrow._parquet_encryption, pyarrow._dataset_parquet_encryption, pyarrow._dataset_parquet, nvtx._lib.lib, nvtx._lib.profiler, sentencepiece._sentencepiece, msgspec._core, pybase64._pybase64, numba.core.typeconv._typeconv, numba._helperlib, numba._dynfunc, numba._dispatcher, numba.core.typing.builtins.itertools, numba.cpython.builtins.math, numba.core.runtime._nrt_python, numba.np.ufunc._internal, numba.experimental.jitclass._box, vllm.cumem_allocator, cuda_utils, __triton_launcher (total: 221)
(raylet) A worker died or was killed while executing a task by an unexpected system error. To troubleshoot the problem, check the logs for the dead worker. Lease ID: 06000000610da2e5ba04608c250235880f429b08ae2b17471bd777123ce43c68 Worker ID: 72612bb7364a1087afc287488e22d902eb4bcaaaadb5630f2296d358 Node ID: 75d0cfca4322a528d3ea520a8419dd8f8b02f7f42571bda6252c0bc9 Worker IP address: 10.28.5.11 Worker port: 10199 Worker PID: 3546913 Worker exit type: SYSTEM_ERROR Worker exit detail: Worker unexpectedly exits with a connection error code 2. End of file. Some common causes include: (1) the process was killed by the OOM killer due to high memory usage, (2) ray stop --force was called, or (3) the worker crashed unexpectedly due to SIGSEGV or another unexpected error.

I’ve seen the same trace with both the triton and flashattn attention backends. It is intermittent, sometimes only showing up 5 hours into an eval.
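Incidentally, the "Stack (most recent call first)" section in the trace above comes from Python's faulthandler module, which Ray enables in its workers. If you want the same crash-time Python stacks from your own processes (or want them redirected to a per-process file), you can enable it explicitly. A minimal sketch; the temp-file destination is just a stand-in for a real per-worker crash log:

```python
import faulthandler
import tempfile

# Illustrative destination: a temp file standing in for a per-worker crash log.
log = tempfile.TemporaryFile(mode="w+")

# On SIGSEGV (and SIGFPE/SIGABRT/SIGBUS), dump every thread's Python stack here.
faulthandler.enable(file=log, all_threads=True)

# For demonstration, dump the current stacks without actually crashing:
faulthandler.dump_traceback(file=log, all_threads=True)

log.seek(0)
print(log.read())  # includes a "(most recent call first)" stack per thread
```

Note that faulthandler only recovers the Python-level stack; for the native frames (cudaGraphLaunch, CUDAGraph::replay) you still need a core dump or gdb.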

Intermittent SIGSEGV (segmentation fault) errors with vLLM, especially in distributed/Ray setups and with models like Qwen3, are a known issue and have been reported multiple times. The reported root causes vary: NCCL communication errors, CUDA graph replay, and network or memory misconfiguration, with no single universally confirmed cause as of the latest reports. Upgrading vLLM may help, but even recent versions and different attention backends (triton, flashattn, xformers) have shown similar instability in some environments. Known workarounds include passing --enforce-eager to disable CUDA graphs (at a performance cost), tuning memory parameters, and setting the correct NCCL/GLOO network interfaces, but none of these is guaranteed to fix the problem for all users or models. See the following for details and user experiences: ray cluster Segmentation fault, SIGSEGV received at time=1721904360.

If you want to debug further, enable more logging (VLLM_LOGGING_LEVEL=DEBUG, NCCL_DEBUG=TRACE) and check for OOM or network issues. Since your trace shows the error occurring during CUDA graph replay, try --enforce-eager to isolate the problem. For distributed setups, ensure NCCL_SOCKET_IFNAME and GLOO_SOCKET_IFNAME point to the correct network interface. There is no evidence in the data sources that a specific vLLM release fully resolves these SIGSEGVs for all users and models. Would you like a step-by-step troubleshooting checklist or more detail on a specific workaround?
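For reference, a typical debug environment setup for the next run might look like this (eth0 is a placeholder; substitute your actual inter-node interface, e.g. from `ip addr`):

```shell
export VLLM_LOGGING_LEVEL=DEBUG    # verbose vLLM engine logging
export NCCL_DEBUG=TRACE            # full NCCL communication tracing
export NCCL_SOCKET_IFNAME=eth0     # placeholder: your inter-node interface
export GLOO_SOCKET_IFNAME=eth0     # placeholder: same interface, for Gloo/Ray
```

NCCL_DEBUG=TRACE is very verbose; drop back to INFO once you have captured one crash.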

