Your current environment
For the environment info, please refer to [Bug]: failed to run distribute Inference with vllm 0.8.2 · Issue #17160 · vllm-project/vllm · GitHub
I am trying to run distributed inference on Ray with vLLM 0.8.2 and it fails with the errors below. Here are the steps:
- I followed this link to run the sanity check, and it passed (a sketch of what test.py checks is included after the commands below):
Troubleshooting — vLLM
Ray head node:
======== Autoscaler status: 2025-04-25 13:23:43.124555 ========
Node status
---------------------------------------------------------------
Active:
1 node_1331c5c771898a75caaaeed528c77d9a10132b3f6f57a46bd444980d
1 node_f9fee740f9505a910fa99a16799e10a4fad360251146be8297486b49
Pending:
(no pending nodes)
Recent failures:
(no failures)
Resources
---------------------------------------------------------------
Usage:
0.0/32.0 CPU
0.0/4.0 GPU
0B/30.14GiB memory
0B/12.92GiB object_store_memory
Demands:
(no resource demands)
NCCL_SOCKET_IFNAME=ens4 GLOO_SOCKET_IFNAME=ens4 NCCL_DEBUG=TRACE torchrun --nnodes 2 --nproc-per-node=2 --node-rank 0 --master_addr 172.21.151.99 test.py
Ray worker node:
NCCL_SOCKET_IFNAME=ens4 GLOO_SOCKET_IFNAME=ens4 NCCL_DEBUG=TRACE torchrun --nnodes 2 --nproc-per-node=2 --node-rank 1 --master_addr 172.21.151.99 test.py
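For reference, test.py here is the sanity-check script from the troubleshooting guide linked above. It is not reproduced verbatim; the following is only a minimal sketch of the core NCCL check it performs (an all-reduce across the ranks launched by torchrun), to show what passed on this cluster:

# Minimal sketch of the NCCL part of the sanity check (not the exact
# script from the vLLM troubleshooting guide).
import torch
import torch.distributed as dist

# torchrun supplies RANK, WORLD_SIZE, LOCAL_RANK and the master address/port.
dist.init_process_group(backend="nccl")
local_rank = dist.get_rank() % torch.cuda.device_count()
torch.cuda.set_device(local_rank)

# Each rank contributes a tensor of ones; after the SUM all-reduce the mean
# should equal the world size (4 in this 2-node x 2-GPU setup).
data = torch.ones(128, device="cuda")
dist.all_reduce(data, op=dist.ReduceOp.SUM)
torch.cuda.synchronize()

world_size = dist.get_world_size()
assert data.mean().item() == world_size, "all-reduce result mismatch"
print(f"rank {dist.get_rank()}: NCCL sanity check passed")
dist.destroy_process_group()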
- I ran the following command to start distributed inference on the Ray worker node (an example client request is sketched after the script):
My script:
export NCCL_SOCKET_IFNAME=ens4
export GLOO_SOCKET_IFNAME=ens4
export TP_SOCKET_IFNAME=ens4
#export NCCL_DEBUG=INFO
export NCCL_IB_DISABLE=1
export NCCL_P2P_DISABLE=1
#export NCCL_P2P_LEVEL=SYS
export VLLM_HOST_IP=172.21.222.20
#export NCCL_DEBUG_SUBSYS=ALL
export NCCL_SHM_DISABLE=1
export NCCL_NET=Socket
export NCCL_DEBUG=TRACE
vllm serve /models/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B \
--served-model-name DeepSeek-R1-Distill-Qwen-14B \
--gpu-memory-utilization 0.95 \
--dtype half \
--tensor-parallel-size 2 \
--enforce-eager \
--trust-remote-code \
--max-model-len 2048 \
--enable-chunked-prefill \
--pipeline-parallel-size 2
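For context, if the server came up cleanly I would expect to query the OpenAI-compatible endpoint like this (illustration only, using the default port 8000 and the served model name from the script above; the server never gets this far because of the NCCL error below):

# Hypothetical smoke test against the vLLM OpenAI-compatible API.
import requests

resp = requests.post(
    "http://172.21.222.20:8000/v1/completions",  # node where vllm serve runs
    json={
        "model": "DeepSeek-R1-Distill-Qwen-14B",  # matches --served-model-name
        "prompt": "Hello, my name is",
        "max_tokens": 16,
    },
    timeout=60,
)
print(resp.status_code, resp.json())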
- I got the following errors:
(ray) root@ds-x86:~/ray# ./ray-vllm-14B.sh
INFO 04-25 05:15:11 [__init__.py:239] Automatically detected platform cuda.
INFO 04-25 05:15:13 [api_server.py:981] vLLM API server version 0.8.2
INFO 04-25 05:15:13 [api_server.py:982] args: Namespace(subparser='serve', model_tag='/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B', config='', host=None, port=8000, uvicorn_log_level='info', disable_uvicorn_access_log=False, allow_credentials=False, allowed_origins=['*'], allowed_methods=['*'], allowed_headers=['*'], api_key=None, lora_modules=None, prompt_adapters=None, chat_template=None, chat_template_content_format='auto', response_role='assistant', ssl_keyfile=None, ssl_certfile=None, ssl_ca_certs=None, enable_ssl_refresh=False, ssl_cert_reqs=0, root_path=None, middleware=[], return_tokens_as_token_ids=False, disable_frontend_multiprocessing=False, enable_request_id_headers=False, enable_auto_tool_choice=False, tool_call_parser=None, tool_parser_plugin='', model='/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B', task='auto', tokenizer=None, hf_config_path=None, skip_tokenizer_init=False, revision=None, code_revision=None, tokenizer_revision=None, tokenizer_mode='auto', trust_remote_code=True, allowed_local_media_path=None, download_dir=None, load_format='auto', config_format=<ConfigFormat.AUTO: 'auto'>, dtype='half', kv_cache_dtype='auto', max_model_len=2048, guided_decoding_backend='xgrammar', logits_processor_pattern=None, model_impl='auto', distributed_executor_backend=None, pipeline_parallel_size=2, tensor_parallel_size=2, enable_expert_parallel=False, max_parallel_loading_workers=None, ray_workers_use_nsight=False, block_size=None, enable_prefix_caching=None, disable_sliding_window=False, use_v2_block_manager=True, num_lookahead_slots=0, seed=None, swap_space=4, cpu_offload_gb=0, gpu_memory_utilization=0.95, num_gpu_blocks_override=None, max_num_batched_tokens=None, max_num_partial_prefills=1, max_long_partial_prefills=1, long_prefill_token_threshold=0, max_num_seqs=None, max_logprobs=20, disable_log_stats=False, quantization=None, rope_scaling=None, rope_theta=None, hf_overrides=None, enforce_eager=True, max_seq_len_to_capture=8192, disable_custom_all_reduce=False, tokenizer_pool_size=0, tokenizer_pool_type='ray', tokenizer_pool_extra_config=None, limit_mm_per_prompt=None, mm_processor_kwargs=None, disable_mm_preprocessor_cache=False, enable_lora=False, enable_lora_bias=False, max_loras=1, max_lora_rank=16, lora_extra_vocab_size=256, lora_dtype='auto', long_lora_scaling_factors=None, max_cpu_loras=None, fully_sharded_loras=False, enable_prompt_adapter=False, max_prompt_adapters=1, max_prompt_adapter_token=0, device='auto', num_scheduler_steps=1, use_tqdm_on_load=True, multi_step_stream_outputs=True, scheduler_delay_factor=0.0, enable_chunked_prefill=True, speculative_config=None, speculative_model=None, speculative_model_quantization=None, num_speculative_tokens=None, speculative_disable_mqa_scorer=False, speculative_draft_tensor_parallel_size=None, speculative_max_model_len=None, speculative_disable_by_batch_size=None, ngram_prompt_lookup_max=None, ngram_prompt_lookup_min=None, spec_decoding_acceptance_method='rejection_sampler', typical_acceptance_sampler_posterior_threshold=None, typical_acceptance_sampler_posterior_alpha=None, disable_logprobs_during_spec_decoding=None, model_loader_extra_config=None, ignore_patterns=[], preemption_mode=None, served_model_name=['DeepSeek-R1-Distill-Qwen-14B'], qlora_adapter_name_or_path=None, show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None, disable_async_output_proc=False, scheduling_policy='fcfs', scheduler_cls='vllm.core.scheduler.Scheduler', override_neuron_config=None, 
override_pooler_config=None, compilation_config=None, kv_transfer_config=None, worker_cls='auto', worker_extension_cls='', generation_config='auto', override_generation_config=None, enable_sleep_mode=False, calculate_kv_scales=False, additional_config=None, enable_reasoning=False, reasoning_parser=None, disable_cascade_attn=False, disable_log_requests=False, max_log_len=None, disable_fastapi_docs=False, enable_prompt_tokens_details=False, enable_server_load_tracking=False, dispatch_function=<function ServeSubcommand.cmd at 0x7dc10ca6fba0>)
WARNING 04-25 05:15:13 [config.py:2614] Casting torch.bfloat16 to torch.float16.
INFO 04-25 05:15:20 [config.py:585] This model supports multiple tasks: {'generate', 'score', 'embed', 'reward', 'classify'}. Defaulting to 'generate'.
WARNING 04-25 05:15:20 [arg_utils.py:1854] Compute Capability < 8.0 is not supported by the V1 Engine. Falling back to V0.
INFO 04-25 05:15:20 [config.py:1519] Defaulting to use ray for distributed inference
INFO 04-25 05:15:20 [config.py:1697] Chunked prefill is enabled with max_num_batched_tokens=2048.
INFO 04-25 05:15:20 [llm_engine.py:241] Initializing a V0 LLM engine (v0.8.2) with config: model='/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B', speculative_config=None, tokenizer='/models/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=2, pipeline_parallel_size=2, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=DeepSeek-R1-Distill-Qwen-14B, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=None, chunked_prefill_enabled=True, use_async_output_proc=False, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={"splitting_ops":[],"compile_sizes":[],"cudagraph_capture_sizes":[],"max_capture_size":0}, use_cached_outputs=False,
2025-04-25 05:15:21,614 INFO worker.py:1660 -- Connecting to existing Ray cluster at address: 172.21.151.99:6379...
2025-04-25 05:15:21,627 INFO worker.py:1843 -- Connected to Ray cluster. View the dashboard at http://172.21.151.99:8265
INFO 04-25 05:15:21 [ray_utils.py:335] No current placement group found. Creating a new placement group.
INFO 04-25 05:15:22 [ray_distributed_executor.py:176] use_ray_spmd_worker: False
(pid=3614692) INFO 04-25 05:15:26 [__init__.py:239] Automatically detected platform cuda.
INFO 04-25 05:15:28 [ray_distributed_executor.py:352] non_carry_over_env_vars from config: set()
INFO 04-25 05:15:28 [ray_distributed_executor.py:354] Copying the following environment variables to workers: ['VLLM_WORKER_MULTIPROC_METHOD', 'VLLM_USE_V1']
INFO 04-25 05:15:28 [ray_distributed_executor.py:357] If certain env vars should NOT be copied to workers, add them to /root/.config/vllm/ray_non_carry_over_env_vars.json file
(RayWorkerWrapper pid=3614806) INFO 04-25 05:15:29 [cuda.py:239] Cannot use FlashAttention-2 backend for Volta and Turing GPUs.
(RayWorkerWrapper pid=3614806) INFO 04-25 05:15:29 [cuda.py:288] Using XFormers backend.
INFO 04-25 05:15:29 [cuda.py:239] Cannot use FlashAttention-2 backend for Volta and Turing GPUs.
INFO 04-25 05:15:29 [cuda.py:288] Using XFormers backend.
INFO 04-25 05:15:30 [utils.py:931] Found nccl from library libnccl.so.2
INFO 04-25 05:15:30 [pynccl.py:69] vLLM is using nccl==2.21.5
ds-x86:3614588:3614588 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ens4
ds-x86:3614588:3614588 [0] NCCL INFO Bootstrap : Using ens4:172.21.222.20<0>
ds-x86:3614588:3614588 [0] NCCL INFO NET/Plugin: No plugin found (libnccl-net.so)
ds-x86:3614588:3614588 [0] NCCL INFO NET/Plugin: Plugin load returned 2 : libnccl-net.so: cannot open shared object file: No such file or directory : when loading libnccl-net.so
ds-x86:3614588:3614588 [0] NCCL INFO NET/Plugin: Using internal network plugin.
ds-x86:3614588:3614588 [0] NCCL INFO cudaDriverVersion 12080
NCCL version 2.21.5+cuda12.4
(RayWorkerWrapper pid=3614806) INFO 04-25 05:15:30 [utils.py:931] Found nccl from library libnccl.so.2
(RayWorkerWrapper pid=3614806) INFO 04-25 05:15:30 [pynccl.py:69] vLLM is using nccl==2.21.5
ds-x86:3614588:3614588 [0] NCCL INFO NCCL_IB_DISABLE set by environment to 1.
ds-x86:3614588:3614588 [0] NCCL INFO NCCL_SOCKET_IFNAME set by environment to ens4
ds-x86:3614588:3614588 [0] NCCL INFO NET/Socket : Using [0]ens4:172.21.222.20<0>
ds-x86:3614588:3614588 [0] NCCL INFO Using non-device net plugin version 0
ds-x86:3614588:3614588 [0] NCCL INFO Using network Socket
ds-x86:3614588:3614588 [0] NCCL INFO ncclCommInitRank comm 0x3f5280a0 rank 0 nranks 2 cudaDev 0 nvmlDev 0 busId a0 commId 0x9cd5c8fef97ef147 - Init START
ds-x86:3614588:3614588 [0] NCCL INFO NCCL_P2P_LEVEL set by environment to LOC
ds-x86:3614588:3614588 [0] NCCL INFO NCCL_SHM_DISABLE set by environment to 1.
ds-x86:3614588:3614588 [0] NCCL INFO comm 0x3f5280a0 rank 0 nRanks 2 nNodes 1 localRanks 2 localRank 0 MNNVL 0
ds-x86:3614588:3614588 [0] NCCL INFO Channel 00/02 : 0 0
ds-x86:3614588:3614588 [0] graph/rings.cc:51 NCCL WARN Error : ring 0 does not contain rank 1
ds-x86:3614588:3614588 [0] NCCL INFO graph/connect.cc:489 -> 3
ds-x86:3614588:3614588 [0] NCCL INFO init.cc:1210 -> 3
ds-x86:3614588:3614588 [0] NCCL INFO init.cc:1548 -> 3
ds-x86:3614588:3614588 [0] NCCL INFO init.cc:1799 -> 3
ds-x86:3614588:3614588 [0] NCCL INFO init.cc:1837 -> 3
ERROR 04-25 05:15:31 [worker_base.py:620] Error executing method 'init_device'. This might cause deadlock in distributed execution.
ERROR 04-25 05:15:31 [worker_base.py:620] Traceback (most recent call last):
ERROR 04-25 05:15:31 [worker_base.py:620] File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/worker/worker_base.py", line 612, in execute_method
ERROR 04-25 05:15:31 [worker_base.py:620] return run_method(self, method, args, kwargs)
ERROR 04-25 05:15:31 [worker_base.py:620] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-25 05:15:31 [worker_base.py:620] File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/utils.py", line 2255, in run_method
ERROR 04-25 05:15:31 [worker_base.py:620] return func(*args, **kwargs)
ERROR 04-25 05:15:31 [worker_base.py:620] ^^^^^^^^^^^^^^^^^^^^^
ERROR 04-25 05:15:31 [worker_base.py:620] File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/worker/worker_base.py", line 604, in init_device
ERROR 04-25 05:15:31 [worker_base.py:620] self.worker.init_device() # type: ignore
ERROR 04-25 05:15:31 [worker_base.py:620] ^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-25 05:15:31 [worker_base.py:620] File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/worker/worker.py", line 166, in init_device
ERROR 04-25 05:15:31 [worker_base.py:620] init_worker_distributed_environment(self.vllm_config, self.rank,
ERROR 04-25 05:15:31 [worker_base.py:620] File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/worker/worker.py", line 507, in init_worker_distributed_environment
ERROR 04-25 05:15:31 [worker_base.py:620] ensure_model_parallel_initialized(parallel_config.tensor_parallel_size,
ERROR 04-25 05:15:31 [worker_base.py:620] File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 992, in ensure_model_parallel_initialized
ERROR 04-25 05:15:31 [worker_base.py:620] initialize_model_parallel(tensor_model_parallel_size,
ERROR 04-25 05:15:31 [worker_base.py:620] File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 925, in initialize_model_parallel
ERROR 04-25 05:15:31 [worker_base.py:620] _TP = init_model_parallel_group(group_ranks,
ERROR 04-25 05:15:31 [worker_base.py:620] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-25 05:15:31 [worker_base.py:620] File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 730, in init_model_parallel_group
ERROR 04-25 05:15:31 [worker_base.py:620] return GroupCoordinator(
ERROR 04-25 05:15:31 [worker_base.py:620] ^^^^^^^^^^^^^^^^^
ERROR 04-25 05:15:31 [worker_base.py:620] File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 207, in __init__
ERROR 04-25 05:15:31 [worker_base.py:620] self.device_communicator = device_comm_cls(
ERROR 04-25 05:15:31 [worker_base.py:620] ^^^^^^^^^^^^^^^^
ERROR 04-25 05:15:31 [worker_base.py:620] File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/distributed/device_communicators/cuda_communicator.py", line 39, in __init__
ERROR 04-25 05:15:31 [worker_base.py:620] self.pynccl_comm = PyNcclCommunicator(
ERROR 04-25 05:15:31 [worker_base.py:620] ^^^^^^^^^^^^^^^^^^^
ERROR 04-25 05:15:31 [worker_base.py:620] File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl.py", line 99, in __init__
ERROR 04-25 05:15:31 [worker_base.py:620] self.comm: ncclComm_t = self.nccl.ncclCommInitRank(
ERROR 04-25 05:15:31 [worker_base.py:620] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 04-25 05:15:31 [worker_base.py:620] File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 277, in ncclCommInitRank
ERROR 04-25 05:15:31 [worker_base.py:620] self.NCCL_CHECK(self._funcs["ncclCommInitRank"](ctypes.byref(comm),
ERROR 04-25 05:15:31 [worker_base.py:620] File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 256, in NCCL_CHECK
ERROR 04-25 05:15:31 [worker_base.py:620] raise RuntimeError(f"NCCL error: {error_str}")
ERROR 04-25 05:15:31 [worker_base.py:620] RuntimeError: NCCL error: internal error - please report this issue to the NCCL developers
[rank0]: Traceback (most recent call last):
[rank0]: File "/root/miniconda3/envs/ray/bin/vllm", line 8, in <module>
[rank0]: sys.exit(main())
[rank0]: ^^^^^^
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/entrypoints/cli/main.py", line 75, in main
[rank0]: args.dispatch_function(args)
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/entrypoints/cli/serve.py", line 33, in cmd
[rank0]: uvloop.run(run_server(args))
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/uvloop/__init__.py", line 109, in run
[rank0]: return __asyncio.run(
[rank0]: ^^^^^^^^^^^^^^
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/asyncio/runners.py", line 195, in run
[rank0]: return runner.run(main)
[rank0]: ^^^^^^^^^^^^^^^^
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/asyncio/runners.py", line 118, in run
[rank0]: return self._loop.run_until_complete(task)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/uvloop/__init__.py", line 61, in wrapper
[rank0]: return await main
[rank0]: ^^^^^^^^^^
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 1016, in run_server
[rank0]: async with build_async_engine_client(args) as engine_client:
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/contextlib.py", line 210, in __aenter__
[rank0]: return await anext(self.gen)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 141, in build_async_engine_client
[rank0]: async with build_async_engine_client_from_engine_args(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/contextlib.py", line 210, in __aenter__
[rank0]: return await anext(self.gen)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 189, in build_async_engine_client_from_engine_args
[rank0]: engine_client = AsyncLLMEngine.from_vllm_config(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/engine/async_llm_engine.py", line 650, in from_vllm_config
[rank0]: return cls(
[rank0]: ^^^^
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/engine/async_llm_engine.py", line 605, in __init__
[rank0]: self.engine = self._engine_class(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/engine/async_llm_engine.py", line 267, in __init__
[rank0]: super().__init__(*args, **kwargs)
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 280, in __init__
[rank0]: self.model_executor = executor_class(vllm_config=vllm_config, )
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 271, in __init__
[rank0]: super().__init__(*args, **kwargs)
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 52, in __init__
[rank0]: self._init_executor()
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/executor/ray_distributed_executor.py", line 114, in _init_executor
[rank0]: self._init_workers_ray(placement_group)
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/executor/ray_distributed_executor.py", line 395, in _init_workers_ray
[rank0]: self._run_workers("init_device")
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/executor/ray_distributed_executor.py", line 516, in _run_workers
[rank0]: self.driver_worker.execute_method(sent_method, *args, **kwargs)
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/worker/worker_base.py", line 621, in execute_method
[rank0]: raise e
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/worker/worker_base.py", line 612, in execute_method
[rank0]: return run_method(self, method, args, kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/utils.py", line 2255, in run_method
[rank0]: return func(*args, **kwargs)
[rank0]: ^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/worker/worker_base.py", line 604, in init_device
[rank0]: self.worker.init_device() # type: ignore
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/worker/worker.py", line 166, in init_device
[rank0]: init_worker_distributed_environment(self.vllm_config, self.rank,
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/worker/worker.py", line 507, in init_worker_distributed_environment
[rank0]: ensure_model_parallel_initialized(parallel_config.tensor_parallel_size,
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 992, in ensure_model_parallel_initialized
[rank0]: initialize_model_parallel(tensor_model_parallel_size,
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 925, in initialize_model_parallel
[rank0]: _TP = init_model_parallel_group(group_ranks,
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 730, in init_model_parallel_group
[rank0]: return GroupCoordinator(
[rank0]: ^^^^^^^^^^^^^^^^^
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 207, in __init__
[rank0]: self.device_communicator = device_comm_cls(
[rank0]: ^^^^^^^^^^^^^^^^
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/distributed/device_communicators/cuda_communicator.py", line 39, in __init__
[rank0]: self.pynccl_comm = PyNcclCommunicator(
[rank0]: ^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl.py", line 99, in __init__
[rank0]: self.comm: ncclComm_t = self.nccl.ncclCommInitRank(
[rank0]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 277, in ncclCommInitRank
[rank0]: self.NCCL_CHECK(self._funcs["ncclCommInitRank"](ctypes.byref(comm),
[rank0]: File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 256, in NCCL_CHECK
[rank0]: raise RuntimeError(f"NCCL error: {error_str}")
[rank0]: RuntimeError: NCCL error: internal error - please report this issue to the NCCL developers
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] Error executing method 'init_device'. This might cause deadlock in distributed execution.
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] Traceback (most recent call last):
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/worker/worker_base.py", line 612, in execute_method
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] return run_method(self, method, args, kwargs)
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/utils.py", line 2255, in run_method
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] return func(*args, **kwargs)
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] ^^^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/ray/util/tracing/tracing_helper.py", line 463, in _resume_span
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] return method(self, *_args, **_kwargs)
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/worker/worker_base.py", line 604, in init_device
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] self.worker.init_device() # type: ignore
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] ^^^^^^^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/worker/worker.py", line 166, in init_device
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] init_worker_distributed_environment(self.vllm_config, self.rank,
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/worker/worker.py", line 507, in init_worker_distributed_environment
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] ensure_model_parallel_initialized(parallel_config.tensor_parallel_size,
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 992, in ensure_model_parallel_initialized
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] initialize_model_parallel(tensor_model_parallel_size,
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 925, in initialize_model_parallel
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] _TP = init_model_parallel_group(group_ranks,
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 730, in init_model_parallel_group
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] return GroupCoordinator(
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] ^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/distributed/parallel_state.py", line 207, in __init__
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] self.device_communicator = device_comm_cls(
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] ^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/distributed/device_communicators/cuda_communicator.py", line 39, in __init__
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] self.pynccl_comm = PyNcclCommunicator(
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] ^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl.py", line 99, in __init__
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] self.comm: ncclComm_t = self.nccl.ncclCommInitRank(
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 277, in ncclCommInitRank
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] self.NCCL_CHECK(self._funcs["ncclCommInitRank"](ctypes.byref(comm),
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] File "/root/miniconda3/envs/ray/lib/python3.12/site-packages/vllm/distributed/device_communicators/pynccl_wrapper.py", line 256, in NCCL_CHECK
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] raise RuntimeError(f"NCCL error: {error_str}")
(RayWorkerWrapper pid=3614806) ERROR 04-25 05:15:31 [worker_base.py:620] RuntimeError: NCCL error: internal error - please report this issue to the NCCL developers
(pid=443095, ip=172.21.151.99) INFO 04-25 13:15:27 [__init__.py:239] Automatically detected platform cuda. [repeated 3x across cluster] (Ray deduplicates logs by default. Set RAY_DEDUP_LOGS=0 to disable log deduplication, or see https://docs.ray.io/en/master/ray-observability/user-guides/configure-logging.html#log-deduplication for more options.)
(RayWorkerWrapper pid=443095, ip=172.21.151.99) INFO 04-25 13:15:29 [cuda.py:239] Cannot use FlashAttention-2 backend for Volta and Turing GPUs. [repeated 2x across cluster]
(RayWorkerWrapper pid=443095, ip=172.21.151.99) INFO 04-25 13:15:29 [cuda.py:288] Using XFormers backend. [repeated 2x across cluster]
(RayWorkerWrapper pid=443095, ip=172.21.151.99) INFO 04-25 13:15:30 [utils.py:931] Found nccl from library libnccl.so.2 [repeated 2x across cluster]
(RayWorkerWrapper pid=443095, ip=172.21.151.99) INFO 04-25 13:15:30 [pynccl.py:69] vLLM is using nccl==2.21.5 [repeated 2x across cluster]
INFO 04-25 05:15:32 [ray_distributed_executor.py:127] Shutting down Ray distributed executor. If you see error log from logging.cc regarding SIGTERM received, please ignore because this is the expected termination process in Ray.
[rank0]:[W425 05:15:32.229636086 ProcessGroupNCCL.cpp:1496] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
(ray) root@ds-x86:~/ray#