No HIP GPUs are available for VeRL

XueruiSu · March 26, 2025, 3:43am

I tested the latest VeRL, cloned from GitHub, with the following commands on an AMD cluster:

For data generation:
python ~/verl/examples/data_preprocess/gsm8k.py

For LLM training:
bash ~/verl/examples/grpo_trainer/run_deepseek7b_llm.sh

I still get the error "No HIP GPUs are available". However, when I test vllm and torch with Python in an interactive session, they work without any problems, which is strange. Below is the test log:

aiscuser@node-0:/scratch/azureml/cr/j/33a5aa2996f244f3ada3ec3029cdb09b/exe/wd$ rocm-smi


============================================ ROCm System Management Interface ============================================
====================================================== Concise Info ======================================================
Device  Node  IDs              Temp        Power     Partitions          SCLK    MCLK    Fan  Perf  PwrCap  VRAM%  GPU%
              (DID,     GUID)  (Junction)  (Socket)  (Mem, Compute, ID)
==========================================================================================================================
0       2     0x74b5,   65402  37.0°C      150.0W    NPS1, N/A, 0        132Mhz  900Mhz  0%   auto  750.0W  0%     0%
1       3     0x74b5,   27175  37.0°C      151.0W    NPS1, N/A, 0        131Mhz  900Mhz  0%   auto  750.0W  0%     0%
2       4     0x74b5,   16561  36.0°C      153.0W    NPS1, N/A, 0        132Mhz  900Mhz  0%   auto  750.0W  0%     0%
3       5     0x74b5,   54764  35.0°C      148.0W    NPS1, N/A, 0        132Mhz  900Mhz  0%   auto  750.0W  0%     0%
4       6     0x74b5,   10760  36.0°C      147.0W    NPS1, N/A, 0        131Mhz  900Mhz  0%   auto  750.0W  0%     0%
5       7     0x74b5,   48981  36.0°C      146.0W    NPS1, N/A, 0        132Mhz  900Mhz  0%   auto  750.0W  0%     0%
6       8     0x74b5,   32548  37.0°C      152.0W    NPS1, N/A, 0        131Mhz  900Mhz  0%   auto  750.0W  0%     0%
7       9     0x74b5,   60025  38.0°C      150.0W    NPS1, N/A, 0        131Mhz  900Mhz  0%   auto  750.0W  0%     0%
==========================================================================================================================
================================================== End of ROCm SMI Log ===================================================
aiscuser@node-0:/scratch/azureml/cr/j/33a5aa2996f244f3ada3ec3029cdb09b/exe/wd$ cd ~
aiscuser@node-0:~$ ls
azureml_job_env.sh  hostfile  hostfile.mpich  samples  tmp.7kLs  tmp.lJNL  tmp.Q9IC  tmp.rHRS
aiscuser@node-0:~$ git clone https://github.com/volcengine/verl.git
Cloning into 'verl'...
remote: Enumerating objects: 4870, done.
remote: Counting objects: 100% (12/12), done.
remote: Compressing objects: 100% (11/11), done.
remote: Total 4870 (delta 1), reused 9 (delta 1), pack-reused 4858 (from 1)
Receiving objects: 100% (4870/4870), 3.05 MiB | 23.48 MiB/s, done.
Resolving deltas: 100% (3216/3216), done.
aiscuser@node-0:~$ ls
azureml_job_env.sh  hostfile  hostfile.mpich  samples  tmp.7kLs  tmp.lJNL  tmp.Q9IC  tmp.rHRS  verl
aiscuser@node-0:~$ cd verl
aiscuser@node-0:~/verl$ ls
docker  docs  examples  LICENSE  Notice.txt  patches  pyproject.toml  README.md  recipe  requirements.txt  scripts  setup.py  tests  verl
aiscuser@node-0:~/verl$ source activate
(base) aiscuser@node-0:~/verl$ python ./examples/data_preprocess/gsm8k.py
README.md: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7.94k/7.94k [00:00<00:00, 66.5MB/s]
train-00000-of-00001.parquet: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2.31M/2.31M [00:00<00:00, 100MB/s]
test-00000-of-00001.parquet: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 419k/419k [00:00<00:00, 288MB/s]
Generating train split: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7473/7473 [00:00<00:00, 463374.39 examples/s]
Generating test split: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1319/1319 [00:00<00:00, 436403.48 examples/s]
Map: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7473/7473 [00:00<00:00, 33317.85 examples/s]
Map: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1319/1319 [00:00<00:00, 34151.61 examples/s]
Creating parquet from Arrow format: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 305.33ba/s]
Creating parquet from Arrow format: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 413.03ba/s]
(base) aiscuser@node-0:~/verl$ bash ./examples/grpo_trainer/run_deepseek7b_llm.sh
+ python3 -m verl.trainer.main_ppo algorithm.adv_estimator=grpo data.train_files=/home/aiscuser/data/gsm8k/train.parquet data.val_files=/home/aiscuser/data/gsm8k/test.parquet data.train_batch_size=1024 data.max_prompt_length=512 data.max_response_length=1024 data.filter_overlong_prompts=True data.truncation=error actor_rollout_ref.model.path=deepseek-ai/deepseek-llm-7b-chat actor_rollout_ref.actor.optim.lr=1e-6 actor_rollout_ref.model.use_remove_padding=True actor_rollout_ref.actor.ppo_mini_batch_size=256 actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=80 actor_rollout_ref.actor.use_kl_loss=True actor_rollout_ref.actor.kl_loss_coef=0.001 actor_rollout_ref.actor.kl_loss_type=low_var_kl actor_rollout_ref.model.enable_gradient_checkpointing=True actor_rollout_ref.actor.fsdp_config.param_offload=False actor_rollout_ref.actor.fsdp_config.optimizer_offload=False actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=160 actor_rollout_ref.rollout.tensor_model_parallel_size=2 actor_rollout_ref.rollout.name=vllm actor_rollout_ref.rollout.gpu_memory_utilization=0.6 actor_rollout_ref.rollout.n=5 actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=160 actor_rollout_ref.ref.fsdp_config.param_offload=True algorithm.kl_ctrl.kl_coef=0.001 trainer.critic_warmup=0 'trainer.logger=[console]' trainer.project_name=verl_grpo_example_gsm8k trainer.experiment_name=deepseek_llm_7b_function_rm trainer.n_gpus_per_node=8 trainer.nnodes=1 trainer.save_freq=-1 trainer.test_freq=5 trainer.total_epochs=15
2025-03-26 02:55:02,518 INFO worker.py:1843 -- Started a local Ray instance. View the dashboard at http://127.0.0.1:8265
(TaskRunner pid=23702) {'actor_rollout_ref': {'actor': {'checkpoint': {'contents': ['model',
(TaskRunner pid=23702)                                                              'hf_model',
(TaskRunner pid=23702)                                                              'optimizer',
(TaskRunner pid=23702)                                                              'extra']},
(TaskRunner pid=23702)                                  'clip_ratio': 0.2,
(TaskRunner pid=23702)                                  'entropy_coeff': 0.001,
(TaskRunner pid=23702)                                  'fsdp_config': {'fsdp_size': -1,
(TaskRunner pid=23702)                                                  'optimizer_offload': False,
(TaskRunner pid=23702)                                                  'param_offload': False,
(TaskRunner pid=23702)                                                  'wrap_policy': {'min_num_params': 0}},
(TaskRunner pid=23702)                                  'grad_clip': 1.0,
(TaskRunner pid=23702)                                  'kl_loss_coef': 0.001,
(TaskRunner pid=23702)                                  'kl_loss_type': 'low_var_kl',
(TaskRunner pid=23702)                                  'optim': {'lr': 1e-06,
(TaskRunner pid=23702)                                            'lr_warmup_steps': -1,
(TaskRunner pid=23702)                                            'lr_warmup_steps_ratio': 0.0,
...
(TaskRunner pid=23702)              'project_name': 'verl_grpo_example_gsm8k',
(TaskRunner pid=23702)              'remove_previous_ckpt_in_save': False,
(TaskRunner pid=23702)              'resume_from_path': False,
(TaskRunner pid=23702)              'resume_mode': 'auto',
(TaskRunner pid=23702)              'save_freq': -1,
(TaskRunner pid=23702)              'test_freq': 5,
(TaskRunner pid=23702)              'total_epochs': 15,
(TaskRunner pid=23702)              'total_training_steps': None,
(TaskRunner pid=23702)              'val_generations_to_log_to_wandb': 0}}
(TaskRunner pid=23702) WARNING 03-26 02:55:11 rocm.py:13] `fork` method is not supported by ROCm. VLLM_WORKER_MULTIPROC_METHOD is overridden to `spawn` instead.
Error executing job with overrides: ['algorithm.adv_estimator=grpo', 'data.train_files=/home/aiscuser/data/gsm8k/train.parquet', 'data.val_files=/home/aiscuser/data/gsm8k/test.parquet', 'data.train_batch_size=1024', 'data.max_prompt_length=512', 'data.max_response_length=1024', 'data.filter_overlong_prompts=True', 'data.truncation=error', 'actor_rollout_ref.model.path=deepseek-ai/deepseek-llm-7b-chat', 'actor_rollout_ref.actor.optim.lr=1e-6', 'actor_rollout_ref.model.use_remove_padding=True', 'actor_rollout_ref.actor.ppo_mini_batch_size=256', 'actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=80', 'actor_rollout_ref.actor.use_kl_loss=True', 'actor_rollout_ref.actor.kl_loss_coef=0.001', 'actor_rollout_ref.actor.kl_loss_type=low_var_kl', 'actor_rollout_ref.model.enable_gradient_checkpointing=True', 'actor_rollout_ref.actor.fsdp_config.param_offload=False', 'actor_rollout_ref.actor.fsdp_config.optimizer_offload=False', 'actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=160', 'actor_rollout_ref.rollout.tensor_model_parallel_size=2', 'actor_rollout_ref.rollout.name=vllm', 'actor_rollout_ref.rollout.gpu_memory_utilization=0.6', 'actor_rollout_ref.rollout.n=5', 'actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=160', 'actor_rollout_ref.ref.fsdp_config.param_offload=True', 'algorithm.kl_ctrl.kl_coef=0.001', 'trainer.critic_warmup=0', 'trainer.logger=[console]', 'trainer.project_name=verl_grpo_example_gsm8k', 'trainer.experiment_name=deepseek_llm_7b_function_rm', 'trainer.n_gpus_per_node=8', 'trainer.nnodes=1', 'trainer.save_freq=-1', 'trainer.test_freq=5', 'trainer.total_epochs=15']
Traceback (most recent call last):
  File "/home/aiscuser/verl/verl/trainer/main_ppo.py", line 54, in main
    run_ppo(config)
  File "/home/aiscuser/verl/verl/trainer/main_ppo.py", line 72, in run_ppo
    ray.get(runner.run.remote(config))
  File "/opt/conda/lib/python3.11/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/ray/_private/client_mode_hook.py", line 103, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/ray/_private/worker.py", line 2782, in get
    values, debugger_breakpoint = worker.get_objects(object_refs, timeout=timeout)
                                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/ray/_private/worker.py", line 929, in get_objects
    raise value.as_instanceof_cause()
ray.exceptions.RayTaskError(RuntimeError): ray::TaskRunner.run() (pid=23702, ip=100.65.3.152, actor_id=ef46086bf9037197bc3baabc01000000, repr=<main_ppo.TaskRunner object at 0x7f104f3f0f90>)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/aiscuser/verl/verl/trainer/main_ppo.py", line 97, in run
    from verl.workers.fsdp_workers import ActorRolloutRefWorker, CriticWorker
  File "/home/aiscuser/verl/verl/workers/fsdp_workers.py", line 41, in <module>
    from verl.workers.sharding_manager.fsdp_ulysses import FSDPUlyssesShardingManager
  File "/home/aiscuser/verl/verl/workers/sharding_manager/__init__.py", line 34, in <module>
    if is_vllm_available():
       ^^^^^^^^^^^^^^^^^^^
  File "/home/aiscuser/verl/verl/utils/import_utils.py", line 35, in is_vllm_available
    import vllm
  File "/vllm/vllm/__init__.py", line 3, in <module>
    from vllm.engine.arg_utils import AsyncEngineArgs, EngineArgs
  File "/vllm/vllm/engine/arg_utils.py", line 11, in <module>
    from vllm.config import (CacheConfig, ConfigFormat, DecodingConfig,
  File "/vllm/vllm/config.py", line 12, in <module>
    from vllm.model_executor.layers.quantization import QUANTIZATION_METHODS
  File "/vllm/vllm/model_executor/layers/quantization/__init__.py", line 10, in <module>
    from vllm.model_executor.layers.quantization.compressed_tensors.compressed_tensors import (  # noqa: E501
  File "/vllm/vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors.py", line 11, in <module>
    from vllm.model_executor.layers.quantization.compressed_tensors.compressed_tensors_moe import (  # noqa: E501
  File "/vllm/vllm/model_executor/layers/quantization/compressed_tensors/compressed_tensors_moe.py", line 10, in <module>
    from vllm.model_executor.layers.quantization.compressed_tensors.schemes import (
  File "/vllm/vllm/model_executor/layers/quantization/compressed_tensors/schemes/__init__.py", line 4, in <module>
    from .compressed_tensors_w8a8_fp8 import CompressedTensorsW8A8Fp8
  File "/vllm/vllm/model_executor/layers/quantization/compressed_tensors/schemes/compressed_tensors_w8a8_fp8.py", line 10, in <module>
    from vllm.model_executor.layers.quantization.utils.w8a8_utils import (
  File "/vllm/vllm/model_executor/layers/quantization/utils/w8a8_utils.py", line 11, in <module>
    TORCH_DEVICE_IDENTITY = torch.ones(1).cuda() if is_hip() else None
                            ^^^^^^^^^^^^^^^^^^^^
  File "/opt/conda/lib/python3.11/site-packages/torch/cuda/__init__.py", line 372, in _lazy_init
    torch._C._cuda_init()
RuntimeError: No HIP GPUs are available

Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
(base) aiscuser@node-0:~/verl$ pip list
DEPRECATION: Loading egg at /opt/conda/lib/python3.11/site-packages/setuptools-78.0.2-py3.11.egg is deprecated. pip 25.1 will enforce this behaviour change. A possible replacement is to use pip for package installation. Discussion can be found at https://github.com/pypa/pip/issues/12330
Package                           Version                    Editable project location
--------------------------------- -------------------------- -------------------------
accelerate                        1.5.2
aiohappyeyeballs                  2.6.1
aiohttp                           3.11.14
aiohttp-cors                      0.8.0
aiosignal                         1.3.2
amdsmi                            25.1.0+8dc45db
annotated-types                   0.7.0
antlr4-python3-runtime            4.9.3
anyio                             4.9.0
archspec                          0.2.5
attrs                             25.3.0
autocommand                       2.2.2
awscli                            1.38.19
backports.tarfile                 1.2.0
boltons                           24.0.0
boto3                             1.37.19
botocore                          1.37.19
Brotli                            1.1.0
cachetools                        5.5.2
certifi                           2025.1.31
cffi                              1.17.1
charset-normalizer                3.4.1
click                             8.1.8
cloudpickle                       3.1.1
codetiming                        1.4.0
colorama                          0.4.6
colorful                          0.5.6
conda                             25.1.1
conda-libmamba-solver             25.3.0
conda-package-handling            2.4.0
conda_package_streaming           0.11.0
datasets                          3.4.1
dill                              0.3.8
diskcache                         5.6.3
distlib                           0.3.9
distro                            1.9.0
docker-pycreds                    0.4.0
docutils                          0.16
einops                            0.8.1
fastapi                           0.115.12
filelock                          3.16.1
flash_attn                        2.7.3
frozendict                        2.4.6
frozenlist                        1.5.0
fsspec                            2024.10.0
gguf                              0.10.0
gitdb                             4.0.12
GitPython                         3.1.44
google-api-core                   2.24.2
google-auth                       2.38.0
googleapis-common-protos          1.69.2
grpcio                            1.71.0
h11                               0.14.0
h2                                4.2.0
hiredis                           3.1.0
hpack                             4.1.0
httpcore                          1.0.7
httptools                         0.6.4
httpx                             0.28.1
huggingface-hub                   0.29.3
hydra-core                        1.3.2
hyperframe                        6.1.0
idna                              3.10
importlib_metadata                8.6.1
inflect                           7.3.1
iniconfig                         2.1.0
inquirerpy                        0.3.4
interegular                       0.3.3
jaraco.collections                5.1.0
jaraco.context                    5.3.0
jaraco.functools                  4.0.1
jaraco.text                       3.12.1
Jinja2                            3.1.4
jiter                             0.9.0
jmespath                          1.0.1
jsonpatch                         1.33
jsonpointer                       3.0.0
jsonschema                        4.23.0
jsonschema-specifications         2024.10.1
lark                              1.2.2
libmambapy                        2.0.8
libnacl                           2.1.0
liger_kernel                      0.5.5
llvmlite                          0.44.0
lm-format-enforcer                0.10.6
MarkupSafe                        2.1.5
menuinst                          2.2.0
mistral_common                    1.5.4
more-itertools                    10.3.0
mpmath                            1.3.0
msgpack                           1.1.0
msgspec                           0.19.0
multidict                         6.2.0
multiprocess                      0.70.16
nest-asyncio                      1.6.0
networkx                          3.4.2
numba                             0.61.0
numpy                             1.26.4
omegaconf                         2.3.0
openai                            1.68.2
opencensus                        0.11.4
opencensus-context                0.1.3
opencv-python-headless            4.11.0.86
orjson                            3.10.16
outlines                          0.0.46
packaging                         24.2
pandas                            2.2.3
partial-json-parser               0.2.1.1.post5
peft                              0.15.0
pfzy                              0.3.4
pillow                            11.0.0
pip                               25.0.1
platformdirs                      4.3.6
pluggy                            1.5.0
prometheus_client                 0.21.1
prometheus-fastapi-instrumentator 7.1.0
prompt_toolkit                    3.0.50
propcache                         0.3.0
proto-plus                        1.26.1
protobuf                          5.29.4
psutil                            7.0.0
py-cpuinfo                        9.0.0
py-spy                            0.4.0
pyairports                        2.1.1
pyarrow                           19.0.1
pyasn1                            0.6.1
pyasn1_modules                    0.4.1
pybind11                          2.13.6
pycosat                           0.6.6
pycountry                         24.6.1
pycparser                         2.22
pydantic                          2.10.6
pydantic_core                     2.27.2
pylatexenc                        2.10
PySocks                           1.7.1
pytest                            8.3.5
pytest-asyncio                    0.26.0
python-dateutil                   2.9.0.post0
python-dotenv                     1.1.0
pytorch-triton-rocm               3.3.0+git96316ce5
pytz                              2025.2
PyYAML                            6.0.2
pyzmq                             26.3.0
ray                               2.44.0
redis                             5.2.1
referencing                       0.36.2
regex                             2024.11.6
requests                          2.32.3
rpds-py                           0.23.1
rsa                               4.7.2
ruamel.yaml                       0.18.10
ruamel.yaml.clib                  0.2.8
s3transfer                        0.11.4
safetensors                       0.5.3
scipy                             1.15.2
sentencepiece                     0.2.0
sentry-sdk                        2.24.1
setproctitle                      1.3.5
setuptools                        75.8.2
setuptools                        78.0.2
setuptools-scm                    8.2.0
six                               1.17.0
smart-open                        7.1.0
smmap                             5.0.2
sniffio                           1.3.1
starlette                         0.46.1
supervisor                        4.2.5
sympy                             1.13.3
tensorboardX                      2.6.2.2
tensordict                        0.6.2
tensorizer                        2.9.2
tiktoken                          0.9.0
tokenizers                        0.21.1
tomli                             2.0.1
torch                             2.8.0.dev20250325+rocm6.3
torchaudio                        2.6.0.dev20250325+rocm6.3
torchdata                         0.11.0
torchvision                       0.22.0.dev20250325+rocm6.3
tqdm                              4.67.1
transformers                      4.50.1
triton                            3.2.0
truststore                        0.10.1
typeguard                         4.3.0
typing_extensions                 4.12.2
tzdata                            2025.2
urllib3                           2.3.0
uvicorn                           0.34.0
uvloop                            0.21.0
verl                              0.2.0.dev0                 /verl
virtualenv                        20.29.3
vllm                              0.6.3+rocm634              /vllm
wandb                             0.19.8
watchfiles                        1.0.4
wcwidth                           0.2.13
websockets                        15.0.1
wheel                             0.45.1
wrapt                             1.17.2
xxhash                            3.5.0
yarl                              1.18.3
zipp                              3.21.0
zstandard                         0.23.0
(base) aiscuser@node-0:~/verl$ python
Python 3.11.11 | packaged by conda-forge | (main, Mar  3 2025, 20:43:55) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.is_available()
True
>>> import vllm
WARNING 03-26 03:00:44 rocm.py:13] `fork` method is not supported by ROCm. VLLM_WORKER_MULTIPROC_METHOD is overridden to `spawn` instead.
>>> torch.cuda.is_available()
True
>>> torch.ones(1).cuda()
tensor([1.], device='cuda:0')
>>>
(base) aiscuser@node-0:~/verl$
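
Because the traceback above is raised inside a Ray actor (TaskRunner) while the interactive session works, one thing worth comparing is the GPU-visibility environment seen by each process. Below is a minimal sketch, assuming (this is not confirmed in the thread) that the difference lies in variables such as HIP_VISIBLE_DEVICES, ROCR_VISIBLE_DEVICES, or CUDA_VISIBLE_DEVICES. The helper names are hypothetical and only the standard library is used:

```python
import os

# Variables that commonly restrict which GPUs a process can see on ROCm/CUDA stacks.
# Assumption: one of these differs between the interactive shell and the Ray worker.
VISIBILITY_VARS = ("HIP_VISIBLE_DEVICES", "ROCR_VISIBLE_DEVICES", "CUDA_VISIBLE_DEVICES")


def gpu_visibility_report(env=None):
    """Map each visibility variable to its value, or None if it is unset."""
    env = os.environ if env is None else env
    return {name: env.get(name) for name in VISIBILITY_VARS}


def set_but_empty(report):
    """Names of variables that are set to an empty string.

    An empty value usually hides every device from the process.
    """
    return [name for name, value in report.items()
            if value is not None and value.strip() == ""]


if __name__ == "__main__":
    report = gpu_visibility_report()
    for name, value in report.items():
        print(f"{name}={value!r}")
    print("set but empty:", set_but_empty(report))
```

Running the same two functions in the interactive shell and inside the worker process (before vllm is imported) would show whether the worker sees a different value.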

XueruiSu · March 26, 2025, 3:45am

Asking for help. If you need more information, contact me at bi2030592079@163.com

youkaichao · March 26, 2025, 2:32pm

I think it might be related to vllm-project/vllm Pull Request #15366, "[Bugfix][V1] Avoid importing PreTrainedModel" by HollowMan6.

hiyouga · March 31, 2025, 9:26am

Hi Xuerui, could you please provide the full verl traceback for this issue ("No HIP GPUs are available")?

tjtanaa · April 4, 2025, 12:11pm

@XueruiSu
If you are running on bare metal, some permissions may not be set up correctly, so that your current user cannot access the AMD GPUs through PyTorch.

If you can launch a Docker container, please start one and run the following steps to verify whether your bare-metal user is missing the correct permissions.

Example steps:

Launch a docker image

#!/bin/bash
# this is rocm 6.3 python 3.12 with torch._scaled_mm rowwise feature
docker run -it \
   --network=host \
   --group-add=video \
   --ipc=host \
   --cap-add=SYS_PTRACE \
   --security-opt seccomp=unconfined \
   --device /dev/kfd \
   --device /dev/dri \
   rocm/pytorch:rocm6.3.4_ubuntu24.04_py3.12_pytorch_release_2.4.0 \
   bash

Check whether PyTorch can detect the AMD GPUs from Python.

import torch
print(torch.cuda.is_available())
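
On bare metal, the permission hypothesis can also be probed without Docker. Below is a rough sketch, assuming the relevant device nodes are /dev/kfd and the render nodes under /dev/dri (the same devices passed to docker run above), and that access is granted through groups such as video or render. The function names are hypothetical, and group names may differ between distributions:

```python
import glob
import grp
import os


def device_access(paths=None):
    """Report, per path, whether it exists and whether this user can read and write it."""
    if paths is None:
        # Assumption: /dev/kfd plus the DRM render nodes are what the GPU runtime opens.
        paths = ["/dev/kfd", *sorted(glob.glob("/dev/dri/renderD*"))]
    result = {}
    for path in paths:
        exists = os.path.exists(path)
        result[path] = {
            "exists": exists,
            "read_write": exists and os.access(path, os.R_OK | os.W_OK),
        }
    return result


def current_group_names():
    """Names of the groups the current process belongs to."""
    names = set()
    for gid in set(os.getgroups()) | {os.getegid()}:
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            # The gid has no entry in the group database; keep the number.
            names.add(str(gid))
    return names


if __name__ == "__main__":
    for path, info in device_access().items():
        print(path, info)
    groups = current_group_names()
    # Assumption: "video" and "render" are the groups that own the GPU device nodes.
    for wanted in ("video", "render"):
        print(f"in group {wanted!r}:", wanted in groups)
```

If a device node exists but is not readable and writable, or the user is in neither group, that would be consistent with a permission problem rather than a VeRL or vLLM issue.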

Additional information about vLLM RLHF support on ROCm:
At the time of writing this reply, sleep mode on AMD has been enabled in this PR: [Core][AMD] Migrate fully transparent sleep mode to ROCm platform by HollowMan6 · Pull Request #12695 · vllm-project/vllm

There is a Docker image built from this PR that you can use as a base for setting up VeRL: ghcr.io/embeddedllm/vllm-rocm:v0.8.2-18ed313-sleep-mode-rocm634_py310

I have validated using the following examples:

- git clone https://github.com/hiyouga/EasyR1.git
- cd EasyR1; python3 -m pip install -e .
- export ROCR_VISIBLE_DEVICES="0,1,2,3,4,5,6,7"
- export VLLM_USE_TRITON_FLASH_ATTN=0
- Log in to wandb, or else edit examples/config.yaml and set logger: ["console"]
- bash examples/qwen2_5_vl_7b_geo3k_grpo.sh
