Hi,
Running vLLM on Ubuntu 24.04 with ROCm 7.1.1 and a 7900 XTX:
docker run -it --dns=192.168.49.1 --network=host --group-add=video --ipc=host --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device /dev/kfd --device /dev/dri --shm-size=16g -e HUGGING_FACE_HUB_TOKEN="" -e VLLM_SLEEP_WHEN_IDLE=1 -e ROCM_VISIBLE_DEVICES=0,1 -e HIP_VISIBLE_DEVICES=0,1 -e HSA_OVERRIDE_GFX_VERSION=11.0.0 -e PYTORCH_ROCM_ARCH="gfx1100" -e VLLM_USE_TRITON_FLASH_ATTN=0 -e GPU_MAX_HW_QUEUES=1 -e NCCL_DEBUG=WARN -e NCCL_IB_DISABLE=1 --restart unless-stopped --name vllm_rocm_gemma-3-27b-it-qat-W4A16-G128_tp1 -v /home/ubuntu/vllm_models:/root/.cache/huggingface rocm/vllm:rocm7.0.0_vllm_0.10.2_20251006 vllm serve leon-se/gemma-3-27b-it-qat-W4A16-G128 --host 0.0.0.0 --port 8000 --enforce-eager --served-model-name vllm/gemma-3 --tensor-parallel-size 1 --trust-remote-code --dtype bfloat16 --kv-cache-dtype auto --max-model-len 4096 --max-num-seqs 4 --max-num-batched-tokens 4096 --gpu-memory-utilization 0.92 --swap-space 12 --disable-log-requests --disable-log-stats --max-log-len 100
During an inference benchmark it's very slow and prints the warnings below. Is this model compatible with the 7900 XTX?
(EngineCore_DP0 pid=127) [rank0]:W1128 23:47:07.358000 127 torch/_dynamo/convert_frame.py:1097] [0/8] torch._dynamo hit config.recompile_limit (8)
(EngineCore_DP0 pid=127) [rank0]:W1128 23:47:07.358000 127 torch/_dynamo/convert_frame.py:1097] [0/8] function: 'forward_static' (/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/layernorm.py:372)
(EngineCore_DP0 pid=127) [rank0]:W1128 23:47:07.358000 127 torch/_dynamo/convert_frame.py:1097] [0/8] last reason: 0/7: expected type of 'residual' to be a tensor type, ' but found <class 'NoneType'>
(EngineCore_DP0 pid=127) [rank0]:W1128 23:47:07.358000 127 torch/_dynamo/convert_frame.py:1097] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
(EngineCore_DP0 pid=127) [rank0]:W1128 23:47:07.358000 127 torch/_dynamo/convert_frame.py:1097] [0/8] To diagnose recompilation issues, see torch.compile Troubleshooting - PyTorch main documentation
(APIServer pid=1) INFO: 127.0.0.1:48138 - "POST /v1/completions HTTP/1.1" 200 OK
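
For reference, the warning itself suggests rerunning with TORCH_LOGS="recompiles" so every recompilation reason is logged, not just the last one. A minimal sketch of how that could be passed into the same container (just another -e env flag added to the docker run above; the rest of the command is elided):

```shell
# Enable PyTorch dynamo recompile logging inside the container.
# Insert this alongside the other -e flags in the docker run command:
docker run -it \
  -e TORCH_LOGS="recompiles" \
  ... \
  rocm/vllm:rocm7.0.0_vllm_0.10.2_20251006 \
  vllm serve leon-se/gemma-3-27b-it-qat-W4A16-G128 ...
```

With that set, each recompilation prints its guard-failure reason, which should show whether the limit is hit only by the residual=None/Tensor flip or by other shapes as well.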