Two NVIDIA H20-3e GPUs
CUDA 12.6 (NVIDIA driver 560, Ubuntu 22.04)
vLLM version: 0.9.1 (latest)
Model: Qwen 2.5 VL 72B Instruct
When chatting through the vLLM interface, the model sometimes generates non-stop characters such as ’’ ’’ at the end of the output.
It does not happen every time, but the frequency is not low.
Has anyone met the same problem? How can I solve this strange problem?
Yes, this issue has been reported by other users running vLLM with Qwen 2.5 VL 72B on NVIDIA H20-3e cards: chat outputs sometimes contain abnormal or garbled characters and may stop prematurely. The problem is not unique to your setup and has been discussed in the community, but no definitive solution is documented in the official troubleshooting guide or FAQ as of now. Some users have noted that updating their CUDA/cuBLAS libraries (e.g., ensuring only one cuBLAS version >12.3 is present and installing nvidia-cublas-cu12==12.4.5.8) resolved similar issues on H20 cards, though the root cause is not fully explained and may relate to library or driver compatibility rather than vLLM itself.
If you haven't already, check your CUDA/cuBLAS versions and update them as suggested. Also, ensure you are not using safetensors with vLLM for this model, as it may cause issues. If the problem persists, consider searching or posting on the vLLM GitHub issues for further support, as this is a known but unresolved issue for some hardware/model combinations. Would you like more detailed troubleshooting steps or example commands for updating your environment?
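As a starting point, here is a sketch of the version-check and cuBLAS-pinning steps described above. This assumes your CUDA libraries are pip-managed wheels (the `nvidia-cublas-cu12` package); if your cuBLAS comes from the system CUDA toolkit instead, the pip steps will not apply.

```shell
# Show driver version and the CUDA version the driver supports
nvidia-smi

# Show the toolkit's compiler version (if the CUDA toolkit is installed)
nvcc --version

# List pip-managed cuBLAS wheels; more than one version here can cause conflicts
pip list | grep -i cublas

# Pin the cuBLAS wheel version that reportedly resolved similar issues on H20 cards
pip install nvidia-cublas-cu12==12.4.5.8

# Confirm only the pinned version remains installed
pip show nvidia-cublas-cu12
```

If `pip list` shows multiple cuBLAS wheels, uninstall the extras before pinning so only one version >12.3 remains.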
Thank you for the update. If vLLM 0.9.1 is working well with Qwen 2.5 VL 72B on H20-3e and you are no longer experiencing issues, that's great to hear. No further action is needed.
If you encounter any new issues or have questions about vLLM features or troubleshooting, feel free to ask.