Speeding up vLLM inference for Qwen2.5-VL

Will vLLM work on a machine with CUDA 12.2 (driver 5.3.5) and 6 Tesla T4 GPUs?

vllm serve Qwen/Qwen2.5-VL-7B-Instruct --tensor-parallel-size 4 --enable-chunked-prefill --enable-prefix-caching --host 0.0.0.0 --port 6161 --dtype half
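
One part of this can be sanity-checked without touching the GPUs: whether the chosen --tensor-parallel-size fits the model. As far as I understand, vLLM needs the number of attention heads to be evenly divisible by the tensor-parallel size. The sketch below is only an illustration under assumptions: valid_tp_sizes is a hypothetical helper (not part of vLLM), and the head count of 28 is my assumption for Qwen2.5-VL-7B-Instruct, so please confirm num_attention_heads in the model's config.json before relying on it.

```python
def valid_tp_sizes(num_gpus: int, num_attention_heads: int) -> list[int]:
    """Return tensor-parallel sizes that fit on the machine and
    evenly divide the attention heads.

    Hypothetical helper for illustration; not a vLLM API.
    """
    return [
        tp
        for tp in range(1, num_gpus + 1)
        if num_attention_heads % tp == 0
    ]


if __name__ == "__main__":
    # Assumption: 28 attention heads (verify in the model's config.json).
    # 6 is the GPU count from the question above.
    print(valid_tp_sizes(num_gpus=6, num_attention_heads=28))
```

If the head-count assumption holds, 6 is not among the candidates, which would explain using --tensor-parallel-size 4 and leaving two of the six T4s idle for this server.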