Serving capacity and compute efficiency of vLLM

Hi, what is the maximum QPS that the bge-m3 model can achieve when deployed with vLLM on an H20 GPU, with an input length of 4096 tokens?
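
To make clear what I mean by QPS, here is a minimal sketch of how I would measure it. Everything in it is an assumption for illustration: `send_request` is a hypothetical callable that sends one 4096-token request to the deployed model and waits for the embedding, and the helper names are my own, not part of vLLM. The first function only relates token throughput to QPS at a fixed input length; it does not contain any measured numbers.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def qps_from_token_throughput(tokens_per_second: float, tokens_per_request: int) -> float:
    """Convert a token throughput into requests per second at a fixed input length."""
    return tokens_per_second / tokens_per_request


def measure_qps(send_request, n_requests: int, concurrency: int) -> float:
    """Send n_requests through a thread pool and return completed requests per second.

    send_request is a hypothetical zero-argument callable that performs one
    full request (for example, one 4096-token embedding call) and blocks
    until the response arrives.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Drain the iterator so every request finishes before the clock stops.
        list(pool.map(lambda _: send_request(), range(n_requests)))
    elapsed = time.perf_counter() - start
    return n_requests / elapsed
```

I would call `measure_qps` at increasing `concurrency` values and take the point where the result stops growing as the maximum QPS; I am asking what that value should be for this setup.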