Hi, what is the maximum QPS that the bge-m3 model can achieve when deployed with vLLM on an H20 GPU, with an input length of 4096 tokens?