I published a performance test result of vLLM vs SGLang, but can someone help me interpret it?

Thanks for the reply. I made a mistake in my initial test: I used SGLang's --max-total-tokens flag, thinking it was the equivalent of vLLM's --max-model-len, but the actual equivalent is --context-length. With that flag, memory usage is basically the same for both.
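For reference, the corresponding launch flags would look roughly like this (model name and context size are placeholders):

```shell
# vLLM: cap the context window with --max-model-len
vllm serve <model> --max-model-len 4096

# SGLang: the equivalent flag is --context-length
# (--max-total-tokens instead limits the KV-cache token pool)
python -m sglang.launch_server --model-path <model> --context-length 4096
```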

SGLang performed better in my test, but it shows a “warm-up” effect that vLLM does not.
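One common way to keep such a warm-up effect from skewing a benchmark is to discard the first few requests before timing. A minimal sketch (the workload below is a stand-in for a real inference call, not part of either server's API):

```python
import time

def benchmark(fn, warmup=3, iters=10):
    """Time fn over `iters` calls, discarding the first `warmup` calls
    so one-time costs (compilation, cache fills, graph capture) are
    excluded from the reported average latency."""
    for _ in range(warmup):
        fn()  # warm-up calls, not timed
    timings = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return sum(timings) / len(timings)

# Dummy CPU workload standing in for an actual inference request.
avg_latency = benchmark(lambda: sum(range(10_000)))
print(avg_latency > 0)
```

Comparing the average with and without the warm-up calls makes the size of the effect visible.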

I have further updated my results at GitHub - qiulang/vllm-sglang-perf: Evaluate how vLLM and SGLang perform when running a small LLM model on a mid-range NVIDIA GPU