Thanks for the reply. I made a mistake in my initial test: I used SGLang's --max-total-tokens flag, thinking it was SGLang's counterpart to vLLM's --max-model-len, but I later found that the counterpart is actually --context-length. With --context-length set, the memory usage is basically the same.
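For reference, this is roughly how the two servers can be launched so the length limits line up (model path, ports, and the 4096 value are placeholders, not my exact setup); as I understand it, SGLang's --max-total-tokens caps the KV-cache token pool rather than the context length:

```bash
# vLLM: --max-model-len caps the model's maximum sequence length
vllm serve <model> --port 8000 --max-model-len 4096

# SGLang: --context-length is the equivalent knob;
# --max-total-tokens instead limits the size of the KV-cache token pool
python -m sglang.launch_server --model-path <model> --port 30000 --context-length 4096
```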
SGLang performed better in my test, but it has a "warm-up" effect that does not exist in vLLM.
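One way to keep the comparison fair is to send a throwaway request before timing, so the first-request overhead is excluded; a minimal sketch against the OpenAI-compatible endpoint (URL, model name, and prompt are placeholders):

```bash
# Throwaway warm-up request; only time the requests sent after this one
curl -s http://localhost:30000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "<model>", "prompt": "warm up", "max_tokens": 8}'
```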
I have further updated my results in GitHub - qiulang/vllm-sglang-perf: Evaluate how vLLM and SGLang perform when running a small LLM model on a mid-range NVIDIA GPU