Thanks for the reply. I made a mistake in my initial test: I used SGLang's --max-total-tokens flag, thinking it was SGLang's counterpart to vLLM's --max-model-len, but I later found that the counterpart is actually --context-length. With --context-length set, the memory usage is basically the same.
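For reference, this is roughly how the two servers can be launched so the length limits line up (model path, ports, and the 4096 value are placeholders, not my exact setup); as I understand it, SGLang's --max-total-tokens caps the KV-cache token pool rather than the context length:

```bash
# vLLM: --max-model-len caps the model's maximum sequence length
vllm serve <model> --port 8000 --max-model-len 4096

# SGLang: --context-length is the equivalent knob;
# --max-total-tokens instead limits the size of the KV-cache token pool
python -m sglang.launch_server --model-path <model> --port 30000 --context-length 4096
```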
SGLang performed better in my test, but it has a "warm-up" effect that does not exist in vLLM.
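One way to keep the comparison fair is to send a throwaway request before timing, so the first-request overhead is excluded; a minimal sketch against the OpenAI-compatible endpoint (URL, model name, and prompt are placeholders):

```bash
# Throwaway warm-up request; only time the requests sent after this one
curl -s http://localhost:30000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "<model>", "prompt": "warm up", "max_tokens": 8}'
```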
I have further updated my results in GitHub - qiulang/vllm-sglang-perf: Evaluate how vLLM and SGLang perform when running a small LLM model on a mid-range NVIDIA GPU