Comparison with OmniServe (LServe, QServe)

We did not explicitly compare with vLLM because we expect its performance to be lower than that of TRT-LLM-FP16, which implements the same paged-attention functionality but with a faster attention kernel. Our throughput is substantially higher than TRT-LLM-FP16's.