How to calculate interactivity on a single-node inference using vLLM metrics?
Would it be throughput/num-seqs ?
It seems likely that "interactivity" in the context of vLLM refers to token-level latency, i.e. Time Per Output Token (TPOT) or Inter-Token Latency (ITL), rather than a derived formula like throughput/num-seqs. vLLM exposes ITL and TPOT via Prometheus and its dashboards; these directly measure the average or percentile time between generated tokens, which is the standard interactivity metric for LLM serving, not throughput divided by batch size. You can monitor them through the vLLM dashboard or with Prometheus queries against the vllm:inter_token_latency_seconds_* metrics, as described in the documentation and dashboard panels (docs example).
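As a rough illustration, these are the kinds of PromQL queries the dashboard panels use. The exact metric names vary by vLLM version (the histogram name below is an assumption based on the metric family mentioned above), so check your server's /metrics endpoint first:

```promql
# p99 inter-token latency over the last 5 minutes (assumed histogram name)
histogram_quantile(0.99,
  sum(rate(vllm:inter_token_latency_seconds_bucket[5m])) by (le))

# Mean time per output token over the last 5 minutes
rate(vllm:time_per_output_token_seconds_sum[5m])
  / rate(vllm:time_per_output_token_seconds_count[5m])
```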
Throughput/num-seqs does not directly yield interactivity: throughput is aggregate tokens/sec and num-seqs is the batch size, so dividing them gives tokens/sec per sequence, not latency per token. For true interactivity, use the ITL or TPOT metrics that vLLM provides (dashboard reference). Would you like more detail on how to access or interpret these metrics?
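To make the distinction concrete, here is a minimal sketch (all numbers and names are hypothetical, not from vLLM itself) computing TPOT from per-token arrival timestamps, next to the throughput/num-seqs quantity. The reciprocal of tokens/sec-per-sequence only approximates TPOT when every sequence is decoding continuously; queueing, prefill, and preemption break that equivalence, which is why the exported latency metrics are preferred:

```python
# Sketch: per-request interactivity (TPOT) vs aggregate throughput.
# In practice vLLM exports these metrics via Prometheus; this just shows
# the arithmetic relationship between the two views.

def tpot_from_token_times(token_times):
    """Mean inter-token latency (s/token) from token arrival timestamps."""
    itls = [b - a for a, b in zip(token_times, token_times[1:])]
    return sum(itls) / len(itls)

# One request: 5 output tokens arriving at these wall-clock offsets (s).
times = [0.00, 0.05, 0.10, 0.16, 0.21]
tpot = tpot_from_token_times(times)   # mean gap between tokens, ~0.0525 s

# Aggregate view: say 4 concurrent sequences produced 400 tokens in 5 s.
throughput = 400 / 5.0                # 80 tokens/s across all sequences
per_seq_rate = throughput / 4         # 20 tokens/s per sequence
approx_tpot = 1.0 / per_seq_rate      # 0.05 s/token, valid only under
                                      # perfectly steady decoding
```

Note that `approx_tpot` is a batch-averaged estimate; it tells you nothing about tail latency, which is why the histogram percentiles (p95/p99 ITL) matter for interactivity.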
Hint: Mention @RunLLM in the post for followups.