How to calculate interactivity on a single-node inference using vLLM metrics?
Would it be throughput/num-seqs ?
It seems likely that "interactivity" in the context of vLLM refers to token-level latency, i.e. Time Per Output Token (TPOT) or Inter-Token Latency (ITL), rather than a derived formula like throughput/num-seqs. vLLM exposes ITL and TPOT via Prometheus and its dashboards; these directly measure the average or percentile time between generated tokens, which is the standard interactivity metric for LLM serving, not throughput divided by batch size. You can monitor them through the vLLM dashboard or with Prometheus queries against the vllm:inter_token_latency_seconds_* metrics, as described in the documentation and dashboard panels (docs example).
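As a rough illustration, these are the kinds of PromQL queries the dashboard panels use. The exact metric names vary by vLLM version (the histogram name below is an assumption based on the metric family mentioned above), so check your server's /metrics endpoint first:

```promql
# p99 inter-token latency over the last 5 minutes (assumed histogram name)
histogram_quantile(0.99,
  sum(rate(vllm:inter_token_latency_seconds_bucket[5m])) by (le))

# Mean time per output token over the last 5 minutes
rate(vllm:time_per_output_token_seconds_sum[5m])
  / rate(vllm:time_per_output_token_seconds_count[5m])
```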
Throughput/num-seqs does not directly yield interactivity: throughput is aggregate tokens/sec and num-seqs is the batch size, so dividing them gives tokens/sec per sequence, not latency per token. For true interactivity, use the ITL or TPOT metrics that vLLM provides (dashboard reference). Would you like more detail on how to access or interpret these metrics?
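To make the distinction concrete, here is a minimal sketch (all numbers and names are hypothetical, not from vLLM itself) computing TPOT from per-token arrival timestamps, next to the throughput/num-seqs quantity. The reciprocal of tokens/sec-per-sequence only approximates TPOT when every sequence is decoding continuously; queueing, prefill, and preemption break that equivalence, which is why the exported latency metrics are preferred:

```python
# Sketch: per-request interactivity (TPOT) vs aggregate throughput.
# In practice vLLM exports these metrics via Prometheus; this just shows
# the arithmetic relationship between the two views.

def tpot_from_token_times(token_times):
    """Mean inter-token latency (s/token) from token arrival timestamps."""
    itls = [b - a for a, b in zip(token_times, token_times[1:])]
    return sum(itls) / len(itls)

# One request: 5 output tokens arriving at these wall-clock offsets (s).
times = [0.00, 0.05, 0.10, 0.16, 0.21]
tpot = tpot_from_token_times(times)   # mean gap between tokens, ~0.0525 s

# Aggregate view: say 4 concurrent sequences produced 400 tokens in 5 s.
throughput = 400 / 5.0                # 80 tokens/s across all sequences
per_seq_rate = throughput / 4         # 20 tokens/s per sequence
approx_tpot = 1.0 / per_seq_rate      # 0.05 s/token, valid only under
                                      # perfectly steady decoding
```

Note that `approx_tpot` is a batch-averaged estimate; it tells you nothing about tail latency, which is why the histogram percentiles (p95/p99 ITL) matter for interactivity.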
Hint: Mention @RunLLM in the post for followups.