VLLM Engine Metrics

RunLLM · June 11, 2025, 11:18pm

Yes, your code correctly enables vLLM metrics, including the histograms vllm:time_to_first_token_seconds, vllm:time_per_output_token_seconds, and vllm:e2e_request_latency_seconds, by instantiating both LoggingStatLogger and PrometheusStatLogger in the engine. LoggingStatLogger periodically writes summary statistics to the log, while PrometheusStatLogger records the histograms you listed and exports them in Prometheus format, as described in the metrics documentation and metrics API reference.
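Each of these histograms is exported as a set of `_bucket`, `_sum`, and `_count` series, so a quick sanity check is to divide `_sum` by `_count` to get the mean latency. The sketch below does that on raw Prometheus text; the sample scrape output, its `model_name` label, and the numbers are made up for illustration and are not real vLLM output.

```python
import re

# Matches one sample line of the Prometheus text exposition format:
# metric name (colons allowed, as in vllm:...), optional {label set}, then the value.
# Label values that themselves contain "}" are not handled by this simple pattern.
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)")


def histogram_means(exposition: str, metric: str) -> dict[str, float]:
    """Mean observation per label set, computed as <metric>_sum / <metric>_count."""
    sums: dict[str, float] = {}
    counts: dict[str, float] = {}
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        name, labels, value = match.group(1), match.group(2) or "", float(match.group(3))
        if name == metric + "_sum":
            sums[labels] = value
        elif name == metric + "_count":
            counts[labels] = value
    # Label sets with zero observations have no defined mean, so they are skipped.
    return {labels: sums[labels] / counts[labels] for labels in sums if counts.get(labels)}


# Hypothetical scrape output; label names and numbers are invented for illustration.
SAMPLE = """\
# TYPE vllm:time_to_first_token_seconds histogram
vllm:time_to_first_token_seconds_bucket{le="0.1",model_name="my-model"} 3.0
vllm:time_to_first_token_seconds_bucket{le="+Inf",model_name="my-model"} 4.0
vllm:time_to_first_token_seconds_sum{model_name="my-model"} 0.5
vllm:time_to_first_token_seconds_count{model_name="my-model"} 4.0
"""

print(histogram_means(SAMPLE, "vllm:time_to_first_token_seconds"))
# → {'{model_name="my-model"}': 0.125}
```

In Prometheus itself the same ratio over a time window is usually written as `rate(vllm:time_to_first_token_seconds_sum[5m]) / rate(vllm:time_to_first_token_seconds_count[5m])`, and percentiles come from the buckets via `histogram_quantile(0.95, rate(vllm:time_to_first_token_seconds_bucket[5m]))`.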

Just make sure the PrometheusStatLogger is active and that your server exposes the /metrics endpoint so Prometheus can scrape the data.
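One way to confirm the endpoint is actually serving these histograms is to fetch it once and look for each `<name>_count` series. This is a minimal sketch using only the standard library; the default URL `http://localhost:8000/metrics` is an assumption (adjust host and port to wherever your server listens), and `fetch_metrics` and `missing_histograms` are hypothetical helper names, not part of vLLM.

```python
import urllib.request

# The three histograms named in the answer above.
EXPECTED_HISTOGRAMS = (
    "vllm:time_to_first_token_seconds",
    "vllm:time_per_output_token_seconds",
    "vllm:e2e_request_latency_seconds",
)


def fetch_metrics(url: str = "http://localhost:8000/metrics", timeout: float = 5.0) -> str:
    """Download the Prometheus text exposition (the default URL is an assumption)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def missing_histograms(exposition: str, expected=EXPECTED_HISTOGRAMS) -> list[str]:
    """Names from `expected` that have no <name>_count sample in the exposition text."""
    present = set()
    for raw in exposition.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        # The sample name ends at the first "{" (label set) or whitespace (value).
        tokens = line.split("{", 1)[0].split()
        if tokens:
            present.add(tokens[0])
    return [name for name in expected if name + "_count" not in present]


# Usage against a running server (not executed here):
#     missing = missing_histograms(fetch_metrics())
#     print("all histograms exported" if not missing else f"missing: {missing}")
```

Splitting fetching from checking keeps the check testable on saved scrape output. Note that a labeled histogram may only appear after the first request has been served, so send at least one request before checking.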

Would you like more detail on how to verify these metrics are being collected?

