vLLM Engine Metrics

Yes, your code correctly enables vLLM metrics, including the histograms vllm:time_to_first_token_seconds, vllm:time_per_output_token_seconds, and vllm:e2e_request_latency_seconds, because it instantiates both LoggingStatLogger and PrometheusStatLogger in the engine. LoggingStatLogger writes periodic summaries to the log, while PrometheusStatLogger records the histograms you listed for export in Prometheus format, as described in the metrics documentation and metrics API reference.
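A Prometheus histogram is exported as _bucket, _sum, and _count series, so the average value of each histogram above can be read off as _sum divided by _count. The minimal sketch below, assuming the standard Prometheus text exposition format, parses a scrape and computes those averages. The helper names parse_samples and histogram_means and the SAMPLE values are hypothetical, made up for illustration.

```python
from collections import defaultdict

# The three histogram families discussed above.
HISTOGRAMS = (
    "vllm:time_to_first_token_seconds",
    "vllm:time_per_output_token_seconds",
    "vllm:e2e_request_latency_seconds",
)


def parse_samples(text):
    """Yield (series_name, value) pairs from Prometheus text exposition format."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        if "{" in line:
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1 :].split()
        else:
            name, *rest = line.split()
        # The first token after the name/labels is the value; an optional
        # trailing timestamp is ignored.
        yield name, float(rest[0])


def histogram_means(text, names=HISTOGRAMS):
    """Mean observed value per histogram, aggregated over all label sets."""
    sums, counts = defaultdict(float), defaultdict(float)
    for name, value in parse_samples(text):
        for base in names:
            if name == base + "_sum":
                sums[base] += value
            elif name == base + "_count":
                counts[base] += value
    # Histograms with no observations are left out to avoid dividing by zero.
    return {b: sums[b] / counts[b] for b in names if counts[b] > 0}


# Hypothetical scrape excerpt, made up for illustration.
SAMPLE = """\
# TYPE vllm:time_to_first_token_seconds histogram
vllm:time_to_first_token_seconds_bucket{le="+Inf",model_name="demo"} 4.0
vllm:time_to_first_token_seconds_sum{model_name="demo"} 0.8
vllm:time_to_first_token_seconds_count{model_name="demo"} 4.0
"""

print(histogram_means(SAMPLE))  # → {'vllm:time_to_first_token_seconds': 0.2}
```

Summing _sum and _count across label sets before dividing gives the overall mean when several models or engines report the same histogram; the _bucket series are still needed if you want percentiles rather than averages.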

Just make sure PrometheusStatLogger remains active and that your server exposes the /metrics endpoint so Prometheus can scrape the data.
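One way to check the endpoint is to fetch it and look for each histogram's _count series. This is a sketch using only the Python standard library; it assumes the server is reachable at http://localhost:8000/metrics (8000 is the default port of vLLM's OpenAI-compatible server, so adjust the URL for your deployment), and the function names missing_histograms and check_endpoint are hypothetical.

```python
import urllib.request

# Histogram families the question asks about.
EXPECTED = (
    "vllm:time_to_first_token_seconds",
    "vllm:time_per_output_token_seconds",
    "vllm:e2e_request_latency_seconds",
)


def missing_histograms(text, names=EXPECTED):
    """Return the expected histograms whose _count series is absent from a scrape."""
    present = set()
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        # The series name ends at the label block "{...}" or the first space.
        present.add(line.split("{", 1)[0].split(" ", 1)[0])
    return [n for n in names if n + "_count" not in present]


def check_endpoint(url="http://localhost:8000/metrics", timeout=5.0):
    """Fetch a /metrics page and report any expected histogram that is missing.

    The default URL is an assumption; point it at your own server.
    """
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        text = resp.read().decode("utf-8")
    missing = missing_histograms(text)
    if missing:
        print("Missing histograms:", ", ".join(missing))
    else:
        print("All expected histograms are exposed at", url)
    return missing
```

With the server running, and ideally after serving at least one request, call check_endpoint() or pass your own URL; an empty return list means all three histograms were found. Keeping the parsing in missing_histograms separate from the HTTP fetch lets you test it against saved scrape text without a live server.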

Would you like more detail on how to verify these metrics are being collected?

Sources: