How do I get the `http_*` metrics that this doc suggests are available?

Does vLLM emit these `http_*` metrics, as the code snippet here shows?

$ curl http://0.0.0.0:8000/metrics 2>/dev/null  | grep -P '^http_(?!.*(_bucket|_created|_sum)).*'
http_requests_total{handler="/v1/completions",method="POST",status="2xx"} 201.0
http_request_size_bytes_count{handler="/v1/completions"} 201.0
http_response_size_bytes_count{handler="/v1/completions"} 201.0
http_request_duration_highr_seconds_count 201.0
http_request_duration_seconds_count{handler="/v1/completions",method="POST"} 201.0

I deployed the latest vLLM image (v0.8.4) from upstream as a K8s Deployment (see the manifest below). I then `kubectl exec`ed into the container and ran `vllm serve TinyLlama/TinyLlama-1.1B-Chat-v1.0`. Then, in a separate shell, I ran `curl --silent localhost:8000/metrics | grep http` both before and after this curl:

$ curl localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "What is artificial intelligence?"
      }
    ]
  }'

I don’t see any output from `curl --silent localhost:8000/metrics | grep http` either before or after the inference request. Is the doc out of date? Am I missing something?
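
For completeness, the sequence was roughly the following (a sketch; it assumes exec'ing via the Deployment name from the manifest below, with the chat request above sent between the two metrics checks):

$ kubectl exec -it deploy/dxia-test -- bash
# inside the container: start the server
$ vllm serve TinyLlama/TinyLlama-1.1B-Chat-v1.0
# in a second shell in the same container: check before and after the chat request
$ curl --silent localhost:8000/metrics | grep http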

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dxia-test
spec:
  selector:
    matchLabels:
      app: dxia-test
  template:
    metadata:
      labels:
        app: dxia-test
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:v0.8.4@sha256:b168cbb0101f51b2491047345d0a83f5a8ecbe56a6604f2a2edb81eca55ebc9e
          command: [ "/bin/bash", "-c", "--" ]
          args: [ "while true; do sleep 30; done;" ]
          ports:
            - name: http
              containerPort: 8000
              protocol: TCP
          resources:
            requests:
              cpu: 4
              memory: 8G
              ephemeral-storage: 10G
              nvidia.com/gpu: 1
            limits:
              cpu: 4
              memory: 8G
              ephemeral-storage: 10G
              nvidia.com/gpu: 1

Update

It’s strange: when I check out the git tag v0.8.4 and run from source, the metrics are there. Maybe the image is built from a different point in the source tree than the corresponding tag?

On the main branch, I bisected the change in behavior to commit 340d7b1b2; the commit before it, 1bcbcbf57, still has the `http_*` metrics. 340d7b1b2 is included in the 0.8.5 release.
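
For anyone who wants to reproduce the bisect, the workflow looks roughly like this (a sketch; it assumes you can reinstall vLLM from source and relaunch `vllm serve` at every step, and that port 8000 is free):

$ git bisect start
$ git bisect bad              # e.g. a recent main commit where the http_* metrics are missing
$ git bisect good v0.8.4      # tag where the metrics still show up
# at each step: reinstall from source, start the server, and wait for it to come up
$ pip install -e . && vllm serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 &
# then mark the commit based on whether the http_* family is present
$ curl --silent localhost:8000/metrics | grep -q '^http_' && git bisect good || git bisect bad
$ git bisect reset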

The `http_*` metrics are indeed emitted by vLLM, as shown in the snippet you provided. They are exposed via the /metrics endpoint and include http_requests_total, http_request_size_bytes_count, and http_response_size_bytes_count, among others. If you are not seeing them, it is likely down to how the server is configured or built: according to this GitHub pull request, prometheus_fastapi_instrumentator is what tracks the HTTP-level metrics, and it has to be set up for them to be exposed.

If you are not seeing the expected metrics, first make sure prometheus_fastapi_instrumentator is actually set up in the build you are running, then verify that the vLLM server is up and reachable at the endpoint you are querying. If the issue persists, check the server logs for errors or misconfigurations.
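
One quick way to narrow this down: the engine-level vllm:* metrics (e.g. vllm:num_requests_running) are emitted by vLLM itself, while the http_* family comes from the FastAPI instrumentator middleware, so comparing the two shows whether the /metrics endpoint is broken or only the HTTP-level metrics are missing. A rough check:

$ curl --silent localhost:8000/metrics | grep -c '^vllm:'   # engine metrics; non-zero means /metrics works
$ curl --silent localhost:8000/metrics | grep -c '^http_'   # instrumentator metrics; zero means the middleware isn't active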
