How do I get the `http_*` metrics that this doc suggests are available?

Does vLLM emit these `http_*` metrics, as the code snippet here shows?

$ curl http://0.0.0.0:8000/metrics 2>/dev/null  | grep -P '^http_(?!.*(_bucket|_created|_sum)).*'
http_requests_total{handler="/v1/completions",method="POST",status="2xx"} 201.0
http_request_size_bytes_count{handler="/v1/completions"} 201.0
http_response_size_bytes_count{handler="/v1/completions"} 201.0
http_request_duration_highr_seconds_count 201.0
http_request_duration_seconds_count{handler="/v1/completions",method="POST"} 201.0

I deployed the latest vLLM image (v0.8.4) from upstream as a K8s Deployment (see the manifest below). I then `kubectl exec`ed into the container and ran `vllm serve TinyLlama/TinyLlama-1.1B-Chat-v1.0`. Then, in a separate shell, I ran `curl --silent localhost:8000/metrics | grep http` both before and after this curl:

$ curl localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "What is artificial intelligence?"
      }
    ]
  }'

I don’t see any output from `curl --silent localhost:8000/metrics | grep http` either before or after the inference request. Is the doc out of date? Am I missing something?
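
For completeness, the sequence was roughly the following (a sketch; it assumes exec'ing via the Deployment name from the manifest below, with the chat request above sent between the two metrics checks):

$ kubectl exec -it deploy/dxia-test -- bash
# inside the container: start the server
$ vllm serve TinyLlama/TinyLlama-1.1B-Chat-v1.0
# in a second shell in the same container: check before and after the chat request
$ curl --silent localhost:8000/metrics | grep http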

apiVersion: apps/v1
kind: Deployment
metadata:
  name: dxia-test
spec:
  selector:
    matchLabels:
      app: dxia-test
  template:
    metadata:
      labels:
        app: dxia-test
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:v0.8.4@sha256:b168cbb0101f51b2491047345d0a83f5a8ecbe56a6604f2a2edb81eca55ebc9e
          command: [ "/bin/bash", "-c", "--" ]
          args: [ "while true; do sleep 30; done;" ]
          ports:
            - name: http
              containerPort: 8000
              protocol: TCP
          resources:
            requests:
              cpu: 4
              memory: 8G
              ephemeral-storage: 10G
              nvidia.com/gpu: 1
            limits:
              cpu: 4
              memory: 8G
              ephemeral-storage: 10G
              nvidia.com/gpu: 1

Update

It’s strange: when I check out the git tag v0.8.4 and run from source, the metrics are there. Maybe the image is built from a different point in the source tree than the corresponding tag?

On the main branch, I bisected the change in behavior to commit 340d7b1b2; the commit before it, 1bcbcbf57, still has the `http_*` metrics. 340d7b1b2 is included in the 0.8.5 release.
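
For anyone who wants to reproduce the bisect, the workflow looks roughly like this (a sketch; it assumes you can reinstall vLLM from source and relaunch `vllm serve` at every step, and that port 8000 is free):

$ git bisect start
$ git bisect bad              # e.g. a recent main commit where the http_* metrics are missing
$ git bisect good v0.8.4      # tag where the metrics still show up
# at each step: reinstall from source, start the server, and wait for it to come up
$ pip install -e . && vllm serve TinyLlama/TinyLlama-1.1B-Chat-v1.0 &
# then mark the commit based on whether the http_* family is present
$ curl --silent localhost:8000/metrics | grep -q '^http_' && git bisect good || git bisect bad
$ git bisect reset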

The `http_*` metrics are indeed emitted by vLLM, as shown in the snippet you provided. They are exposed via the /metrics endpoint and include http_requests_total, http_request_size_bytes_count, and http_response_size_bytes_count, among others. If you are not seeing them, it is likely down to how the server is configured or built: according to this GitHub pull request, prometheus_fastapi_instrumentator is what tracks the HTTP-level metrics, and it has to be set up for them to be exposed.

If you are not seeing the expected metrics, first make sure prometheus_fastapi_instrumentator is actually set up in the build you are running, then verify that the vLLM server is up and reachable at the endpoint you are querying. If the issue persists, check the server logs for errors or misconfigurations.
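
One quick way to narrow this down: the engine-level vllm:* metrics (e.g. vllm:num_requests_running) are emitted by vLLM itself, while the http_* family comes from the FastAPI instrumentator middleware, so comparing the two shows whether the /metrics endpoint is broken or only the HTTP-level metrics are missing. A rough check:

$ curl --silent localhost:8000/metrics | grep -c '^vllm:'   # engine metrics; non-zero means /metrics works
$ curl --silent localhost:8000/metrics | grep -c '^http_'   # instrumentator metrics; zero means the middleware isn't active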
