Does vLLM emit these http_* metrics, as shown in the snippet below?
$ curl http://0.0.0.0:8000/metrics 2>/dev/null | grep -P '^http_(?!.*(_bucket|_created|_sum)).*'
http_requests_total{handler="/v1/completions",method="POST",status="2xx"} 201.0
http_request_size_bytes_count{handler="/v1/completions"} 201.0
http_response_size_bytes_count{handler="/v1/completions"} 201.0
http_request_duration_highr_seconds_count 201.0
http_request_duration_seconds_count{handler="/v1/completions",method="POST"} 201.0
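For reference, the grep -P filter above keeps only top-level http_* series by using a negative lookahead to drop the _bucket, _created, and _sum sub-series. The same filter can be sketched in Python; the sample lines here are made up for illustration, not real vLLM output:

```python
import re

# Same negative-lookahead filter as the `grep -P` invocation above:
# match lines starting with http_ whose remainder contains none of
# _bucket, _created, or _sum.
PATTERN = re.compile(r"^http_(?!.*(_bucket|_created|_sum)).*")

# Assumed sample of /metrics output (illustrative only).
sample = """\
http_requests_total{handler="/v1/completions"} 201.0
http_request_duration_seconds_bucket{le="0.1"} 10.0
http_request_duration_seconds_sum 4.2
python_gc_objects_collected_total 100.0
"""

matches = [line for line in sample.splitlines() if PATTERN.match(line)]
print(matches)  # only the http_requests_total line survives
```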
I deployed the latest v0.8.4 vLLM image from upstream as a K8s Deployment (see below). I then kubectl exec'd into the container and ran vllm serve TinyLlama/TinyLlama-1.1B-Chat-v1.0. In a separate shell, I ran curl --silent localhost:8000/metrics | grep http both before and after this curl:
$ curl localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {
        "role": "user",
        "content": "What is artificial intelligence?"
      }
    ]
  }'
I don’t see any http_* metrics in the output of curl --silent localhost:8000/metrics | grep http, either before or after the inference request. Is the doc out of date? Am I missing something?
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dxia-test
spec:
  selector:
    matchLabels:
      app: dxia-test
  template:
    metadata:
      labels:
        app: dxia-test
    spec:
      containers:
        - name: vllm
          image: vllm/vllm-openai:v0.8.4@sha256:b168cbb0101f51b2491047345d0a83f5a8ecbe56a6604f2a2edb81eca55ebc9e
          command: [ "/bin/bash", "-c", "--" ]
          args: [ "while true; do sleep 30; done;" ]
          ports:
            - name: http
              containerPort: 8000
              protocol: TCP
          resources:
            requests:
              cpu: 4
              memory: 8G
              ephemeral-storage: 10G
              nvidia.com/gpu: 1
            limits:
              cpu: 4
              memory: 8G
              ephemeral-storage: 10G
              nvidia.com/gpu: 1
Update
It’s strange: when I check out git tag v0.8.4 of the repo, the metrics are there. Maybe the image is built from a different point in the source code than the corresponding tag?

On the main branch I bisected the change in behavior to commit 340d7b1b2. The commit before it, 1bcbcbf57, still has the http_* metrics. 340d7b1b2 is currently released in 0.8.5.