My goal is to see prompt_tokens_details.cached_tokens in the usage section at the end of a response. vLLM returns null for it, but I found the --enable-prompt-tokens-details flag in the docs, so I tried that:
```shell
vllm serve /workspace/model \
  --enforce-eager \
  --enable-prefix-caching \
  --host 0.0.0.0 \
  --port 8000 \
  --gpu-memory-utilization 0.8 \
  --enable-prompt-tokens-details
```
Despite this, the usage section of the response still shows "prompt_tokens_details": null.
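To make the check reproducible, here is a minimal sketch that sends the same long prompt twice to the server's OpenAI-compatible /v1/completions endpoint and prints the usage block each time. With prefix caching on, the second request should be able to reuse cached KV blocks, so a non-null cached_tokens would be expected there if the details are being reported. Assumptions: the server above is reachable at localhost:8000, the served model name defaults to the path given to vllm serve, and the names BASE_URL, MODEL, extract_cached_tokens, complete, and run_check are hypothetical helpers for illustration only.

```python
import json
import urllib.request

# Assumption: the server from the command above is reachable here.
BASE_URL = "http://localhost:8000/v1/completions"
# Assumption: the served model name defaults to the path passed to `vllm serve`.
MODEL = "/workspace/model"


def extract_cached_tokens(response):
    """Return usage.prompt_tokens_details.cached_tokens, or None if it is missing or null."""
    usage = response.get("usage") or {}
    details = usage.get("prompt_tokens_details")
    if details is None:
        return None
    return details.get("cached_tokens")


def complete(prompt):
    """POST a non-streaming completion request and return the parsed JSON body."""
    payload = json.dumps({"model": MODEL, "prompt": prompt, "max_tokens": 8}).encode()
    req = urllib.request.Request(
        BASE_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def run_check():
    """Send an identical long prompt twice; the second send is the one that could hit the prefix cache."""
    prompt = "The quick brown fox jumps over the lazy dog. " * 50
    for attempt in (1, 2):
        body = complete(prompt)
        print(attempt, body.get("usage"), extract_cached_tokens(body))
```

Calling run_check() against the running server prints both usage blocks side by side, which makes it easy to tell whether prompt_tokens_details is null on every request or only on the first (cold-cache) one.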
Is this a bug? If so, where do I report it?
EDIT: I am running vLLM 0.22.1rc1.dev32+gde2186341.d20260601, built only an hour ago. I was previously on a month-old build, but I updated vLLM to confirm the problem still reproduces on the latest codebase before opening this topic. So if this is a bug, it is not a recent regression.