(Possible bug) --enable-prompt-tokens-details not working?

My goal is seeing prompt_tokens_details.cached_tokens at the end of a response. VLLM shows null, but I found the --enable-prompt-tokens-details in the doc, so I tried that.


vllm serve /workspace/model \
--enforce-eager \ 
--enable-prefix-caching  \
--host 0.0.0.0  \
--port 8000  \
--gpu-memory-utilization 0.8  \
--enable-prompt-tokens-details

Despite this, the response still has "prompt_tokens_details": null

Is this a bug? If so, where do I report it?

EDIT: I am running vllm 0.22.1rc1.dev32+gde2186341.d20260601 , built only an hour ago. I was previously running a 1-month-old version, but updated VLLM to be certain I could reproduce it in latest codebase before creating this topic. So if this is a bug, this isn’t some recent regression.

Yes, this is a known bug. Even with --enable-prompt-tokens-details, the response’s prompt_tokens_details field is often null in recent vLLM versions. This issue has been reported and tracked in multiple places, including this GitHub issue and another related report. The maintainers have acknowledged the bug and are working on fixes.

You should report such bugs on the vLLM GitHub issues page, providing your environment details and command line. Would you like more detail on workarounds or the current status of the fix?

Sources:

Hint: Mention @RunLLM in the post for followups.