(Possible bug) --enable-prompt-tokens-details not working?

tran · June 2, 2026, 12:54am

My goal is to see prompt_tokens_details.cached_tokens at the end of a response. vLLM returns null for it, but I found --enable-prompt-tokens-details in the docs, so I tried that.


vllm serve /workspace/model \
--enforce-eager \
--enable-prefix-caching \
--host 0.0.0.0 \
--port 8000 \
--gpu-memory-utilization 0.8 \
--enable-prompt-tokens-details

Despite this, the response still contains "prompt_tokens_details": null.
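Here is a minimal shell sketch for checking the field by hand. It assumes the server above is reachable at localhost:8000, that the served model name is the path /workspace/model (vLLM's default when --served-model-name is not set), and that curl is installed. BASE_URL, MODEL and the helper functions chat, cached_tokens and check_cached_tokens are hypothetical names chosen for illustration. The sketch sends the same long prompt twice. As I understand it, prefix caching only reuses complete KV-cache blocks, so a first request or a very short prompt can legitimately report cached_tokens as 0. If the field is populated, the script prints a number. If it prints "null", prompt_tokens_details is still missing.

```shell
#!/usr/bin/env bash
# Sketch only. Assumptions (not from the original post): server at localhost:8000,
# served model name /workspace/model, curl available. Override via BASE_URL / MODEL.
BASE_URL="${BASE_URL:-http://localhost:8000}"
MODEL="${MODEL:-/workspace/model}"

# Read a chat-completion JSON body on stdin and print its cached_tokens value,
# or "null" if there is no numeric cached_tokens field (prompt_tokens_details
# is null, or the request failed). Rough grep-based extraction; jq is more robust.
cached_tokens() {
  local n
  n=$(grep -oE '"cached_tokens"[[:space:]]*:[[:space:]]*[0-9]+' | grep -oE '[0-9]+$' || true)
  if [ -n "$n" ]; then
    printf '%s\n' "$n"
  else
    printf 'null\n'
  fi
}

# Send one non-streaming chat completion and print the raw JSON body.
# $1 must not contain double quotes or backslashes (no JSON escaping is done).
chat() {
  curl -s "$BASE_URL/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"$MODEL\", \"max_tokens\": 8, \"messages\": [{\"role\": \"user\", \"content\": \"$1\"}]}"
}

# Send the same long prompt twice. With prefix caching enabled and the details
# field working, the second request is expected to report cached_tokens > 0.
check_cached_tokens() {
  local prompt attempt
  # Repeat one sentence 40 times so the prompt spans several KV-cache blocks.
  prompt=$(printf 'The quick brown fox jumps over the lazy dog. %.0s' {1..40})
  for attempt in 1 2; do
    printf 'request %s: cached_tokens=%s\n' "$attempt" "$(chat "$prompt" | cached_tokens)"
  done
}
```

Usage: source the file, then run check_cached_tokens. The request is deliberately non-streaming. With "stream": true, usage (including prompt_tokens_details) only appears in the final chunk when the request also sets "stream_options": {"include_usage": true}.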

Is this a bug? If so, where do I report it?

EDIT: I am running vllm 0.22.1rc1.dev32+gde2186341.d20260601, built only an hour ago. I was previously running a month-old version, but I updated vLLM to make sure I could reproduce the problem on the latest codebase before creating this topic. So if this is a bug, it isn't a recent regression.

RunLLM · June 2, 2026, 12:54am

Yes, this is a known bug. Even with --enable-prompt-tokens-details, the response’s prompt_tokens_details field is often null in recent vLLM versions. This issue has been reported and tracked in multiple places, including this GitHub issue and another related report. The maintainers have acknowledged the bug and are working on fixes.

You should report such bugs on the vLLM GitHub issues page, providing your environment details and command line. Would you like more detail on workarounds or the current status of the fix?

