How to obtain the logprob of a specified token in a step?

If the top token’s probability is always near 1.0, the model is assigning almost all of its probability mass to a single token. This happens when the prompt context makes the next token nearly deterministic, or when the model’s logits are sharply peaked. With temperature=1.0 the logits are not rescaled before the softmax, so you would normally expect a more spread-out distribution unless the model is genuinely that certain. This is not a bug in vLLM; it reflects the model’s output distribution for your specific prompt and context, as discussed in issue #2613.
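To make the point about peaked logits concrete, here is a minimal sketch in plain Python. It does not use vLLM. The logit values, the 4-token vocabulary, and the `log_softmax` helper are all made up for illustration. It shows that a large gap between the top logit and the rest pushes the top probability close to 1.0 even at temperature 1.0, and how the logprob of one specific token is read from the same distribution.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Return log-probabilities for a list of logits (hypothetical helper)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]

# Made-up logits over a 4-token vocabulary (assumption, not real model output).
peaked = [12.0, 2.0, 1.0, 0.0]   # top logit is far above the others
flat = [2.0, 1.5, 1.0, 0.5]      # logits are close together

peaked_probs = [math.exp(lp) for lp in log_softmax(peaked)]
flat_probs = [math.exp(lp) for lp in log_softmax(flat)]

# Logprob of one specified token (index 1 here) under the peaked distribution.
token_id = 1
token_logprob = log_softmax(peaked)[token_id]

print("peaked top prob:", peaked_probs[0])
print("flat top prob:  ", flat_probs[0])
print("logprob of token", token_id, "=", token_logprob)
```

If your real logits look like the `peaked` case, a top probability near 1.0 is expected, and the other tokens will have very negative logprobs rather than missing ones.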

Would you like more detail or help diagnosing your prompt/context?
