I want to know how to obtain the logprob of a specified token at a generation step in vLLM.
To obtain the log probability (logprob) of a specified token during generation in vLLM, set the logprobs parameter in SamplingParams to the desired number of top logprobs to return. The output will include a list of dictionaries mapping token IDs to their logprobs for each generated token. You can then look up the logprob for your specified token in these dictionaries. For prompt tokens, use the prompt_logprobs parameter similarly. See test_logprobs.py and SamplingParams usage.
Example:
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(logprobs=5)  # return the top 5 logprobs for each generated token
outputs = llm.generate(["Your prompt here"], params)

# One dict per generated token, mapping token ID -> Logprob
# (in recent vLLM versions a Logprob object with .logprob, .rank, .decoded_token).
step_logprobs = outputs[0].outputs[0].logprobs

# To get the logprob of token_id X at generation step i:
entry = step_logprobs[i].get(X)  # None if X is not among the returned tokens
logprob = entry.logprob if entry is not None else None
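For prompt tokens, the same pattern applies with prompt_logprobs (a small addition under the assumption that, in recent vLLM versions, the result is exposed as RequestOutput.prompt_logprobs, with None for the first prompt token):

params = SamplingParams(prompt_logprobs=5)  # top 5 logprobs per prompt token
outputs = llm.generate(["Your prompt here"], params)
prompt_lps = outputs[0].prompt_logprobs  # list of dicts, one per prompt token; first entry is None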
Would you like more detail?
How can I obtain the logprob of a token if that specific token is not in the top logprobs?
If the specific token is not in the top-k logprobs returned by vLLM, its logprob will not be present in the output. vLLM does not natively return logprobs for tokens outside the top-k; you must set logprobs to a large enough value to include your token, or, if the token is still not present, assign it a default low value (e.g., -100 or -9999.0) as a workaround. This is a known limitation, as discussed in issue #2613 and issue #3779.
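A minimal sketch of that fallback, assuming outputs comes from a generate() call with SamplingParams(logprobs=k) as in the example above (the helper name and the -100.0 default are illustrative, not part of the vLLM API):

def logprob_or_default(step_entry, token_id, default=-100.0):
    # step_entry is one per-token dict of token ID -> Logprob returned by vLLM;
    # fall back to a sentinel value when the token was not among the top-k.
    lp = step_entry.get(token_id)
    return lp.logprob if lp is not None else default

# Usage: logprob of token_id at generation step i
# value = logprob_or_default(outputs[0].outputs[0].logprobs[i], token_id)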
Would you like more detail or code examples?
What is the maximum number of logprobs that vLLM can output?
The maximum number of logprobs vLLM can output is controlled by the max_logprobs parameter when initializing the LLM class. By default, this is set to 20, but you can increase it as needed (e.g., up to the vocabulary size) by specifying max_logprobs in the LLM constructor. The value in SamplingParams.logprobs cannot exceed this limit, or a ValueError will be raised. See ModelConfig.max_logprobs docs and issue #5299.
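A minimal sketch of raising that limit (the numbers are arbitrary; the assumption is that max_logprobs is accepted by the LLM constructor and forwarded to the engine configuration):

from vllm import LLM, SamplingParams

# Raise the engine-level cap so SamplingParams.logprobs may exceed the default of 20.
llm = LLM(model="facebook/opt-125m", max_logprobs=100)
params = SamplingParams(logprobs=50)  # must not exceed max_logprobs, else ValueError
outputs = llm.generate(["Your prompt here"], params)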
Would you like more detail?
How much does the number of output logprobs affect the generation efficiency of vLLM?
Increasing the number of output logprobs significantly degrades vLLM generation efficiency, with runtime growing roughly linearly in the number of requested logprobs. This is because retrieving and serializing more logprobs is computationally expensive; it is not just a cheap slice of precomputed logits. For large values (e.g., logprobs close to the vocabulary size), generation can be much slower; see issue #14300 for benchmarks and discussion.
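A rough way to measure the effect yourself (a timing sketch, not an official benchmark; the model, prompt, and logprob counts are placeholders):

import time
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m", max_logprobs=1000)
prompt = ["Your prompt here"]

for k in (1, 10, 100, 1000):
    params = SamplingParams(max_tokens=128, logprobs=k)
    start = time.perf_counter()
    llm.generate(prompt, params)
    # The first iteration also pays one-time warm-up cost; compare the later runs.
    print(f"logprobs={k}: {time.perf_counter() - start:.2f}s")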
Would you like more detail or performance data?
"logprobs": \[
{
"236840": 0.9998205030304157,
"17576": 0.00017952801482389944,
"329": 2.8491101918989194e-11,
"16": 0.0,
"0": 0.0,
"15": 0.0,
"13": 0.0,
"14": 0.0,
"10": 0.0,
"9": 0.0,
"11": 0.0,
"6": 0.0,
"12": 0.0,
"3": 0.0,
"1": 0.0,
"2": 0.0,
"8": 0.0,
"7": 0.0,
"5": 0.0,
"4": 0.0
},
Why is the top token's probability about 1.0 every time? I set temperature to 1.0.
"logprobs": \[
{
"236840": 0.9998205030304157,
"17576": 0.00017952801482389944,
"329": 2.8491101918989194e-11,
"16": 0.0,
"0": 0.0,
"15": 0.0,
"13": 0.0,
"14": 0.0,
"10": 0.0,
"9": 0.0,
"11": 0.0,
"6": 0.0,
"12": 0.0,
"3": 0.0,
"1": 0.0,
"2": 0.0,
"8": 0.0,
"7": 0.0,
"5": 0.0,
"4": 0.0
},
why is the top token about 1.0 every time? i set temperature to 1.0.
If the top token's probability is always near 1.0, the model is simply extremely confident, assigning almost all probability mass to a single token; this happens when the prompt context is highly deterministic or the model's logits are very peaked. Note also that the values you pasted appear to be probabilities (they sum to roughly 1.0) rather than raw logprobs, which would be less than or equal to 0. A temperature of 1.0 does not flatten the distribution; it leaves the model's raw distribution unchanged (values above 1.0 flatten it, values below sharpen it), so what you are seeing is the model's own, very peaked distribution for your specific prompt and context. This is not a bug in vLLM, as discussed in issue #2613.
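To illustrate, a standalone numpy sketch of temperature scaling (the logits are made up; this is not vLLM internals):

import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    # Divide logits by the temperature before the softmax; temperature=1.0 changes nothing.
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()  # for numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum()

logits = [12.0, 3.5, 1.0]  # a very peaked (confident) next-token distribution
print(softmax_with_temperature(logits, 1.0))  # ~[0.9998, 0.0002, ...], unchanged
print(softmax_with_temperature(logits, 2.0))  # a higher temperature flattens it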
Would you like more detail or help diagnosing your prompt/context?
Sources:
- https://github.com/vllm-project/vllm/issues/9453
- https://docs.vllm.ai/en/latest/usage/v1_guide.html#embedding-models
Hint: Mention @RunLLM in the post for followups.