I want to know how to obtain the logprob of a specified token at a generation step in vLLM.
To obtain the log probability (logprob) of a specified token during generation in vLLM, set the logprobs parameter in SamplingParams to the desired number of top logprobs to return. The output will include a list of dictionaries mapping token IDs to their logprobs for each generated token. You can then look up the logprob for your specified token in these dictionaries. For prompt tokens, use the prompt_logprobs parameter similarly. See test_logprobs.py and SamplingParams usage.
Example:
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(logprobs=5)  # return the top 5 logprobs for each generated token
outputs = llm.generate(["Your prompt here"], params)

# One dict per generated token, mapping token ID -> Logprob
# (in recent vLLM versions a Logprob object with .logprob, .rank, .decoded_token).
step_logprobs = outputs[0].outputs[0].logprobs

# To get the logprob of token_id X at generation step i:
entry = step_logprobs[i].get(X)  # None if X is not among the returned tokens
logprob = entry.logprob if entry is not None else None
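For prompt tokens, the same pattern applies with prompt_logprobs (a small addition under the assumption that, in recent vLLM versions, the result is exposed as RequestOutput.prompt_logprobs, with None for the first prompt token):

params = SamplingParams(prompt_logprobs=5)  # top 5 logprobs per prompt token
outputs = llm.generate(["Your prompt here"], params)
prompt_lps = outputs[0].prompt_logprobs  # list of dicts, one per prompt token; first entry is None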
Would you like more detail?
How can I obtain the logprob of a token if that specific token is not in the top logprobs?
If the specific token is not in the top-k logprobs returned by vLLM, its logprob will not be present in the output. vLLM does not natively return logprobs for tokens outside the top-k; you must set logprobs to a large enough value to include your token, or, if the token is still not present, assign it a default low value (e.g., -100 or -9999.0) as a workaround. This is a known limitation, as discussed in issue #2613 and issue #3779.
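A minimal sketch of that fallback, assuming outputs comes from a generate() call with SamplingParams(logprobs=k) as in the example above (the helper name and the -100.0 default are illustrative, not part of the vLLM API):

def logprob_or_default(step_entry, token_id, default=-100.0):
    # step_entry is one per-token dict of token ID -> Logprob returned by vLLM;
    # fall back to a sentinel value when the token was not among the top-k.
    lp = step_entry.get(token_id)
    return lp.logprob if lp is not None else default

# Usage: logprob of token_id at generation step i
# value = logprob_or_default(outputs[0].outputs[0].logprobs[i], token_id)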
Would you like more detail or code examples?
What is the maximum number of logprobs that vLLM can output?
The maximum number of logprobs vLLM can output is controlled by the max_logprobs parameter when initializing the LLM class. By default, this is set to 20, but you can increase it as needed (e.g., up to the vocabulary size) by specifying max_logprobs in the LLM constructor. The value in SamplingParams.logprobs cannot exceed this limit, or a ValueError will be raised. See ModelConfig.max_logprobs docs and issue #5299.
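A minimal sketch of raising that limit (the numbers are arbitrary; the assumption is that max_logprobs is accepted by the LLM constructor and forwarded to the engine configuration):

from vllm import LLM, SamplingParams

# Raise the engine-level cap so SamplingParams.logprobs may exceed the default of 20.
llm = LLM(model="facebook/opt-125m", max_logprobs=100)
params = SamplingParams(logprobs=50)  # must not exceed max_logprobs, else ValueError
outputs = llm.generate(["Your prompt here"], params)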
Would you like more detail?
How much does the number of output logprobs affect the generation efficiency of vLLM?
Increasing the number of output logprobs significantly degrades vLLM generation efficiency, with runtime growing roughly linearly in the number of requested logprobs. This is because retrieving and serializing more logprobs is computationally expensive; it is not just a cheap slice of precomputed logits. For large values (e.g., logprobs close to the vocabulary size), generation can be much slower; see issue #14300 for benchmarks and discussion.
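A rough way to measure the effect yourself (a timing sketch, not an official benchmark; the model, prompt, and logprob counts are placeholders):

import time
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m", max_logprobs=1000)
prompt = ["Your prompt here"]

for k in (1, 10, 100, 1000):
    params = SamplingParams(max_tokens=128, logprobs=k)
    start = time.perf_counter()
    llm.generate(prompt, params)
    # The first iteration also pays one-time warm-up cost; compare the later runs.
    print(f"logprobs={k}: {time.perf_counter() - start:.2f}s")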
Would you like more detail or performance data?
"logprobs": \[
{
"236840": 0.9998205030304157,
"17576": 0.00017952801482389944,
"329": 2.8491101918989194e-11,
"16": 0.0,
"0": 0.0,
"15": 0.0,
"13": 0.0,
"14": 0.0,
"10": 0.0,
"9": 0.0,
"11": 0.0,
"6": 0.0,
"12": 0.0,
"3": 0.0,
"1": 0.0,
"2": 0.0,
"8": 0.0,
"7": 0.0,
"5": 0.0,
"4": 0.0
},
Why is the top token's probability about 1.0 every time? I set temperature to 1.0.
"logprobs": \[
{
"236840": 0.9998205030304157,
"17576": 0.00017952801482389944,
"329": 2.8491101918989194e-11,
"16": 0.0,
"0": 0.0,
"15": 0.0,
"13": 0.0,
"14": 0.0,
"10": 0.0,
"9": 0.0,
"11": 0.0,
"6": 0.0,
"12": 0.0,
"3": 0.0,
"1": 0.0,
"2": 0.0,
"8": 0.0,
"7": 0.0,
"5": 0.0,
"4": 0.0
},
why is the top token about 1.0 every time? i set temperature to 1.0.
If the top token's probability is always near 1.0, the model is simply extremely confident, assigning almost all probability mass to a single token; this happens when the prompt context is highly deterministic or the model's logits are very peaked. Note also that the values you pasted appear to be probabilities (they sum to roughly 1.0) rather than raw logprobs, which would be less than or equal to 0. A temperature of 1.0 does not flatten the distribution; it leaves the model's raw distribution unchanged (values above 1.0 flatten it, values below sharpen it), so what you are seeing is the model's own, very peaked distribution for your specific prompt and context. This is not a bug in vLLM, as discussed in issue #2613.
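To illustrate, a standalone numpy sketch of temperature scaling (the logits are made up; this is not vLLM internals):

import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    # Divide logits by the temperature before the softmax; temperature=1.0 changes nothing.
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()  # for numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum()

logits = [12.0, 3.5, 1.0]  # a very peaked (confident) next-token distribution
print(softmax_with_temperature(logits, 1.0))  # ~[0.9998, 0.0002, ...], unchanged
print(softmax_with_temperature(logits, 2.0))  # a higher temperature flattens it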
Would you like more detail or help diagnosing your prompt/context?
Sources:
- https://github.com/vllm-project/vllm/issues/9453
- https://docs.vllm.ai/en/latest/usage/v1_guide.html#embedding-models
Hint: Mention @RunLLM in the post for followups.