In vLLM V1, how can I modify the code to directly obtain probs and skip the logprobs calculation?
In vLLM V1, logprobs are returned immediately from the model’s raw output, before any post-processing, and there is currently no built-in option to skip the logprobs calculation and return only probabilities (`probs`) directly. The logprobs calculation is part of the core output pipeline, and support for post-sampling adjustments or direct probability output is not yet available, though it is planned for future updates. You would need to modify the internal code (e.g., in the sampler) to extract and return the softmax probabilities (`probs`) instead of, or in addition to, logprobs, as there is no user-facing API or parameter for this in V1 as of now. See the V1 User Guide section on logprobs and the logprobs API docs.
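If your immediate goal is just to read probabilities for generated tokens (rather than to save the cost of computing logprobs), one workaround that avoids touching vLLM internals is to request logprobs through the public API and exponentiate them in your own code. A minimal sketch; the model name and top-k count are placeholders:

```python
import math

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
# Ask for the top-5 logprobs of every generated token.
params = SamplingParams(max_tokens=8, logprobs=5)

outputs = llm.generate("Hello", params)
for step_logprobs in outputs[0].outputs[0].logprobs:
    # Each step is a dict of token_id -> Logprob; exp() turns a logprob into a prob.
    probs = {tok_id: math.exp(lp.logprob) for tok_id, lp in step_logprobs.items()}
    print(probs)
```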
Would you like more detail on which part of the code to modify for this purpose?
Yes, I want to know where in the V1 code the logprobs calculation is implemented, and where I can modify it to calculate probs instead of logprobs. Please tell me!
In vLLM V1, the logprobs calculation is implemented in the sampler, specifically in the `compute_logprobs` method of the `Sampler` class (see the API docs). The logprobs are computed from the model’s raw logits using `torch.log_softmax`, and this happens before any post-processing like temperature or top-k/top-p is applied.

To modify vLLM V1 to return probabilities (`probs`) instead of logprobs, locate the `compute_logprobs` method in `vllm/v1/sample/sampler.py` and change the calculation from `torch.log_softmax(logits, dim=-1)` to `torch.softmax(logits, dim=-1)`. You will also need to update the output structure to return these probabilities instead of logprobs.
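As a rough illustration, the change could look like the sketch below. This is not a drop-in patch: the exact `compute_logprobs` signature and dtype handling may differ between vLLM versions, and the downstream logprobs processors still expect log-probabilities, so any consumers of this value would also need to be updated.

```python
import torch


class SamplerSketch:
    """Illustrative stand-in for the V1 Sampler in vllm/v1/sample/sampler.py (not the real class)."""

    def compute_logprobs(self, logits: torch.Tensor) -> torch.Tensor:
        # Current V1 behavior: log-probabilities from the raw logits.
        return logits.log_softmax(dim=-1, dtype=torch.float32)

    def compute_probs(self, logits: torch.Tensor) -> torch.Tensor:
        # Modified variant: plain probabilities instead of log-probabilities.
        return torch.softmax(logits, dim=-1, dtype=torch.float32)
```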
Would you like a step-by-step example of how to make this code change?
I saw this code in the V1 sampler, and I found that the Sampler class can specify which tokens to process via the sampling_metadata: SamplingMetadata parameter. How is this used? During generation, I only want to get probs for some tokens.
You are correct: in vLLM V1, the `Sampler` class receives a `sampling_metadata: SamplingMetadata` parameter, which contains information about which tokens to process, including allowed-token masks and other per-token controls (Sampler API, SamplingMetadata API). To get probabilities (`probs`) for only specific tokens during generation, you can use the `allowed_token_ids_mask` field in `SamplingMetadata` to mask out unwanted tokens, so the softmax is only computed over the allowed tokens.

However, there is no built-in user-facing API to return `probs` for arbitrary token subsets; you would need to customize the sampler code to extract and return the probabilities for your desired tokens after applying the mask. This involves modifying the `Sampler.forward` method to gather and output `probs` for the indices specified by your mask in `sampling_metadata.allowed_token_ids_mask`.
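Here is a minimal sketch of what that gather could look like inside a customized sampler. `probs_for_allowed_tokens` is a hypothetical helper, not part of vLLM, and note that in the V1 code I have seen, `allowed_token_ids_mask` is `True` for tokens that are *not* allowed (those positions are filled with `-inf` before sampling); verify that convention against your vLLM version.

```python
import torch


def probs_for_allowed_tokens(
    logits: torch.Tensor,                  # [num_reqs, vocab_size] raw logits
    allowed_token_ids_mask: torch.Tensor,  # [num_reqs, vocab_size] bool; True = token is disallowed
) -> torch.Tensor:
    """Hypothetical helper: softmax probabilities restricted to the allowed token set."""
    # Mirror the sampler's masking: disallowed positions get -inf and thus zero probability.
    masked_logits = logits.masked_fill(allowed_token_ids_mask, float("-inf"))
    return torch.softmax(masked_logits, dim=-1, dtype=torch.float32)
```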
Would you like a code example for how to do this modification?
I want to control, in the Sampler class, how the logprobs are calculated based on my custom parameter sampling_metadata.extra_args. But in V1 we can’t pass extra_args as a parameter the way we can in V0. How should I implement this?
In vLLM V1, you can pass custom arguments for sampling via the `extra_args` field in `SamplingParams`, which are then accessible in `sampling_metadata.extra_args` within the `Sampler` class. This is supported in V1 as of recent updates, and these custom arguments can be set through the REST API using the `vllm_xargs` field in your request (RFC: Custom sampling params, PR: Expose custom args in OpenAI APIs).

To implement a custom logprobs calculation, add your custom parameter to `vllm_xargs` in your API request (or to `extra_args` when calling the Python API). Then, in your custom `Sampler` logic, access it via `sampling_metadata.extra_args` and branch your logprobs calculation accordingly.
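A hedged sketch of that flow is shown below. It assumes your vLLM version already has the `SamplingParams.extra_args` field from the RFC/PR cited above and that, as described above, the value ends up available on the metadata object your sampler sees; it also glosses over the fact that `SamplingMetadata` is built per batch rather than per request. `compute_token_scores` and the `return_probs` key are made-up names for illustration.

```python
import torch

from vllm import SamplingParams

# Client side: attach a custom flag (over the REST API this would go in `vllm_xargs`).
params = SamplingParams(max_tokens=8, extra_args={"return_probs": True})


# Sampler side (sketch): branch the calculation on the custom flag.
def compute_token_scores(logits: torch.Tensor, sampling_metadata) -> torch.Tensor:
    # `extra_args` may be missing or None when the request did not set it; guard accordingly.
    extra_args = getattr(sampling_metadata, "extra_args", None) or {}
    if extra_args.get("return_probs"):
        return torch.softmax(logits, dim=-1, dtype=torch.float32)
    return logits.log_softmax(dim=-1, dtype=torch.float32)
```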
Would you like a code snippet showing how to access and use `extra_args` in the Sampler?
Please tell me how the parameters in SamplingParams are passed to SamplingMetadata in V1, and where the corresponding code is.
In vLLM V1, parameters in `SamplingParams` are passed to `SamplingMetadata` during batch preparation. This mapping occurs in the input batch logic, specifically in the `InputBatch` class (see the InputBatch code reference), where each request’s `SamplingParams` is used to populate fields in `SamplingMetadata` via the `_make_sampling_metadata()` method. This method collects the per-request sampling parameters and constructs the `SamplingMetadata` object, which is then passed to the sampler.

For example, in `gpu_input_batch.py`, the `_make_sampling_metadata()` method gathers fields like temperature, top_k, top_p, penalties, and any `extra_args` from `SamplingParams` and sets them on the resulting `SamplingMetadata` instance (SamplingMetadata API).
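To make the flow concrete, here is a heavily simplified, hypothetical sketch of that mapping. Field names follow the real classes where possible, but this is not the actual vLLM code; see `gpu_input_batch.py` for the real `_make_sampling_metadata()`.

```python
from dataclasses import dataclass

import torch

from vllm import SamplingParams


@dataclass
class TinySamplingMetadata:
    """Simplified stand-in for vLLM V1's SamplingMetadata (illustration only)."""
    temperature: torch.Tensor
    top_k: torch.Tensor
    top_p: torch.Tensor


def make_sampling_metadata(per_request_params: list[SamplingParams]) -> TinySamplingMetadata:
    # Conceptually what InputBatch._make_sampling_metadata() does: batch the
    # per-request SamplingParams values into tensors the sampler consumes.
    return TinySamplingMetadata(
        temperature=torch.tensor([p.temperature for p in per_request_params]),
        top_k=torch.tensor([p.top_k for p in per_request_params]),
        top_p=torch.tensor([p.top_p for p in per_request_params]),
    )
```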
Would you like a code snippet showing this mapping process?
V1 can also restrict sampling to a set of allowed tokens, right? How do I do that? Please give an example.
Yes, vLLM V1 supports specifying allowed sampling tokens using the `allowed_token_ids` parameter in `SamplingParams`. This restricts generation to only the provided token IDs (SamplingParams API, Sampler API).
Example usage in Python:

```python
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")

# Only tokens 42, 43, and 44 may be sampled at each generation step.
params = SamplingParams(allowed_token_ids=[42, 43, 44], max_tokens=10)

outputs = llm.generate("Hello", params)
print(outputs[0].outputs[0].text)
```
Would you like more detail on how this works internally?