How to obtain the logits of an LLM

Yes, I definitely need a code example! Thanks!

Certainly! Here’s how you can introduce and use an is_sigmoid parameter in vLLM V1 using extra_args, following the current extensibility pattern (Custom sampling params RFC):

1. When creating SamplingParams, add is_sigmoid to extra_args:

from vllm import SamplingParams

sampling_params = SamplingParams(
    temperature=0.8,
    top_p=0.95,
    extra_args={"is_sigmoid": True}
)
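
For context, here's a minimal sketch of how these params are passed into a generation call (the model name below is just a placeholder):

from vllm import LLM

llm = LLM(model="facebook/opt-125m")  # placeholder model
# extra_args travels with the request; nothing else about the call changes.
outputs = llm.generate(["Hello, my name is"], sampling_params)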

2. In your custom sampler logic (e.g., in vllm/v1/sample/sampler.py), access is_sigmoid from extra_args:

import torch
import torch.nn as nn

class Sampler(nn.Module):
    ...
    def forward(
        self,
        logits: torch.Tensor,
        sampling_metadata,
        sampling_params,  # Make sure to pass this in your call chain
        # ... other args
    ):
        # extra_args may be None when the caller did not supply it
        is_sigmoid = (sampling_params.extra_args or {}).get("is_sigmoid", False)
        if is_sigmoid:
            probs = torch.sigmoid(logits)
            # Use or return probs as needed
            return {"sigmoid_probs": probs}
        else:
            logprobs = torch.log_softmax(logits, dim=-1)
            return {"logprobs": logprobs}

Note:

  • You must ensure that the relevant SamplingParams instance is available in your sampler logic. If not, you may need to modify the call chain to pass it (see the sketch after this list).
  • This approach is forward-compatible with vLLM’s planned plugin and custom argument support.
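
As a rough sketch of that wiring (the names below are hypothetical; in V1 the sampler typically receives a SamplingMetadata object rather than raw SamplingParams, so the flag has to be threaded through whatever metadata object your call chain builds):

# Hypothetical sketch: copy is_sigmoid out of extra_args and onto the
# metadata object that the sampler actually receives.
def attach_extra_args(sampling_params, sampling_metadata):
    extra = sampling_params.extra_args or {}
    sampling_metadata.is_sigmoid = extra.get("is_sigmoid", False)  # hypothetical field
    return sampling_metadata

The sampler can then read sampling_metadata.is_sigmoid directly instead of needing the full SamplingParams object.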

Would you like more detail on how to wire SamplingParams into the V1 sampler if it’s not already available?

If I want to obtain the complete logits, how should I modify the vLLM code?

To obtain the complete logits in vLLM, you need to modify the code to expose logits before any post-processing (like softmax or logprobs). For vLLM V1, you should edit the Sampler’s forward method in vllm/v1/sample/sampler.py to return or save the logits tensor directly, as the default API does not expose them (relevant issue). For vLLM V0, make a similar change in vllm/model_executor/layers/sampler.py.

For example, in the forward method, after logits are computed, add:

return {"logits": logits}

or include logits in your custom output structure. This will allow you to access the full logits for each step.

Would you like a more detailed, step-by-step code example for a specific version?

Yes, please give me step-by-step code.

Certainly! Here’s a step-by-step guide to modify vLLM V1 so you can obtain the complete logits at each generation step. This approach is based on community discussions and the vllm/v1/sample/sampler.py code.


1. Modify the SamplerOutput to include logits

In vllm/v1/outputs.py, add a new field to the SamplerOutput class (shown here in plain __init__ form; if your version defines SamplerOutput as a dataclass, add a logits field to the dataclass instead):

# vllm/v1/outputs.py

class SamplerOutput:
    def __init__(self, sampled_token_ids, logprobs_tensors=None, logits=None):
        self.sampled_token_ids = sampled_token_ids
        self.logprobs_tensors = logprobs_tensors
        self.logits = logits  # <-- Add this line

2. Update the Sampler to return logits

In vllm/v1/sample/sampler.py, update the forward method to include logits in the output:

# vllm/v1/sample/sampler.py

class Sampler(nn.Module):
    ...
    def forward(
        self,
        logits: torch.Tensor,
        sampling_metadata: SamplingMetadata,
    ) -> SamplerOutput:
        # ... existing code ...

        # At the end, include logits in the output
        sampler_output = SamplerOutput(
            sampled_token_ids=sampled.unsqueeze(-1),  # `sampled` comes from the existing sampling code above
            logprobs_tensors=logprobs_tensors,
            logits=logits  # <-- Add this line
        )
        return sampler_output

3. Access logits in your inference code

When you call the sampler, you can now access the logits from the returned SamplerOutput object:

output = sampler(logits, sampling_metadata)
logits_tensor = output.logits  # This is the full logits tensor for each step
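
Once you have the raw tensor, you can post-process it however you like. For example, a quick top-k inspection (purely illustrative):

import torch

# Look at the 5 highest-scoring tokens per position in the raw logits.
topk = torch.topk(logits_tensor, k=5, dim=-1)
print(topk.values)   # raw logit scores
print(topk.indices)  # corresponding token IDs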

Note:

  • If you use a custom engine or API, you may need to propagate the logits field through your pipeline.
  • For vLLM V0, the process is similar: add a logits field to the output structure in vllm/model_executor/layers/sampler.py and return it.

Would you like a full example for vLLM V0 or more details on integrating this with the rest of your pipeline?

Actually, I want a code example for vLLM V0.

It seems likely that, for vLLM V0, you should modify the Sampler class in vllm/model_executor/layers/sampler.py to include logits in the output. Here’s a step-by-step example:


1. Add logits to SamplerOutput

Find the SamplerOutput class in sampler.py and add a logits field:

# vllm/model_executor/layers/sampler.py

class SamplerOutput:
    def __init__(self, ..., logits=None, ...):
        ...
        self.logits = logits  # Add this line

2. Return logits in the Sampler.forward method

In the Sampler.forward method, after logits are computed and before any post-processing, include them in the output:

# vllm/model_executor/layers/sampler.py

class Sampler(nn.Module):
    ...
    def forward(
        self,
        logits: torch.Tensor,
        sampling_metadata,
    ):
        # ... existing code for penalties, temperature, etc. ...

        # Compute probabilities and logprobs as usual
        probs = torch.softmax(logits, dim=-1, dtype=torch.float)
        logprobs = torch.log_softmax(logits, dim=-1, dtype=torch.float)

        # ... existing code for sampling ...

        # When building the output, include logits
        sampler_output = SamplerOutput(
            ...,
            logits=logits,  # Add this line
            ...
        )
        return sampler_output

3. Access logits in your pipeline

Wherever you receive the SamplerOutput, you can now access the full logits:

output = sampler(logits, sampling_metadata)  # call the module, not .forward, so hooks run
logits_tensor = output.logits  # This is the full logits tensor

This approach is based on the structure and extensibility of vLLM V0’s sampler (see code reference). You may need to propagate the logits field through any additional output structures if you want to access it at a higher level.
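
If threading a new field through every output structure proves too invasive, a lighter-weight alternative (a hack, but often enough for offline analysis) is to stash the logits on the sampler instance itself:

# Alternative sketch: keep the logits on the module instead of threading
# them through the output structures.
class Sampler(nn.Module):
    ...
    def forward(self, logits, sampling_metadata):
        # Detach and copy so the stash survives past this step.
        self.last_logits = logits.detach().clone()
        # ... existing sampling code, unchanged ...

After each generation step you can read sampler.last_logits directly, at the cost of holding an extra copy of the tensor in memory.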

Would you like more detail on how to propagate logits through the rest of the vLLM pipeline?

Can we skip the sampling stage and only output logits, without performing any other operations?

No, vLLM does not natively support skipping the sampling stage and outputting only logits via its public API. However, you can achieve this by modifying the code: in vLLM V0, edit the Sampler’s forward method in vllm/model_executor/layers/sampler.py to return the logits immediately after they are computed, before any sampling, softmax, or logprobs operations. This approach is discussed and confirmed by users and maintainers (see issue 11397).

Would you like a code snippet showing exactly where to return logits in the Sampler?

Yes, I need a code example!

Certainly! Here’s a minimal code example for vLLM V0 to skip sampling and output only logits:

In vllm/model_executor/layers/sampler.py:

import torch
import torch.nn as nn

class Sampler(nn.Module):
    def forward(
        self,
        logits: torch.Tensor,
        sampling_metadata,
    ):
        # Return the raw logits before any sampling, softmax, or logprobs.
        # sampling_metadata is kept only for signature compatibility.
        return logits

Usage:
When you call the sampler, you will now get the raw logits tensor as output, with no sampling, softmax, or logprobs applied.
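
For instance, assuming you drive the sampler directly as in the earlier examples:

# The return value is now the raw logits tensor itself.
raw_logits = sampler(logits, sampling_metadata)
print(raw_logits.shape)  # typically (num_tokens, vocab_size)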

If you need to keep the SamplerOutput structure, you can do:

class SamplerOutput:
    def __init__(self, logits):
        self.logits = logits

class Sampler(nn.Module):
    def forward(
        self,
        logits: torch.Tensor,
        sampling_metadata,
    ):
        return SamplerOutput(logits=logits)

This will bypass all sampling and post-processing, giving you only the logits. Keep in mind that the rest of the engine normally expects a SamplerOutput carrying sampled token IDs, so a change this drastic is best suited to a stripped-down, single-forward-pass setup rather than the full generation loop.
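
Usage for this variant looks the same as before:

output = sampler(logits, sampling_metadata)
raw_logits = output.logits  # untouched logits; no sampling performed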