Extract class log probabilities from LLM classifier using async vLLM and AWS LMI containers

Hello,

I am using a Mistral 7B Instruct v3 model to classify text documents. The model's output takes the form of multi-token class labels. Currently, I am deploying the model with AWS LMI containers (powered by DJL) using the djl_python.lmi_vllm.vllm_async_service entrypoint, which runs blazingly fast and returns the top-K log probabilities when requested.

I am interested in extracting the log probabilities for the class labels themselves, since the top-K tokens (with K set to the number of classes) may be different from the tokens that make up the class labels.

What would be the best approach to do this?

I am thinking about the following solutions, but I need some expertise, as evidenced by the questions below:

  1. Set task = classification. This should produce an array of scores for each class.

Will this approach invoke a different entry point than djl_python.lmi_vllm.vllm_async_service? Do the class probabilities account for the multi-token structure of the labels (i.e., is some aggregation performed across the tokens of each class label)?

  2. Create a mapping that links each multi-token class label to a single-token label. Retrain the model with these single-token classes and request the top-K probabilities from the inference server while setting vLLM's allowed_token_ids to the set of single-token class labels. Then write a custom_output_formatter function that maps the single-token labels back to their original multi-token equivalents.

In this case, the probability for the predicted single-token label would not require aggregation across other tokens (unlike the multi-token label case). Since allowed_token_ids forces the model to output only tokens from the specified list, requesting the top K (with K being the number of classes) should give the desired output; a rough sketch of this flow is below. Is there a better approach, or am I missing something?
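Roughly, this is the flow I have in mind for option 2, sketched with vLLM's offline Python API purely for illustration (in production the same parameters would be passed through the LMI endpoint), and assuming a vLLM version whose SamplingParams supports allowed_token_ids. The surrogate label strings, model path, and prompt are placeholders, not my actual setup:

```python
from vllm import LLM, SamplingParams

# Hypothetical mapping from single-token surrogate labels back to the
# original multi-token class labels.
surrogate_to_label = {"A": "positive sentiment", "B": "negative sentiment", "C": "neutral"}

# Model retrained to emit the single-token surrogates (placeholder path).
llm = LLM(model="path/to/retrained-mistral-7b-instruct-v0.3")
tokenizer = llm.get_tokenizer()

# Token ID of each surrogate; this assumes every surrogate encodes to exactly
# one token, which has to be verified for the chosen tokenizer.
id_to_label = {
    tokenizer.encode(s, add_special_tokens=False)[0]: label
    for s, label in surrogate_to_label.items()
}

params = SamplingParams(
    max_tokens=1,
    logprobs=len(id_to_label),            # top K = number of classes
    allowed_token_ids=list(id_to_label),  # restrict decoding to the surrogate tokens
)

out = llm.generate(["<classification prompt goes here>"], params)[0]
step_logprobs = out.outputs[0].logprobs[0]  # {token_id: Logprob} for the single generated step

# Map surrogate tokens back to the original multi-token labels
# (the custom_output_formatter would do this on the server side).
class_logprobs = {
    id_to_label[tid]: lp.logprob
    for tid, lp in step_logprobs.items()
    if tid in id_to_label
}
print(class_logprobs)
```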

Thank you for looking into this!

To extract log probabilities for multi-token class labels using vLLM, you can consider the following approaches:

  1. Set Task to Classification: According to the vLLM documentation, setting the task to “classify” should produce an array of scores for each class. This approach should work with the djl_python.lmi_vllm.vllm_async_service entry point, as it is designed to handle classification tasks. However, class probabilities are typically calculated per token, and aggregation across the tokens of a multi-token label is not automatically handled, so you may need to implement custom logic to aggregate probabilities for multi-token labels (see the sketch after this list).

  2. Mapping Multi-Token to Single-Token Labels: This approach involves creating a mapping from multi-token class labels to single-token labels and retraining the model. You can then use the allowed_token_ids parameter to restrict the output to these single-token labels, which simplifies probability extraction because only single-token labels need to be considered. However, it requires retraining the model, which may not be feasible depending on your resources and constraints.
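For the aggregation mentioned in point 1, one possible pattern is to score each multi-token label explicitly by summing the log probabilities of its tokens. The sketch below uses vLLM's offline Python API purely for illustration; the label strings and prompt template are placeholders, and tokenization at the prompt/label boundary should be double-checked for your tokenizer:

```python
from vllm import LLM, SamplingParams

labels = ["positive sentiment", "negative sentiment", "neutral"]  # placeholder labels
document = "..."                                                  # text to classify
prompt = f"Classify the following document.\n\n{document}\n\nLabel:"

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.3")
tokenizer = llm.get_tokenizer()

# prompt_logprobs returns the log probability of every prompt token, so
# appending a candidate label to the prompt lets us read off the
# log-likelihood of that label's tokens under the model.
params = SamplingParams(max_tokens=1, prompt_logprobs=1)

scores = {}
for label in labels:
    full_prompt = prompt + " " + label
    out = llm.generate([full_prompt], params)[0]

    # Number of tokens the label occupies at the end of the prompt.
    n_label_tokens = len(tokenizer.encode(" " + label, add_special_tokens=False))

    token_ids = out.prompt_token_ids[-n_label_tokens:]
    label_logprobs = out.prompt_logprobs[-n_label_tokens:]

    # Each entry is a dict {token_id: Logprob}; the actual prompt token is
    # always included, so we sum its logprob over the label's tokens.
    scores[label] = sum(lp[tid].logprob for lp, tid in zip(label_logprobs, token_ids))

predicted = max(scores, key=scores.get)
print(scores, predicted)
```

The summed log probabilities can then be softmax-normalized across the labels if you need a probability distribution over the classes.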

For both approaches, make sure the model's output is compatible with your classification needs, and implement custom logic to aggregate probabilities over multi-token labels where necessary. If you need more detailed guidance on implementing either solution, please let me know.
