Extract class log probabilities from LLM classifier using async vLLM and AWS LMI containers

Hello,

I am using a Mistral 7B Instruct v3 model to classify text documents. The model's output takes the form of multi-token class labels. Currently, I am deploying the model with AWS LMI containers (powered by DJL) using the djl_python.lmi_vllm.vllm_async_service entrypoint, which runs blazingly fast and returns the top-K log probabilities when requested.

I am interested in extracting the log probabilities for the class labels themselves, since the top-K tokens (with K set to the number of classes) may be different from the tokens that make up the class labels.

What would be the best approach to do this?

I am thinking about the following solutions, but I need some expertise, as evidenced by the questions below:

  1. Set task = classification. This should produce an array of scores for each class.

Will this approach invoke a different entry point than djl_python.lmi_vllm.vllm_async_service? Do the class probabilities account for the multi-token structure of the labels (i.e., is some aggregation performed across the tokens of each class label)?

  2. Create a mapping that links each multi-token class label to a single-token label. Retrain the model with these single-token classes and request the top-K probabilities from the inference server while setting vLLM's allowed_token_ids to the set of single-token class labels. Then write a custom_output_formatter function that maps the single-token labels back to their original multi-token equivalents.

In this case, the probability for the predicted single-token label would not require aggregation across other tokens (unlike the multi-token label case). Since allowed_token_ids forces the model to output only tokens from the specified list, requesting the top K (with K being the number of classes) should give the desired output; a rough sketch of this flow is below. Is there a better approach, or am I missing something?
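Roughly, this is the flow I have in mind for option 2, sketched with vLLM's offline Python API purely for illustration (in production the same parameters would be passed through the LMI endpoint), and assuming a vLLM version whose SamplingParams supports allowed_token_ids. The surrogate label strings, model path, and prompt are placeholders, not my actual setup:

```python
from vllm import LLM, SamplingParams

# Hypothetical mapping from single-token surrogate labels back to the
# original multi-token class labels.
surrogate_to_label = {"A": "positive sentiment", "B": "negative sentiment", "C": "neutral"}

# Model retrained to emit the single-token surrogates (placeholder path).
llm = LLM(model="path/to/retrained-mistral-7b-instruct-v0.3")
tokenizer = llm.get_tokenizer()

# Token ID of each surrogate; this assumes every surrogate encodes to exactly
# one token, which has to be verified for the chosen tokenizer.
id_to_label = {
    tokenizer.encode(s, add_special_tokens=False)[0]: label
    for s, label in surrogate_to_label.items()
}

params = SamplingParams(
    max_tokens=1,
    logprobs=len(id_to_label),            # top K = number of classes
    allowed_token_ids=list(id_to_label),  # restrict decoding to the surrogate tokens
)

out = llm.generate(["<classification prompt goes here>"], params)[0]
step_logprobs = out.outputs[0].logprobs[0]  # {token_id: Logprob} for the single generated step

# Map surrogate tokens back to the original multi-token labels
# (the custom_output_formatter would do this on the server side).
class_logprobs = {
    id_to_label[tid]: lp.logprob
    for tid, lp in step_logprobs.items()
    if tid in id_to_label
}
print(class_logprobs)
```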

Thank you for looking into this!

To extract log probabilities for multi-token class labels using vLLM, you can consider the following approaches:

  1. Set Task to Classification: According to the vLLM documentation, setting the task to “classify” should produce an array of scores for each class. This approach should work with the djl_python.lmi_vllm.vllm_async_service entry point, as it is designed to handle classification tasks. However, class probabilities are typically calculated per token, and aggregation across the tokens of a multi-token label is not automatically handled, so you may need to implement custom logic to aggregate probabilities for multi-token labels (see the sketch after this list).

  2. Mapping Multi-Token to Single-Token Labels: This approach involves creating a mapping from multi-token class labels to single-token labels and retraining the model. You can then use the allowed_token_ids parameter to restrict the output to these single-token labels, which simplifies probability extraction because only single-token labels need to be considered. However, it requires retraining the model, which may not be feasible depending on your resources and constraints.
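For the aggregation mentioned in point 1, one possible pattern is to score each multi-token label explicitly by summing the log probabilities of its tokens. The sketch below uses vLLM's offline Python API purely for illustration; the label strings and prompt template are placeholders, and tokenization at the prompt/label boundary should be double-checked for your tokenizer:

```python
from vllm import LLM, SamplingParams

labels = ["positive sentiment", "negative sentiment", "neutral"]  # placeholder labels
document = "..."                                                  # text to classify
prompt = f"Classify the following document.\n\n{document}\n\nLabel:"

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.3")
tokenizer = llm.get_tokenizer()

# prompt_logprobs returns the log probability of every prompt token, so
# appending a candidate label to the prompt lets us read off the
# log-likelihood of that label's tokens under the model.
params = SamplingParams(max_tokens=1, prompt_logprobs=1)

scores = {}
for label in labels:
    full_prompt = prompt + " " + label
    out = llm.generate([full_prompt], params)[0]

    # Number of tokens the label occupies at the end of the prompt.
    n_label_tokens = len(tokenizer.encode(" " + label, add_special_tokens=False))

    token_ids = out.prompt_token_ids[-n_label_tokens:]
    label_logprobs = out.prompt_logprobs[-n_label_tokens:]

    # Each entry is a dict {token_id: Logprob}; the actual prompt token is
    # always included, so we sum its logprob over the label's tokens.
    scores[label] = sum(lp[tid].logprob for lp, tid in zip(label_logprobs, token_ids))

predicted = max(scores, key=scores.get)
print(scores, predicted)
```

The summed log probabilities can then be softmax-normalized across the labels if you need a probability distribution over the classes.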

For both approaches, make sure the model's output is compatible with your classification needs, and implement custom logic to aggregate probabilities over multi-token labels where necessary. If you need more detailed guidance on implementing either solution, please let me know.
