Hello,
I am using a Mistral 7B Instruct v3 model to classify text documents. The output of the model takes the form of multi-token class labels. Currently, I am deploying the model using AWS LMI containers powered by DJL and using the djl_python.lmi_vllm.vllm_async_service entrypoint, which runs blazing fast and returns top-K log probabilities (if requested).
I am interested in extracting the log probabilities of the class labels themselves, since the top-K tokens (with K set to the number of classes) may be different from the tokens that make up the class labels.
What would be the best approach to do this?
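For context, here is roughly what I mean by aggregating over a label's tokens: a plain-Python sketch (all token strings, ids, and logprob values below are invented for illustration) that scores each multi-token label by summing its per-token log probabilities, assuming the server returned logprobs covering every generation step. A label whose token falls outside the returned top-K becomes unscorable, which is exactly the problem I am running into:

```python
import math

# Hypothetical per-step top-K logprobs returned by the server
# (token -> logprob at each generation step); values are made up.
step_logprobs = [
    {"pos": -0.1, "neg": -2.5, "neu": -3.0},       # step 1
    {"itive": -0.2, "ative": -2.0, "tral": -2.8},  # step 2
]

# Multi-token class labels as token sequences (illustrative tokenization)
labels = {
    "positive": ["pos", "itive"],
    "negative": ["neg", "ative"],
    "neutral": ["neu", "tral"],
}

def label_logprob(tokens):
    # Sum per-token logprobs; if a token is missing from the returned
    # top-K, the label cannot be scored at all.
    total = 0.0
    for step, tok in zip(step_logprobs, tokens):
        if tok not in step:
            return None
        total += step[tok]
    return total

scores = {name: label_logprob(toks) for name, toks in labels.items()}

# Normalize over the scorable labels to get class probabilities
z = sum(math.exp(s) for s in scores.values() if s is not None)
probs = {n: (math.exp(s) / z if s is not None else None)
         for n, s in scores.items()}
```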
I am thinking about the following solutions, but would appreciate some expertise, as evidenced by the associated questions:
- Set task = classification. This should produce an array of scores, one per class.
Will this approach invoke a different entry point than djl_python.lmi_vllm.vllm_async_service? Do the class probabilities account for the multi-token structure of the labels? (i.e., is some sort of aggregation performed across the tokens of each class label?)
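To make that question concrete: my assumption (not verified against the LMI or vLLM docs) is that task = classification would attach a sequence-classification head emitting one logit per class, so no cross-token aggregation would be needed at all, and the scores array would just be a softmax over those logits:

```python
import math

# Hypothetical logits from a 3-class sequence-classification head;
# one logit per class, independent of how the label text tokenizes.
logits = {"positive": 2.1, "negative": -0.3, "neutral": 0.4}

def softmax(scores):
    # Numerically stable softmax over a dict of class logits
    m = max(scores.values())
    exps = {k: math.exp(v - m) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: v / z for k, v in exps.items()}

probs = softmax(logits)
```

If that assumption holds, this path would sidestep the multi-token issue entirely; the open question is whether the LMI container serves it through the same async entry point.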
- Create a mapping that links each multi-token class label to a single-token label. Retrain the model on these single-token labels and request top-K probabilities from the inference server while setting vLLM's allowed_token_ids to the set of single-token label ids. Then write a custom output_formatter function that maps each single-token label back to its original multi-token equivalent.
In this case, the probability of the predicted single-token label would not require aggregation across tokens (as in the multi-token label case). Since allowed_token_ids restricts the model output to the specified list of tokens, requesting top K (with K equal to the number of classes) should yield the desired output. Is there a better approach, or am I missing something?
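A rough sketch of the mapping step in option 2 (plain Python; the real DJL output_formatter signature differs, and the surrogate token ids below are invented):

```python
import math

# Single-token surrogate label id -> original multi-token class label
SURROGATE_TO_LABEL = {
    101: "positive",
    102: "negative",
    103: "neutral",
}
# This id list would be passed as vLLM's allowed_token_ids
ALLOWED_TOKEN_IDS = list(SURROGATE_TO_LABEL)

def format_output(top_k):
    """top_k: list of (token_id, logprob) pairs from the server,
    already restricted to ALLOWED_TOKEN_IDS by the sampler.
    Returns a dict of original label -> normalized probability."""
    z = sum(math.exp(lp) for _, lp in top_k)
    return {SURROGATE_TO_LABEL[tid]: math.exp(lp) / z for tid, lp in top_k}

# Example: server returns top-3 logprobs over the surrogate tokens
result = format_output([(101, -0.2), (102, -2.1), (103, -3.0)])
```

Because the sampler can only emit the surrogate ids, the top-K distribution is already a distribution over the classes, so renormalizing it directly should be sound.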
Thank you for looking into this!