I want to switch the pooling method to "first token hidden states" or "[CLS] hidden states". How do I do that?
You can switch the pooling method in vLLM by setting the pooling_type in the pooler configuration. For "first token hidden states," use pooling_type="CLS" (which refers to the [CLS] token in models that use it). You can do this by passing the --override-pooler-config '{"pooling_type": "CLS"}' argument when running vLLM, or by setting pooling_type in your code via the PoolerConfig or PoolingParams object. For models whose first token is [CLS], this extracts the [CLS] hidden state as the embedding or output. For other models, verify that the first token really is the [CLS] token for your use case. See the pooling models documentation and the PoolerConfig API for details.
Example command-line usage:
--override-pooler-config '{"pooling_type": "CLS"}'
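Equivalently, in offline Python usage you can pass the override when constructing the LLM. This is a minimal sketch: the model name is a placeholder, and the task and method names (embed vs. encode) vary somewhat across vLLM versions:

from vllm import LLM

# Placeholder model; use any embedding model whose first token is [CLS]
llm = LLM(
    model="BAAI/bge-m3",
    task="embed",
    override_pooler_config={"pooling_type": "CLS"},
)

outputs = llm.embed(["Example input text"])
print(outputs[0].outputs.embedding)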
Would you like more detail or a code example?
If I want to perform a classification task, what structure should my model have?
Your model should be a sequence classification model, typically with a single linear classification head on top of the base model (e.g., a *ForSequenceClassification architecture). vLLM can also automatically convert compatible generative models (such as *ForCausalLM models) into classification models using the as_seq_cls_model adapter, as long as the classification head is a single linear layer named score in the model. See the supported models and model conversion documentation for details.
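To check what structure vLLM will see, you can inspect the checkpoint's config; here's a quick sketch (the path is a placeholder, and the printed architecture is just an example):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("path/to/your_model_dir")  # placeholder path
print(config.architectures)  # e.g., ["XLMRobertaForSequenceClassification"]
print(config.num_labels)     # number of output classes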
Would you like more detail or a code example?
How can I change the activation function from softmax to sigmoid in a classification task?
Set the activation function to sigmoid by disabling softmax in the pooler config: pass --override-pooler-config '{"activation": false}' on the command line, or set activation=False in PoolerConfig/PoolingParams. This returns raw logits, to which you can then apply sigmoid manually. See the PoolingParams API and the relevant issue.
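As a quick illustration of the manual sigmoid step, assuming the logits have already been pulled out of the vLLM output (the tensor values here are made up):

import torch

# Made-up logits, standing in for values returned by vLLM with activation disabled
logits = torch.tensor([1.2, -0.7, 0.3])

# Apply sigmoid manually to get independent per-class probabilities
probs = torch.sigmoid(logits)
print(probs)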
Would you like more detail or a code example?
Now I have a bge-m3 model with a linear layer for classification. Could I use vLLM to deploy the model to do classification?
Yes, you can use vLLM to deploy a BGE-M3 model with a linear classification head for classification, provided your model follows a *ForSequenceClassification architecture or can be adapted using as_seq_cls_model. vLLM supports automatic conversion for compatible models and will extract class probabilities from the linear head. See the supported models and pooling models documentation.
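For online serving, something along these lines should work; the model path is a placeholder, and flag spellings may differ across vLLM versions (newer releases use --runner/--convert instead of --task):

vllm serve path/to/bge-m3-classifier \
    --task classify \
    --override-pooler-config '{"pooling_type": "CLS"}'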
Would you like more detail or a code example?
And I also need to change the pooling method to "CLS" and the activation to "None". Then please give me a code example.
You can set the pooling method to “CLS” and disable the activation (so you get logits, not probabilities) by overriding the pooler config when initializing your LLM. Here’s a code example:
from vllm import LLM

llm = LLM(
    model="your-bge-m3-model-path-or-hf-name",
    task="classify",
    override_pooler_config={"pooling_type": "CLS", "activation": False},
)

prompts = ["Example input text"]
outputs = llm.classify(prompts)
logits = outputs[0].outputs.data  # Access the raw logits directly
print("Logits:", logits)
This configures the model to use CLS pooling and disables the activation function, returning logits. See pooling models documentation.
Would you like more explanation?
I also need the model without normalization. Also, what type of model checkpoint should I provide to vLLM?
Turn normalization off by adding "normalize": False to your pooler config:
llm = LLM(
    model="your-bge-m3-model-path-or-hf-name",
    task="classify",
    override_pooler_config={"pooling_type": "CLS", "activation": False, "normalize": False},
)
For the checkpoint, provide a Hugging Face Transformers-compatible directory containing config.json, model weights (e.g., pytorch_model.bin or model.safetensors), and tokenizer files. The model should be saved using save_pretrained and follow a *ForSequenceClassification architecture, or be convertible as described in the supported models documentation.
Would you like more detail on checkpoint structure or saving?
Which parameter determines the categories of a classification model?
The number and names of categories (class labels) in a classification model are determined by the num_labels and id2label (or label2id) fields in the model's Hugging Face config (config.json). See the supported models documentation and HF config usage.
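For example, these fields can be set with Transformers before saving the checkpoint (the path and label names below are made up for illustration):

from transformers import AutoConfig

config = AutoConfig.from_pretrained("path/to/your_model_dir")  # placeholder path
config.num_labels = 3
config.id2label = {0: "negative", 1: "neutral", 2: "positive"}  # illustrative labels
config.label2id = {label: idx for idx, label in config.id2label.items()}
config.save_pretrained("path/to/your_model_dir")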
Would you like more detail or a code example?
Does vLLM support inference for custom classification models?
Yes, vLLM supports inference for custom classification models as long as the model is compatible with Hugging Face Transformers (i.e., it has a proper config.json, weights, and follows a *ForSequenceClassification architecture or can be adapted). You can provide a local directory or a Hugging Face model path. See the supported models documentation and the custom model instructions.
Would you like more detail on how to prepare or load your custom model?
Thanks. Please give me more detail.
Certainly! To use a custom classification model with vLLM, your model must be saved in the Hugging Face Transformers format. This means your model directory should contain at least:
- config.json (with architectures set to something like YourModelForSequenceClassification, and fields like num_labels, id2label, and label2id if you want custom class names)
- Model weights (pytorch_model.bin or model.safetensors)
- Tokenizer files (e.g., tokenizer.json, vocab.txt, tokenizer_config.json)
How to prepare your model:
- If you trained your model using Hugging Face Transformers, use model.save_pretrained("your_model_dir") and tokenizer.save_pretrained("your_model_dir") (see the sketch after this list).
- Make sure your model class is compatible with AutoModelForSequenceClassification or can be adapted (vLLM can auto-convert some generative models).
- If your model is not natively supported, you can still use it by setting trust_remote_code=True and providing the path to your model directory.
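As a minimal sketch of the saving step, assuming a standard Transformers fine-tune (the base model name and label count are placeholders):

from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Placeholder base model and label count; substitute your own
model = AutoModelForSequenceClassification.from_pretrained("BAAI/bge-m3", num_labels=2)
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3")

# ... fine-tune the classification head here ...

# Save in the Hugging Face layout that vLLM expects
model.save_pretrained("your_model_dir")
tokenizer.save_pretrained("your_model_dir")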
How to load and run inference in vLLM:
from vllm import LLM

llm = LLM(
    model="path/to/your_model_dir",
    task="classify",          # or use runner/convert as per latest docs
    trust_remote_code=True,   # if using a custom model class
)

prompts = ["Your input text here"]
outputs = llm.classify(prompts)
print(outputs[0].outputs.probs)  # Probabilities for each class
You can also override pooling/activation as needed (see previous answers).
For more, see custom model instructions and classification docs.