Sparse Embedding Support

@RunLLM, does the latest version of vLLM support sparse embeddings when I serve BGE-M3 with vLLM?

Yes, the latest version of vLLM supports sparse embeddings for BGE-M3. To enable this, you must override the architecture when serving the model:

vllm serve BAAI/bge-m3 --hf-overrides '{"architectures": ["BgeM3EmbeddingModel"]}'

Then you can obtain sparse embeddings by sending a request with "task": "token_classify" to the /pooling endpoint. Note that the output is a list of per-token scores. If you also need the token text, call /tokenize on the same input and pair each token with its score. See the official vLLM docs and the model implementation for details.
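Here is a minimal Python sketch of that flow, for illustration only. It assumes the server from the command above is reachable at http://localhost:8000 (hypothetical), that /pooling returns one score per input token under response["data"][0]["data"], and that /tokenize returns the matching token IDs under "tokens". Check these shapes against your vLLM version. The helper names post_json, to_sparse and sparse_embedding are hypothetical. The aggregation step, which keeps the max weight per token ID and drops special tokens, is borrowed from how FlagEmbedding builds BGE-M3 lexical weights. It is not something the thread says vLLM does for you.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # hypothetical host/port of the vLLM server
MODEL = "BAAI/bge-m3"
# Assumed XLM-RoBERTa special-token IDs used by BGE-M3: <s>, <pad>, </s>, <unk>.
SPECIAL_IDS = {0, 1, 2, 3}


def post_json(path, payload):
    """POST a JSON payload to the server and decode the JSON reply."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def to_sparse(token_ids, scores, skip_ids=()):
    """Pair token IDs with per-token scores into a {token_id: weight} dict.

    Keeps the maximum weight when a token ID repeats, and drops
    non-positive weights and any IDs listed in skip_ids.
    """
    if len(token_ids) != len(scores):
        raise ValueError("token and score counts differ; check alignment")
    sparse = {}
    for tid, score in zip(token_ids, scores):
        # A score may arrive as a bare float or as a one-element list.
        weight = float(score[0] if isinstance(score, list) else score)
        if tid in skip_ids or weight <= 0.0:
            continue
        sparse[tid] = max(weight, sparse.get(tid, 0.0))
    return sparse


def sparse_embedding(text):
    """Fetch per-token scores and token IDs for one text and combine them."""
    pooled = post_json(
        "/pooling", {"model": MODEL, "task": "token_classify", "input": text}
    )
    scores = pooled["data"][0]["data"]  # assumed response layout
    token_ids = post_json("/tokenize", {"model": MODEL, "prompt": text})["tokens"]
    return to_sparse(token_ids, scores, skip_ids=SPECIAL_IDS)
```

With the server running, sparse_embedding("What is BGE M3?") should return a dict mapping token IDs to weights. If the two lists differ in length on your version (for example, because one side omits special tokens), to_sparse raises ValueError rather than silently misaligning scores and tokens.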

Would you like more detail or example code?

Sources:


@RunLLM Can I do --pooler-config.task embed&token_classify?