[Questions] Is there a plan to support rerank and embedding models?

Hello, may I ask if there are any plans for vllm-ascend to support rerank (bge-reranker-v2-m3) and embedding (bge-m3) models?

Yes, vLLM already supports both rerank (e.g., `BAAI/bge-reranker-v2-m3`) and embedding (e.g., `BAAI/bge-m3`) models on its mainline, including via OpenAI-compatible endpoints. You can serve these models with commands like `vllm serve BAAI/bge-reranker-v2-m3` or `vllm serve BAAI/bge-m3`, and then call the `/score`, `/rerank`, or `/embeddings` APIs as documented. However, the provided data sources make no explicit mention of vllm-ascend (Huawei Ascend NPU) support for these models; the current documentation and issues cover only the CPU and GPU backends.
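
As a quick illustration, here is a minimal sketch of calling those endpoints once a server is up. It assumes the default local address `http://localhost:8000`, and the `/rerank` request/response fields follow vLLM's documented Jina-style rerank API, so check the docs for your version:

```python
import requests

BASE = "http://localhost:8000"  # assumes a local `vllm serve` instance

# Embeddings via the OpenAI-compatible endpoint
# (server started with: vllm serve BAAI/bge-m3)
resp = requests.post(
    f"{BASE}/v1/embeddings",
    json={"model": "BAAI/bge-m3", "input": ["What is vLLM?"]},
)
print(resp.json()["data"][0]["embedding"][:5])  # first few dimensions

# Reranking (server started with: vllm serve BAAI/bge-reranker-v2-m3)
resp = requests.post(
    f"{BASE}/rerank",
    json={
        "model": "BAAI/bge-reranker-v2-m3",
        "query": "What is vLLM?",
        "documents": [
            "vLLM is a high-throughput LLM serving engine.",
            "Pandas is a dataframe library.",
        ],
    },
)
print(resp.json()["results"])  # documents with relevance scores
```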

Would you like more detail on how to use these models with vLLM on supported hardware, or are you specifically interested in Ascend/NPU support?



See the related issue ^ above; V1 support for embedding models will be ready soon (maybe v0.11.1rc1).