How to host zeroentropy's zerank-small using vLLM?
To host zeroentropy/zerank-small with vLLM, you can use the OpenAI-compatible server by running the vllm serve command and specifying the model name or path. For example:
vllm serve zeroentropy/zerank-small
This will start the vLLM server and expose OpenAI-compatible endpoints for inference. You can then query the server using the OpenAI API format. For more details, see the Quickstart guide. If you have a local model directory, provide its path instead of the Hugging Face repo name.
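A minimal sketch of checking the running server from Python, assuming the default http://localhost:8000 address (adjust the URL if you passed --host or --port to vllm serve):

```python
# Minimal sketch: confirm the server is up and list the model it serves.
# Assumes the default http://localhost:8000; change the URL if you passed
# --host or --port to vllm serve.
import requests

resp = requests.get("http://localhost:8000/v1/models")
resp.raise_for_status()
for model in resp.json()["data"]:
    print(model["id"])
```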
Would you like more detail on querying the server or advanced configuration?
@RunLLM it's a reranker, are you sure? Is it even supported?
zerank-small is a reranker model, but it is not currently listed among the officially supported reranker/cross-encoder models in vLLM's documentation or support tables; only specific architectures such as BGE, Qwen2, Qwen3, JinaVL, and a few others are confirmed as supported rerankers. Note that vLLM support is keyed to a model's architecture (the architectures field in its config.json) rather than its repo name, so an unlisted model may still load if it uses one of the supported architectures.
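One rough way to check is to read the architectures field from the model's config.json and compare it against what your vLLM build registers. A sketch (assuming huggingface_hub is installed; the ModelRegistry.get_supported_archs() call may differ across vLLM versions, so treat that part as illustrative):

```python
# Rough sketch: inspect the model's declared architecture and compare it
# against the architectures registered by the local vLLM installation.
import json
from huggingface_hub import hf_hub_download

config_path = hf_hub_download("zeroentropy/zerank-small", "config.json")
with open(config_path) as f:
    architectures = json.load(f).get("architectures", [])
print("Model declares:", architectures)

try:
    # Available in recent vLLM versions; the exact API may vary.
    from vllm import ModelRegistry
    supported = set(ModelRegistry.get_supported_archs())
    print("Recognized by this vLLM build:", [a for a in architectures if a in supported])
except Exception:
    print("Compare the names above against vLLM's supported-models page.")
```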
Would you like more detail on how to check or request support for new models?