Mixedbread reranker on vLLM `/score`: scores differ vs local Mixedbread; small payload = same order / different scores, large payload = different order

I'm serving Mixedbread v2 as Qwen2 seq-cls via `hf_overrides` and calling `/score`:
```bash
vllm serve mixedbread-ai/mxbai-rerank-base-v2 \
  --hf_overrides '{"architectures":["Qwen2ForSequenceClassification"],"classifier_from_token":["0","1"],"method":"from_2_way_softmax"}' \
  --host 0.0.0.0 --port 8000
```
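For completeness, the requests against the server are plain `/score` calls of roughly this shape (the query/documents below are placeholders, not the exact payloads from the tables; field and response names are as I understand vLLM's score API):

```python
import requests

# Hedged sketch of the /score call used for the comparisons below;
# the query/documents here are placeholders, not the exact test payloads.
resp = requests.post(
    "http://localhost:8000/score",
    json={
        "model": "mixedbread-ai/mxbai-rerank-base-v2",
        "text_1": "Who wrote 'To Kill a Mockingbird'?",
        "text_2": [
            "Harper Lee wrote the novel To Kill a Mockingbird.",
            "The Great Gatsby was written by F. Scott Fitzgerald.",
        ],
    },
    timeout=60,
)
resp.raise_for_status()
for item in resp.json()["data"]:
    print(item["index"], item["score"])
```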
Env: AWS g6e.xlarge (L40S 48 GB).
Reference: local `from mxbai_rerank import MxbaiRerankV2` on a Mac M3 Pro (CPU).
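The local baseline is essentially the README usage of `mxbai_rerank` (same placeholder inputs as the `/score` sketch above; treat the exact `rank()` signature as an assumption taken from the library docs):

```python
from mxbai_rerank import MxbaiRerankV2

# Local CPU reference (Mac M3 Pro), used for the MxbaiRerankV2 columns below.
reranker = MxbaiRerankV2("mixedbread-ai/mxbai-rerank-base-v2")
results = reranker.rank(
    query="Who wrote 'To Kill a Mockingbird'?",
    documents=[
        "Harper Lee wrote the novel To Kill a Mockingbird.",
        "The Great Gatsby was written by F. Scott Fitzgerald.",
    ],
    return_documents=True,
)
for r in results:
    print(r)  # each result carries the document index and its relevance score
```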
Key observations:
- Scores differ between vLLM `/score` and the local Mixedbread library.
- Small payload (“To Kill a Mockingbird”): same ranking, but vLLM assigns much higher probabilities to the negative documents than Mixedbread does.
- Larger payload (10 docs, ~1K chars each; query “Who wrote the novel 1984?”): the ranking order diverges from Mixedbread.
Small payload — side-by-side (vLLM vs MxbaiRerankV2)

Doc (short) | vLLM `/score` | MxbaiRerankV2 |
---|---|---|
Harper Lee (novel) | 0.9974413 | 0.9941708 |
Harper Lee (bio) | 0.9837090 | 0.9705565 |
Jane Austen | 0.9364685 | 0.2971900 |
Harry Potter | 0.7970418 | 0.0699584 |
Moby-Dick | 0.6106243 | 0.0519386 |
Gatsby | 0.5890393 | 0.0290230 |
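My working assumption about the `from_2_way_softmax` override is that vLLM collapses the "0"/"1" label-token logits into a single probability roughly as below (a sketch of my understanding, not vLLM's actual code). If that is right, a modest logit gap already lands well above 0.5, which would match the inflated scores for the negative documents:

```python
import torch

def two_way_softmax_score(logit_0: float, logit_1: float) -> float:
    # Softmax over the two label-token logits, keeping P("1");
    # algebraically the same as sigmoid(logit_1 - logit_0).
    return torch.sigmoid(torch.tensor(logit_1 - logit_0)).item()

# Made-up logits purely for illustration, not values from either model.
print(two_way_softmax_score(-2.0, 3.5))  # ~0.996
print(two_way_softmax_score(0.5, 1.0))   # ~0.62
```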
Large payload (~1K chars/doc, “1984”) — order diverges
- Query: `Who wrote the novel 1984?`
- 10 docs (~1K chars each). Full JSON payload included in vllm-project/vllm issue #22983.
Ask: what’s the recommended way to get score parity and a stable order (label tokens, pair template, calibration) so vLLM `/score` matches the Mixedbread library?