Max_tokens_per_doc support for rerank models

@RunLLM it seems that in vLLM, rerank models do not support max_tokens_per_doc but only truncate_prompt_tokens, so vLLM cannot truncate each document individually, only the whole context, right?

Correct. vLLM has no max_tokens_per_doc parameter for rerank models. The available option, truncate_prompt_tokens, truncates the entire input prompt (the whole context), not each document individually, so per-document truncation within a batch is not natively supported. Any per-document truncation logic must be handled client-side by preprocessing the documents before sending them to vLLM.

Would you like more detail or example code for handling per-document truncation outside vLLM?
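A minimal client-side sketch of per-document truncation. The `encode`/`decode` pair below is a toy whitespace tokenizer used only as a stand-in; in practice you would substitute the rerank model's own tokenizer (e.g. loaded via `transformers.AutoTokenizer`) so the token counts match what vLLM sees. The function name `truncate_docs` is illustrative, not part of any vLLM API.

```python
def truncate_docs(docs, encode, decode, max_tokens_per_doc):
    """Truncate each document to at most max_tokens_per_doc tokens
    before sending the batch to a rerank endpoint."""
    truncated = []
    for doc in docs:
        ids = encode(doc)
        if len(ids) > max_tokens_per_doc:
            # Keep only the leading tokens and re-decode to text.
            doc = decode(ids[:max_tokens_per_doc])
        truncated.append(doc)
    return truncated

# Toy whitespace tokenizer as a stand-in for the model's real tokenizer.
encode = lambda s: s.split()
decode = lambda ids: " ".join(ids)

docs = ["short doc", "a much longer document that will be cut down"]
print(truncate_docs(docs, encode, decode, 4))
# → ['short doc', 'a much longer document']
```

With a real subword tokenizer the same pattern applies, though re-decoding truncated token IDs can slightly alter whitespace or split characters at the cut point, which is usually acceptable for reranking.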
