Which version of vLLM supports the best_of parameter in the OpenAI-compatible API?

Alternatively, I want to obtain n results sorted by seq_prob from high to low.

Right now I set sample_n, but since vLLM performs independent sampling for each completion, the n results often contain duplicates. I need a way to get the top-n most probable sequences instead of repeated samples.

vLLM's `n` parameter in `SamplingParams` draws n independent samples, so the results can contain duplicates. There is currently no built-in way to obtain the top-n most probable unique completions in a single call: vLLM does not return the n most likely sequences ranked by sequence probability with duplicates removed. The `best_of` parameter generates more candidates and returns the n highest-scoring ones, but it is only supported by the vLLM v0 engine, it does not guarantee unique outputs, and its behavior for this use case is not well documented. See the discussion in vLLM issue #639 and the `SamplingParams` docs.

If you need strictly the n most probable unique sequences, you would need to implement that logic outside vLLM: generate a large number of candidates (using `n` or `best_of`), read their cumulative log-probabilities from the `logprobs` output, deduplicate the texts, and keep the top n by probability. This is not natively supported by vLLM as of now.
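A minimal sketch of the deduplicate-and-rank step in pure Python. The `(text, cumulative_logprob)` pairs are assumed to come from a vLLM `RequestOutput` (e.g. `[(o.text, o.cumulative_logprob) for o in request_output.outputs]`); the sample values below are made up for illustration:

```python
def top_n_unique(candidates, n):
    """Deduplicate sampled completions and keep the n most probable.

    `candidates` is a list of (text, cumulative_logprob) pairs; for
    duplicate texts the highest log-probability is kept, then the unique
    completions are sorted from most to least probable.
    """
    best = {}
    for text, logprob in candidates:
        if text not in best or logprob > best[text]:
            best[text] = logprob
    # Higher cumulative logprob == higher sequence probability.
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]

# Toy example with invented logprobs: duplicates collapse,
# and the result is ordered by sequence probability.
samples = [
    ("Paris", -0.1),
    ("Paris", -0.2),      # duplicate sample, lower probability
    ("Lyon", -2.3),
    ("Marseille", -3.0),
]
print(top_n_unique(samples, 2))  # [('Paris', -0.1), ('Lyon', -2.3)]
```

In practice you would oversample (set `n` much larger than the number of unique completions you need) before feeding the outputs to a helper like this, since independent sampling gives no guarantee of how many distinct sequences you get back.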
