Why doesn't the parameter n in SamplingParams work as expected?

tense_chen · July 21, 2025, 4:09am

Here is the code; it's very short. I'm using vLLM v0.9.2.

```python
import asyncio
from typing import Optional

from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.outputs import RequestOutput
from vllm.sampling_params import SamplingParams
from vllm.usage.usage_lib import UsageContext
from vllm.v1.engine.async_llm import AsyncLLM


async def main():
    engine_args = AsyncEngineArgs(
        model="Qwen/Qwen2.5-0.5B",
        dtype="auto",
        max_model_len=2048,
        enforce_eager=True,
    )
    usage_context = UsageContext.OPENAI_API_SERVER
    vllm_config = engine_args.create_engine_config(usage_context=usage_context)

    # Build the V1 AsyncLLM engine directly from the resolved config.
    engine_client = AsyncLLM.from_vllm_config(
        vllm_config=vllm_config,
        usage_context=usage_context,
        disable_log_requests=engine_args.disable_log_requests,
        disable_log_stats=engine_args.disable_log_stats)

    # Request two completions for the same prompt.
    sampling_params = SamplingParams(
        max_tokens=256,
        n=2,
    )

    prompt = "output a sentence with 10 words"
    final_output: Optional[RequestOutput] = None
    async for output in engine_client.generate(
        prompt=prompt,
        sampling_params=sampling_params,
        request_id="abcdef",
    ):
        # Number of completions carried by this streamed RequestOutput.
        print(len(output.outputs))
        final_output = output
    if final_output:
        print(final_output.outputs)


if __name__ == "__main__":
    asyncio.run(main())
```

I only got one completion in the end, even though n=2.
Output:

```text
2
2
...
2
1
... 
1
[CompletionOutput(index=0, text='. deadly language is deleting to record unimaginy dishing untouchable utilityfulness it cuchous inc ritims on the indian boarding using everything that is a queer across unnerving possessions. darming is waiting for all the wrong ends are with the length skull, kidding once again in greed neo nation.', token_ids=[13, 24142, 4128, 374, 33011, 311, 3255, 86112, 88, 294, 10976, 13065, 3026, 480, 15549, 30414, 432, 272, 1387, 782, 3625, 21198, 5742, 389, 279, 42225, 48969, 1667, 4297, 429, 374, 264, 54541, 3941, 16950, 19505, 52535, 13, 294, 32902, 374, 8580, 369, 678, 279, 4969, 10335, 525, 448, 279, 3084, 34013, 11, 55725, 3055, 1549, 304, 55826, 35082, 6995, 13, 151643], cumulative_logprob=None, logprobs=None, finish_reason=stop, stop_reason=None)]
```

RunLLM · July 21, 2025, 4:10am

Your code uses SamplingParams(n=2) to request two completions, but the final result contains only one. This is a known issue with n>1 sampling in the vLLM V1 engine (AsyncLLM), which was fixed in a later release: only one completion was returned even when n>1 was set, as described in this bug report and fixed in this pull request.

To get multiple completions, upgrade to a vLLM version that includes the fix (after PR #16863). With the fix, final_output.outputs will contain n completions as expected.
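The printed counts in the question (2 for a while, then 1) suggest that a streamed RequestOutput may carry only the completions updated since it was last read, not all n of them. A minimal sketch of a consumer that does not depend on that detail, assuming only that each CompletionOutput has an `index` field (as in the printed result) and that the default cumulative streaming mode is used, so a later completion with the same index supersedes an earlier one. `Completion`, `Output`, `collect_completions`, and `fake_stream` are hypothetical stand-ins so the sketch runs without vLLM; with the real engine you would iterate `engine_client.generate(...)` instead.

```python
import asyncio
from dataclasses import dataclass, field
from typing import AsyncIterator, Iterable


# Hypothetical stand-ins for vLLM's CompletionOutput / RequestOutput,
# reduced to the fields this sketch needs.
@dataclass
class Completion:
    index: int
    text: str
    finished: bool = False


@dataclass
class Output:
    outputs: list[Completion] = field(default_factory=list)


async def collect_completions(stream: AsyncIterator[Output]) -> list[Completion]:
    """Keep the newest completion seen for each index.

    Assumes cumulative streaming: a later Completion with the same index
    holds the full text so far, so it replaces the earlier one.
    """
    latest: dict[int, Completion] = {}
    async for output in stream:
        for completion in output.outputs:
            latest[completion.index] = completion
    # Return the completions ordered by index (0, 1, ..., n-1).
    return [latest[i] for i in sorted(latest)]


async def fake_stream(chunks: Iterable[Output]) -> AsyncIterator[Output]:
    # Stand-in for engine_client.generate(...): yields pre-built outputs.
    for chunk in chunks:
        yield chunk


# Simulated stream: both indices early, then only index 1 until it finishes,
# mirroring the "2, 2, ..., 1, 1" counts from the question.
chunks = [
    Output([Completion(0, "A"), Completion(1, "B")]),
    Output([Completion(0, "A short", True), Completion(1, "B lo")]),
    Output([Completion(1, "B longer")]),
    Output([Completion(1, "B longer text", True)]),
]
result = asyncio.run(collect_completions(fake_stream(chunks)))
for c in result:
    print(c.index, repr(c.text))
```

Keying by index rather than keeping only the last RequestOutput means neither completion is lost, whichever one happens to finish last.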

Would you like more detail on the underlying cause or how to upgrade?


tense_chen · July 21, 2025, 7:22am

I'm using the latest vLLM version, but I still have this issue.

RunLLM · July 21, 2025, 7:22am

If you are on the latest vLLM version and still receive only one output with n>1, the problem may be model-specific. For example, some Qwen2.5-VL models had this issue while other models worked as expected. This was confirmed in this issue, where switching to a different model resolved the problem.

If you are using a Qwen2.5-VL model, try switching to a non-VL variant or another supported model to see if multiple outputs are returned. If the issue persists with other models, please ensure you are running the latest nightly build, as the fix may not be in the stable release yet.
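When trying different models, a quick check on the final result makes it obvious which completions are missing. A minimal sketch, assuming you collect `[c.index for c in final_output.outputs]` from the last RequestOutput; `missing_indices` is a hypothetical helper, and plain integers stand in for the CompletionOutput `index` values so it runs without vLLM.

```python
from typing import Sequence


def missing_indices(indices: Sequence[int], n: int) -> list[int]:
    """Return the completion indices in range(n) that never showed up.

    With vLLM, `indices` would be [c.index for c in final_output.outputs];
    an empty result means all n completions were returned.
    """
    seen = set(indices)
    return [i for i in range(n) if i not in seen]


# Indices from the question's final result: only index 0 came back for n=2.
print(missing_indices([0], n=2))
# A model that returns both completions would give an empty list.
print(missing_indices([1, 0], n=2))
```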

Would you like more troubleshooting steps or help with a specific model?

Sources:

API Returns Only Single Result Despite n=8 Parameter Setting

Anders-hh · January 13, 2026, 9:30pm

Have you solved it, bro? I’m running into this issue too.
