vLLM calls this function in `execute_model`:
generator.set_offset(generator.get_offset() - 4)
The call `generator.set_offset(generator.get_offset() - 4)` is used in vLLM to rewind the state of the random number generator associated with a request when a sampled token must be discarded, for example during partial (chunked) prefill. Rewinding restores the generator to the state it would have had if the token had never been sampled, which preserves reproducibility and correct sampling behavior in subsequent steps. The same logic appears in both the GPU and TPU model runners (gpu_model_runner.py and tpu_model_runner.py), in the paths that handle requests that should not advance because prefill is not yet complete.
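To illustrate the rewind mechanics, here is a minimal sketch using a toy counter-based generator. `CounterRNG` is hypothetical, not part of vLLM or PyTorch; the real code operates on a `torch.Generator`, where (per the `- 4` in the call above) one sampled token advances the underlying offset by 4. The key property being modeled is that the generator's output is a deterministic function of its state, so restoring the offset reproduces the exact same draw:

```python
# Conceptual sketch (hypothetical class, not vLLM's actual code): a
# counter-based generator whose entire state is (seed, offset), mimicking
# how a Philox-style RNG works on CUDA.

class CounterRNG:
    """Toy stand-in for torch.Generator's get_offset()/set_offset()."""

    def __init__(self, seed: int = 0) -> None:
        self.seed = seed
        self.offset = 0

    def get_offset(self) -> int:
        return self.offset

    def set_offset(self, offset: int) -> None:
        self.offset = offset

    def sample(self) -> int:
        # Deterministic function of (seed, offset): same state -> same draw.
        value = hash((self.seed, self.offset)) % 1000
        self.offset += 4  # one sampled token advances the offset by 4
        return value


gen = CounterRNG(seed=42)
first = gen.sample()                  # token we decide to discard
gen.set_offset(gen.get_offset() - 4)  # rewind: as if it was never sampled
second = gen.sample()                 # reproduces the discarded draw exactly
assert first == second
```

Without the rewind, the discarded draw would still have consumed RNG state, and a re-run of the same request would diverge from the original sampling sequence.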
If you need more detail on how this interacts with the rest of the sampling or scheduling logic, let me know.