I think I have found the issue and some potential solutions. In the benchmark datasets code, these lines shuffle the prompts before they are sent as requests to the server:
```python
random.seed(self.random_seed)
random.shuffle(self.data)
```
Since I want to use the `generated_texts` for further evaluation, comparing them against the original data, we can apply one of these three potential solutions:

- Comment out these two lines;
- Apply the same seed to the data before evaluation to reproduce the same order;
- Modify the `load_data()` function in the evaluation script. As I use the `CustomDataset` (i.e. `--dataset-name custom`) for loading a JSONL file, this can look as follows:
```python
import pandas as pd

from vllm.benchmarks.datasets import CustomDataset


def patched_load_data(self):
    """Patched version of load_data that doesn't shuffle, for evaluation."""
    if self.dataset_path is None:
        raise ValueError("dataset_path must be provided for loading data.")

    self.data = []
    if self.dataset_path.endswith(".jsonl"):
        jsonl_data = pd.read_json(path_or_buf=self.dataset_path, lines=True)
        if "prompt" not in jsonl_data.columns:
            raise ValueError("JSONL file must contain a 'prompt' column.")
        for _, row in jsonl_data.iterrows():
            self.data.append(row.to_dict())
    else:
        raise NotImplementedError(
            "Only JSONL format is supported for CustomDataset.")

    # Shuffling removed for evaluation purposes:
    # random.seed(self.random_seed)
    # random.shuffle(self.data)


# Apply the monkey patch before the dataset is loaded
CustomDataset.load_data = patched_load_data
```
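For option 2, it is enough to replay the benchmark's shuffle on the original data before evaluation, since `random.shuffle` with a fixed seed is deterministic. A minimal sketch (the seed value `0` and the toy prompts are placeholders; use the seed you passed to the benchmark run):

```python
import random

# Toy stand-in for the original dataset (placeholder prompts).
original = [{"prompt": f"prompt {i}"} for i in range(5)]

# Replay the shuffle the benchmark applied; the same seed yields the same order.
replayed = list(original)
random.seed(0)  # placeholder: use the benchmark's random_seed
random.shuffle(replayed)

# replayed[i] now lines up with the i-th generated text from the benchmark run.
```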
Options 2 and 3 are cleaner than option 1, since they do not require editing the installed vLLM source and are therefore not undone by vLLM updates.