`vllm bench serve`: order of `generated_texts`

I think I have found the issue and potential solutions. In the benchmark dataset code (`vllm/benchmarks/datasets.py`), these two lines shuffle the prompts before they are sent as requests to the server:

random.seed(self.random_seed)
random.shuffle(self.data)

Since I want to use the `generated_texts` for further evaluation, comparing them against the original data, one of three solutions can be applied:

  1. Comment out these two lines;
  2. Apply the same seed to the reference data before evaluation to reproduce the same order;
  3. Override the `load_data()` method in the evaluation script. Since I load a JSON Lines file through `CustomDataset` (i.e. `--dataset-name custom`), the patch can look as follows:
import pandas as pd

from vllm.benchmarks.datasets import CustomDataset

def patched_load_data(self):
    """Patched version of load_data that doesn't shuffle for evaluation."""
    if self.dataset_path is None:
        raise ValueError("dataset_path must be provided for loading data.")

    self.data = []

    if self.dataset_path.endswith(".jsonl"):
        jsonl_data = pd.read_json(path_or_buf=self.dataset_path,
                                  lines=True)

        if "prompt" not in jsonl_data.columns:
            raise ValueError("JSONL file must contain a 'prompt' column.")

        for _, row in jsonl_data.iterrows():
            self.data.append(row.to_dict())
    else:
        raise NotImplementedError("Only JSONL format is supported for CustomDataset.")
    
    # Remove shuffling for evaluation purposes
    # random.seed(self.random_seed)
    # random.shuffle(self.data)

# Apply the modification
CustomDataset.load_data = patched_load_data
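For option 2, re-applying the benchmark's shuffle to the reference data is enough to line it up with `generated_texts`. A minimal sketch (the helper name is my own, and the seed must match whatever `random_seed` the benchmark run used):

```python
import random

def reorder_like_benchmark(reference_rows, random_seed):
    """Shuffle a copy of the reference rows with the same seed the
    benchmark used, so row i lines up with generated_texts[i]."""
    rows = list(reference_rows)  # work on a copy, keep the caller's list intact
    random.seed(random_seed)
    random.shuffle(rows)
    return rows

# Hypothetical reference data loaded from the same JSONL file:
reference = [{"prompt": "a"}, {"prompt": "b"}, {"prompt": "c"}]
aligned = reorder_like_benchmark(reference, random_seed=0)
```

Because `random.shuffle` is deterministic for a fixed seed, calling this helper with the benchmark's seed reproduces the exact request order.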

Option 2 or 3 is cleaner than option 1, since neither requires editing the installed vLLM sources, so they survive vLLM updates.

@RunLLM