The referenced speedups (1.05x–1.9x) for torch.compile in vLLM were measured with the vLLM benchmark suite, using models such as Llama3, Llama4, and Mixtral on NVIDIA H100 and AMD MI300X hardware. The benchmarks compare vLLM with torch.compile enabled (the V1 engine, where it is on by default) against vLLM with it disabled (the V0 engine, or the V1 engine launched with --enforce-eager). Results are validated through the vllm-benchmark workflow and published on the vLLM benchmark dashboard for transparency and reproducibility. Input shapes use batch sizes and sequence lengths typical of LLM serving (e.g., input/output lengths of 128–256 tokens and batch sizes of 256 or more); the exact shapes are defined in the benchmark scripts.
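As a rough illustration of that A/B setup, the sketch below launches two servers, one with torch.compile disabled via --enforce-eager and one with the V1 default, and benchmarks them with serving-style shapes. The model name, ports, dataset choice, and the specific length/concurrency values are illustrative assumptions, not the exact configuration of the published benchmarks:

```shell
# Baseline server: force eager execution, i.e. no torch.compile
# (model name and port are illustrative)
vllm serve meta-llama/Llama-3.1-8B-Instruct --enforce-eager --port 8000 &

# Compiled server: V1 engine default, torch.compile enabled, no extra flag
vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8001 &

# Benchmark each server with serving-style shapes; repeat with --port 8001
# and compare throughput/latency between the two runs
vllm bench serve \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --dataset-name random \
  --random-input-len 256 --random-output-len 128 \
  --max-concurrency 256 \
  --port 8000
```

The dashboard runs use the repository's own benchmark scripts rather than ad-hoc invocations like this, so treat the sketch only as a way to sanity-check the compiled-vs-eager gap on your own hardware.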