The following two approaches produce very different results.

Approach 1 (run the benchmark 10 times with `--num-prompts 1`): all 10 runs are stable, with TTFT around 100 ms.

root@2f77277da063:/vllm-workspace# for i in {1..10}; do
> python3 /vllm-workspace/benchmarks/benchmark_serving.py     --backend openai-chat     --model /data/models/Qwen2.5-72B     --served-model-name Qwen2.5-72B     --endpoint /v1/chat/completions     --port 8080     --dataset_name random     --random-input-len 7000     --random-output-len 3000     --random-range-ratio 0.1     --num-prompts 1     --max-concurrency 1
> done
INFO 06-12 01:08:18 [__init__.py:239] Automatically detected platform cuda.
Namespace(backend='openai-chat', base_url=None, host='127.0.0.1', port=8080, endpoint='/v1/chat/completions', dataset_name='random', dataset_path=None, max_concurrency=1, model='/data/models/Qwen2.5-72B', tokenizer=None, use_beam_search=False, num_prompts=1, logprobs=None, request_rate=inf, burstiness=1.0, seed=0, trust_remote_code=False, disable_tqdm=False, profile=False, save_result=False, save_detailed=False, append_result=False, metadata=None, result_dir=None, result_filename=None, ignore_eos=False, percentile_metrics='ttft,tpot,itl', metric_percentiles='99', goodput=None, sonnet_input_len=550, sonnet_output_len=150, sonnet_prefix_len=200, sharegpt_output_len=None, random_input_len=7000, random_output_len=3000, random_range_ratio=0.1, random_prefix_len=0, hf_subset=None, hf_split=None, hf_output_len=None, top_p=None, top_k=None, min_p=None, temperature=None, tokenizer_mode='auto', served_model_name='Qwen2.5-72B', lora_modules=None)
Starting initial single prompt test run...
Initial test run completed. Starting main benchmark run...
Traffic request rate: inf
Burstiness factor: 1.0 (Poisson process)
Maximum request concurrency: 1
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:03<00:00,  3.48s/it]
============ Serving Benchmark Result ============
Successful requests:                     1         
Benchmark duration (s):                  3.48      
Total input tokens:                      6984      
Total generated tokens:                  76        
Request throughput (req/s):              0.29      
Output token throughput (tok/s):         21.81     
Total Token throughput (tok/s):          2026.39   
---------------Time to First Token----------------
Mean TTFT (ms):                          101.93    
Median TTFT (ms):                        101.93    
P99 TTFT (ms):                           101.93    
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          45.07     
Median TPOT (ms):                        45.07     
P99 TPOT (ms):                           45.07     
---------------Inter-token Latency----------------
Mean ITL (ms):                           44.47     
Median ITL (ms):                         45.01     
P99 ITL (ms):                            46.96     
==================================================
INFO 06-12 01:08:37 [__init__.py:239] Automatically detected platform cuda.
Namespace(backend='openai-chat', base_url=None, host='127.0.0.1', port=8080, endpoint='/v1/chat/completions', dataset_name='random', dataset_path=None, max_concurrency=1, model='/data/models/Qwen2.5-72B', tokenizer=None, use_beam_search=False, num_prompts=1, logprobs=None, request_rate=inf, burstiness=1.0, seed=0, trust_remote_code=False, disable_tqdm=False, profile=False, save_result=False, save_detailed=False, append_result=False, metadata=None, result_dir=None, result_filename=None, ignore_eos=False, percentile_metrics='ttft,tpot,itl', metric_percentiles='99', goodput=None, sonnet_input_len=550, sonnet_output_len=150, sonnet_prefix_len=200, sharegpt_output_len=None, random_input_len=7000, random_output_len=3000, random_range_ratio=0.1, random_prefix_len=0, hf_subset=None, hf_split=None, hf_output_len=None, top_p=None, top_k=None, min_p=None, temperature=None, tokenizer_mode='auto', served_model_name='Qwen2.5-72B', lora_modules=None)
Starting initial single prompt test run...
Initial test run completed. Starting main benchmark run...
Traffic request rate: inf
Burstiness factor: 1.0 (Poisson process)
Maximum request concurrency: 1
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:03<00:00,  3.49s/it]
============ Serving Benchmark Result ============
Successful requests:                     1         
Benchmark duration (s):                  3.49      
Total input tokens:                      6984      
Total generated tokens:                  76        
Request throughput (req/s):              0.29      
Output token throughput (tok/s):         21.77     
Total Token throughput (tok/s):          2022.51   
---------------Time to First Token----------------
Mean TTFT (ms):                          106.18    
Median TTFT (ms):                        106.18    
P99 TTFT (ms):                           106.18    
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          45.10     
Median TPOT (ms):                        45.10     
P99 TPOT (ms):                           45.10     
---------------Inter-token Latency----------------
Mean ITL (ms):                           44.51     
Median ITL (ms):                         44.95     
P99 ITL (ms):                            47.90     
==================================================
INFO 06-12 01:08:50 [__init__.py:239] Automatically detected platform cuda.
Namespace(backend='openai-chat', base_url=None, host='127.0.0.1', port=8080, endpoint='/v1/chat/completions', dataset_name='random', dataset_path=None, max_concurrency=1, model='/data/models/Qwen2.5-72B', tokenizer=None, use_beam_search=False, num_prompts=1, logprobs=None, request_rate=inf, burstiness=1.0, seed=0, trust_remote_code=False, disable_tqdm=False, profile=False, save_result=False, save_detailed=False, append_result=False, metadata=None, result_dir=None, result_filename=None, ignore_eos=False, percentile_metrics='ttft,tpot,itl', metric_percentiles='99', goodput=None, sonnet_input_len=550, sonnet_output_len=150, sonnet_prefix_len=200, sharegpt_output_len=None, random_input_len=7000, random_output_len=3000, random_range_ratio=0.1, random_prefix_len=0, hf_subset=None, hf_split=None, hf_output_len=None, top_p=None, top_k=None, min_p=None, temperature=None, tokenizer_mode='auto', served_model_name='Qwen2.5-72B', lora_modules=None)
Starting initial single prompt test run...
Initial test run completed. Starting main benchmark run...
Traffic request rate: inf
Burstiness factor: 1.0 (Poisson process)
Maximum request concurrency: 1
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:03<00:00,  3.48s/it]
============ Serving Benchmark Result ============
Successful requests:                     1         
Benchmark duration (s):                  3.48      
Total input tokens:                      6984      
Total generated tokens:                  76        
Request throughput (req/s):              0.29      
Output token throughput (tok/s):         21.82     
Total Token throughput (tok/s):          2026.85   
---------------Time to First Token----------------
Mean TTFT (ms):                          105.31    
Median TTFT (ms):                        105.31    
P99 TTFT (ms):                           105.31    
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          45.02     
Median TPOT (ms):                        45.02     
P99 TPOT (ms):                           45.02     
---------------Inter-token Latency----------------
Mean ITL (ms):                           44.42     
Median ITL (ms):                         44.98     
P99 ITL (ms):                            45.55     
==================================================
INFO 06-12 01:09:04 [__init__.py:239] Automatically detected platform cuda.
Namespace(backend='openai-chat', base_url=None, host='127.0.0.1', port=8080, endpoint='/v1/chat/completions', dataset_name='random', dataset_path=None, max_concurrency=1, model='/data/models/Qwen2.5-72B', tokenizer=None, use_beam_search=False, num_prompts=1, logprobs=None, request_rate=inf, burstiness=1.0, seed=0, trust_remote_code=False, disable_tqdm=False, profile=False, save_result=False, save_detailed=False, append_result=False, metadata=None, result_dir=None, result_filename=None, ignore_eos=False, percentile_metrics='ttft,tpot,itl', metric_percentiles='99', goodput=None, sonnet_input_len=550, sonnet_output_len=150, sonnet_prefix_len=200, sharegpt_output_len=None, random_input_len=7000, random_output_len=3000, random_range_ratio=0.1, random_prefix_len=0, hf_subset=None, hf_split=None, hf_output_len=None, top_p=None, top_k=None, min_p=None, temperature=None, tokenizer_mode='auto', served_model_name='Qwen2.5-72B', lora_modules=None)
Starting initial single prompt test run...
Initial test run completed. Starting main benchmark run...
Traffic request rate: inf
Burstiness factor: 1.0 (Poisson process)
Maximum request concurrency: 1
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:03<00:00,  3.49s/it]
============ Serving Benchmark Result ============
Successful requests:                     1         
Benchmark duration (s):                  3.49      
Total input tokens:                      6984      
Total generated tokens:                  76        
Request throughput (req/s):              0.29      
Output token throughput (tok/s):         21.78     
Total Token throughput (tok/s):          2023.20   
---------------Time to First Token----------------
Mean TTFT (ms):                          103.25    
Median TTFT (ms):                        103.25    
P99 TTFT (ms):                           103.25    
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          45.13     
Median TPOT (ms):                        45.13     
P99 TPOT (ms):                           45.13     
---------------Inter-token Latency----------------
Mean ITL (ms):                           44.54     
Median ITL (ms):                         45.06     
P99 ITL (ms):                            47.85     
==================================================
INFO 06-12 01:09:18 [__init__.py:239] Automatically detected platform cuda.
Namespace(backend='openai-chat', base_url=None, host='127.0.0.1', port=8080, endpoint='/v1/chat/completions', dataset_name='random', dataset_path=None, max_concurrency=1, model='/data/models/Qwen2.5-72B', tokenizer=None, use_beam_search=False, num_prompts=1, logprobs=None, request_rate=inf, burstiness=1.0, seed=0, trust_remote_code=False, disable_tqdm=False, profile=False, save_result=False, save_detailed=False, append_result=False, metadata=None, result_dir=None, result_filename=None, ignore_eos=False, percentile_metrics='ttft,tpot,itl', metric_percentiles='99', goodput=None, sonnet_input_len=550, sonnet_output_len=150, sonnet_prefix_len=200, sharegpt_output_len=None, random_input_len=7000, random_output_len=3000, random_range_ratio=0.1, random_prefix_len=0, hf_subset=None, hf_split=None, hf_output_len=None, top_p=None, top_k=None, min_p=None, temperature=None, tokenizer_mode='auto', served_model_name='Qwen2.5-72B', lora_modules=None)
Starting initial single prompt test run...
Initial test run completed. Starting main benchmark run...
Traffic request rate: inf
Burstiness factor: 1.0 (Poisson process)
Maximum request concurrency: 1
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:03<00:00,  3.49s/it]
============ Serving Benchmark Result ============
Successful requests:                     1         
Benchmark duration (s):                  3.49      
Total input tokens:                      6984      
Total generated tokens:                  76        
Request throughput (req/s):              0.29      
Output token throughput (tok/s):         21.79     
Total Token throughput (tok/s):          2024.62   
---------------Time to First Token----------------
Mean TTFT (ms):                          100.20    
Median TTFT (ms):                        100.20    
P99 TTFT (ms):                           100.20    
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          45.13     
Median TPOT (ms):                        45.13     
P99 TPOT (ms):                           45.13     
---------------Inter-token Latency----------------
Mean ITL (ms):                           44.54     
Median ITL (ms):                         45.06     
P99 ITL (ms):                            46.50     
==================================================
INFO 06-12 01:09:32 [__init__.py:239] Automatically detected platform cuda.
Namespace(backend='openai-chat', base_url=None, host='127.0.0.1', port=8080, endpoint='/v1/chat/completions', dataset_name='random', dataset_path=None, max_concurrency=1, model='/data/models/Qwen2.5-72B', tokenizer=None, use_beam_search=False, num_prompts=1, logprobs=None, request_rate=inf, burstiness=1.0, seed=0, trust_remote_code=False, disable_tqdm=False, profile=False, save_result=False, save_detailed=False, append_result=False, metadata=None, result_dir=None, result_filename=None, ignore_eos=False, percentile_metrics='ttft,tpot,itl', metric_percentiles='99', goodput=None, sonnet_input_len=550, sonnet_output_len=150, sonnet_prefix_len=200, sharegpt_output_len=None, random_input_len=7000, random_output_len=3000, random_range_ratio=0.1, random_prefix_len=0, hf_subset=None, hf_split=None, hf_output_len=None, top_p=None, top_k=None, min_p=None, temperature=None, tokenizer_mode='auto', served_model_name='Qwen2.5-72B', lora_modules=None)
Starting initial single prompt test run...
Initial test run completed. Starting main benchmark run...
Traffic request rate: inf
Burstiness factor: 1.0 (Poisson process)
Maximum request concurrency: 1
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:03<00:00,  3.53s/it]
============ Serving Benchmark Result ============
Successful requests:                     1         
Benchmark duration (s):                  3.54      
Total input tokens:                      6984      
Total generated tokens:                  76        
Request throughput (req/s):              0.28      
Output token throughput (tok/s):         21.50     
Total Token throughput (tok/s):          1997.10   
---------------Time to First Token----------------
Mean TTFT (ms):                          112.50    
Median TTFT (ms):                        112.50    
P99 TTFT (ms):                           112.50    
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          45.61     
Median TPOT (ms):                        45.61     
P99 TPOT (ms):                           45.61     
---------------Inter-token Latency----------------
Mean ITL (ms):                           45.01     
Median ITL (ms):                         45.05     
P99 ITL (ms):                            55.07     
==================================================
INFO 06-12 01:09:46 [__init__.py:239] Automatically detected platform cuda.
Namespace(backend='openai-chat', base_url=None, host='127.0.0.1', port=8080, endpoint='/v1/chat/completions', dataset_name='random', dataset_path=None, max_concurrency=1, model='/data/models/Qwen2.5-72B', tokenizer=None, use_beam_search=False, num_prompts=1, logprobs=None, request_rate=inf, burstiness=1.0, seed=0, trust_remote_code=False, disable_tqdm=False, profile=False, save_result=False, save_detailed=False, append_result=False, metadata=None, result_dir=None, result_filename=None, ignore_eos=False, percentile_metrics='ttft,tpot,itl', metric_percentiles='99', goodput=None, sonnet_input_len=550, sonnet_output_len=150, sonnet_prefix_len=200, sharegpt_output_len=None, random_input_len=7000, random_output_len=3000, random_range_ratio=0.1, random_prefix_len=0, hf_subset=None, hf_split=None, hf_output_len=None, top_p=None, top_k=None, min_p=None, temperature=None, tokenizer_mode='auto', served_model_name='Qwen2.5-72B', lora_modules=None)
Starting initial single prompt test run...
Initial test run completed. Starting main benchmark run...
Traffic request rate: inf
Burstiness factor: 1.0 (Poisson process)
Maximum request concurrency: 1
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:03<00:00,  3.49s/it]
============ Serving Benchmark Result ============
Successful requests:                     1         
Benchmark duration (s):                  3.49      
Total input tokens:                      6984      
Total generated tokens:                  76        
Request throughput (req/s):              0.29      
Output token throughput (tok/s):         21.79     
Total Token throughput (tok/s):          2024.17   
---------------Time to First Token----------------
Mean TTFT (ms):                          95.27     
Median TTFT (ms):                        95.27     
P99 TTFT (ms):                           95.27     
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          45.21     
Median TPOT (ms):                        45.21     
P99 TPOT (ms):                           45.21     
---------------Inter-token Latency----------------
Mean ITL (ms):                           44.61     
Median ITL (ms):                         45.06     
P99 ITL (ms):                            48.05     
==================================================
INFO 06-12 01:10:00 [__init__.py:239] Automatically detected platform cuda.
Namespace(backend='openai-chat', base_url=None, host='127.0.0.1', port=8080, endpoint='/v1/chat/completions', dataset_name='random', dataset_path=None, max_concurrency=1, model='/data/models/Qwen2.5-72B', tokenizer=None, use_beam_search=False, num_prompts=1, logprobs=None, request_rate=inf, burstiness=1.0, seed=0, trust_remote_code=False, disable_tqdm=False, profile=False, save_result=False, save_detailed=False, append_result=False, metadata=None, result_dir=None, result_filename=None, ignore_eos=False, percentile_metrics='ttft,tpot,itl', metric_percentiles='99', goodput=None, sonnet_input_len=550, sonnet_output_len=150, sonnet_prefix_len=200, sharegpt_output_len=None, random_input_len=7000, random_output_len=3000, random_range_ratio=0.1, random_prefix_len=0, hf_subset=None, hf_split=None, hf_output_len=None, top_p=None, top_k=None, min_p=None, temperature=None, tokenizer_mode='auto', served_model_name='Qwen2.5-72B', lora_modules=None)
Starting initial single prompt test run...
Initial test run completed. Starting main benchmark run...
Traffic request rate: inf
Burstiness factor: 1.0 (Poisson process)
Maximum request concurrency: 1
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:03<00:00,  3.50s/it]
============ Serving Benchmark Result ============
Successful requests:                     1         
Benchmark duration (s):                  3.50      
Total input tokens:                      6984      
Total generated tokens:                  76        
Request throughput (req/s):              0.29      
Output token throughput (tok/s):         21.70     
Total Token throughput (tok/s):          2016.01   
---------------Time to First Token----------------
Mean TTFT (ms):                          108.49    
Median TTFT (ms):                        108.49    
P99 TTFT (ms):                           108.49    
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          45.22     
Median TPOT (ms):                        45.22     
P99 TPOT (ms):                           45.22     
---------------Inter-token Latency----------------
Mean ITL (ms):                           44.63     
Median ITL (ms):                         45.11     
P99 ITL (ms):                            47.00     
==================================================
INFO 06-12 01:10:14 [__init__.py:239] Automatically detected platform cuda.
Namespace(backend='openai-chat', base_url=None, host='127.0.0.1', port=8080, endpoint='/v1/chat/completions', dataset_name='random', dataset_path=None, max_concurrency=1, model='/data/models/Qwen2.5-72B', tokenizer=None, use_beam_search=False, num_prompts=1, logprobs=None, request_rate=inf, burstiness=1.0, seed=0, trust_remote_code=False, disable_tqdm=False, profile=False, save_result=False, save_detailed=False, append_result=False, metadata=None, result_dir=None, result_filename=None, ignore_eos=False, percentile_metrics='ttft,tpot,itl', metric_percentiles='99', goodput=None, sonnet_input_len=550, sonnet_output_len=150, sonnet_prefix_len=200, sharegpt_output_len=None, random_input_len=7000, random_output_len=3000, random_range_ratio=0.1, random_prefix_len=0, hf_subset=None, hf_split=None, hf_output_len=None, top_p=None, top_k=None, min_p=None, temperature=None, tokenizer_mode='auto', served_model_name='Qwen2.5-72B', lora_modules=None)
Starting initial single prompt test run...
Initial test run completed. Starting main benchmark run...
Traffic request rate: inf
Burstiness factor: 1.0 (Poisson process)
Maximum request concurrency: 1
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:03<00:00,  3.49s/it]
============ Serving Benchmark Result ============
Successful requests:                     1         
Benchmark duration (s):                  3.49      
Total input tokens:                      6984      
Total generated tokens:                  76        
Request throughput (req/s):              0.29      
Output token throughput (tok/s):         21.78     
Total Token throughput (tok/s):          2023.67   
---------------Time to First Token----------------
Mean TTFT (ms):                          103.82    
Median TTFT (ms):                        103.82    
P99 TTFT (ms):                           103.82    
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          45.11     
Median TPOT (ms):                        45.11     
P99 TPOT (ms):                           45.11     
---------------Inter-token Latency----------------
Mean ITL (ms):                           44.51     
Median ITL (ms):                         45.05     
P99 ITL (ms):                            45.78     
==================================================
INFO 06-12 01:10:27 [__init__.py:239] Automatically detected platform cuda.
Namespace(backend='openai-chat', base_url=None, host='127.0.0.1', port=8080, endpoint='/v1/chat/completions', dataset_name='random', dataset_path=None, max_concurrency=1, model='/data/models/Qwen2.5-72B', tokenizer=None, use_beam_search=False, num_prompts=1, logprobs=None, request_rate=inf, burstiness=1.0, seed=0, trust_remote_code=False, disable_tqdm=False, profile=False, save_result=False, save_detailed=False, append_result=False, metadata=None, result_dir=None, result_filename=None, ignore_eos=False, percentile_metrics='ttft,tpot,itl', metric_percentiles='99', goodput=None, sonnet_input_len=550, sonnet_output_len=150, sonnet_prefix_len=200, sharegpt_output_len=None, random_input_len=7000, random_output_len=3000, random_range_ratio=0.1, random_prefix_len=0, hf_subset=None, hf_split=None, hf_output_len=None, top_p=None, top_k=None, min_p=None, temperature=None, tokenizer_mode='auto', served_model_name='Qwen2.5-72B', lora_modules=None)
Starting initial single prompt test run...
Initial test run completed. Starting main benchmark run...
Traffic request rate: inf
Burstiness factor: 1.0 (Poisson process)
Maximum request concurrency: 1
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:03<00:00,  3.49s/it]
============ Serving Benchmark Result ============
Successful requests:                     1         
Benchmark duration (s):                  3.49      
Total input tokens:                      6984      
Total generated tokens:                  76        
Request throughput (req/s):              0.29      
Output token throughput (tok/s):         21.75     
Total Token throughput (tok/s):          2020.42   
---------------Time to First Token----------------
Mean TTFT (ms):                          99.14     
Median TTFT (ms):                        99.14     
P99 TTFT (ms):                           99.14     
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          45.24     
Median TPOT (ms):                        45.24     
P99 TPOT (ms):                           45.24     
---------------Inter-token Latency----------------
Mean ITL (ms):                           44.65     
Median ITL (ms):                         45.08     
P99 ITL (ms):                            47.79     
==================================================
root@2f77277da063:/vllm-workspace# 
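
To put a number on "stable", here is a minimal sketch that summarizes the ten Mean TTFT values from the runs above. The values are copied by hand from the log; the variable names are only illustrative.

```python
from statistics import mean, stdev

# Mean TTFT (ms) copied from the 10 runs logged above, in order.
ttft_ms = [101.93, 106.18, 105.31, 103.25, 100.20,
           112.50, 95.27, 108.49, 103.82, 99.14]

mean_ttft = mean(ttft_ms)
spread = stdev(ttft_ms)  # sample standard deviation across the 10 runs

print(f"mean={mean_ttft:.2f} ms, stdev={spread:.2f} ms, "
      f"min={min(ttft_ms):.2f} ms, max={max(ttft_ms):.2f} ms")
```

Note that every run reports the same 6984 input tokens and 76 generated tokens with seed=0, so the ten runs appear to send the same prompt each time.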

Approach 2 (a single run with `--num-prompts 10`): TTFT rises above 5000 ms.

root@2f77277da063:/vllm-workspace# python3 /vllm-workspace/benchmarks/benchmark_serving.py     --backend openai-chat     --model /data/models/Qwen2.5-72B     --served-model-name Qwen2.5-72B     --endpoint /v1/chat/completions     --port 8080     --dataset_name random     --random-input-len 7000     --random-output-len 3000     --random-range-ratio 0.1     --num-prompts 10     --max-concurrency 1
INFO 06-12 01:14:01 [__init__.py:239] Automatically detected platform cuda.
Namespace(backend='openai-chat', base_url=None, host='127.0.0.1', port=8080, endpoint='/v1/chat/completions', dataset_name='random', dataset_path=None, max_concurrency=1, model='/data/models/Qwen2.5-72B', tokenizer=None, use_beam_search=False, num_prompts=10, logprobs=None, request_rate=inf, burstiness=1.0, seed=0, trust_remote_code=False, disable_tqdm=False, profile=False, save_result=False, save_detailed=False, append_result=False, metadata=None, result_dir=None, result_filename=None, ignore_eos=False, percentile_metrics='ttft,tpot,itl', metric_percentiles='99', goodput=None, sonnet_input_len=550, sonnet_output_len=150, sonnet_prefix_len=200, sharegpt_output_len=None, random_input_len=7000, random_output_len=3000, random_range_ratio=0.1, random_prefix_len=0, hf_subset=None, hf_split=None, hf_output_len=None, top_p=None, top_k=None, min_p=None, temperature=None, tokenizer_mode='auto', served_model_name='Qwen2.5-72B', lora_modules=None)
Starting initial single prompt test run...
Initial test run completed. Starting main benchmark run...
Traffic request rate: inf
Burstiness factor: 1.0 (Poisson process)
Maximum request concurrency: 1
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [01:33<00:00,  9.36s/it]
============ Serving Benchmark Result ============
Successful requests:                     10        
Benchmark duration (s):                  93.60     
Total input tokens:                      71443     
Total generated tokens:                  931       
Request throughput (req/s):              0.11      
Output token throughput (tok/s):         9.95      
Total Token throughput (tok/s):          773.21    
---------------Time to First Token----------------
Mean TTFT (ms):                          5188.91   
Median TTFT (ms):                        5623.70   
P99 TTFT (ms):                           6942.57   
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms):                          45.27     
Median TPOT (ms):                        45.23     
P99 TPOT (ms):                           45.53     
---------------Inter-token Latency----------------
Mean ITL (ms):                           44.79     
Median ITL (ms):                         45.14     
P99 ITL (ms):                            48.30     
==================================================
root@2f77277da063:/vllm-workspace# 
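
As a rough cross-check on whether these 10 requests overlapped, the sketch below estimates how long they would take if run strictly back to back. It assumes each request's latency is approximately TTFT + TPOT × (output tokens − 1) and uses the reported means as stand-ins for per-request values, so it is only an approximation.

```python
# Figures copied from the approach-2 summary above.
num_requests = 10
total_generated_tokens = 931
mean_ttft_ms = 5188.91
mean_tpot_ms = 45.27
reported_duration_s = 93.60

# Assumption: per-request latency ~= TTFT + TPOT * (output tokens - 1),
# so the first token of each request is covered by TTFT, not TPOT.
decode_tokens = total_generated_tokens - num_requests
estimated_serial_s = (
    num_requests * mean_ttft_ms + decode_tokens * mean_tpot_ms
) / 1000

print(f"estimated serial duration: {estimated_serial_s:.2f} s")
print(f"reported duration:         {reported_duration_s:.2f} s")
```

If the two numbers come out close, that would be consistent with the requests running one after another rather than overlapping, although it does not by itself explain why TTFT is so much higher than in approach 1.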

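With `--max-concurrency 1`, are multiple requests executed strictly one at a time, in order (that is, the next request is only sent after the previous one has finished)?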
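
To make the question concrete, this is the behavior I would expect from a concurrency limit of 1, sketched with an `asyncio.Semaphore`. This is not taken from `benchmark_serving.py`; `send_request` and `run` are hypothetical names, and the sleep stands in for a real HTTP request. Whether the benchmark script limits concurrency the same way is an assumption that should be checked against its source.

```python
import asyncio

async def send_request(i, sem, log):
    # Hypothetical stand-in for one benchmark request; it sleeps
    # instead of calling a real server.
    async with sem:
        log.append(("start", i))
        await asyncio.sleep(0.01)
        log.append(("end", i))

async def run(num_prompts, max_concurrency):
    # All tasks are created up front, but the semaphore lets at most
    # max_concurrency of them be inside the "request" at the same time.
    sem = asyncio.Semaphore(max_concurrency)
    log = []
    tasks = [asyncio.create_task(send_request(i, sem, log))
             for i in range(num_prompts)]
    await asyncio.gather(*tasks)
    return log

log = asyncio.run(run(num_prompts=3, max_concurrency=1))
print(log)
```

With a limit of 1, each "start" entry in the log is immediately followed by the matching "end", meaning no two requests are in flight together.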