|
About the Benchmarking category
|
|
0
|
56
|
March 20, 2025
|
|
On prefill-only, kv_cache_usage_perc reaches max 0.32?
|
|
1
|
16
|
February 3, 2026
|
|
Benchmark for flash_attention
|
|
4
|
43
|
January 22, 2026
|
|
How to get the log for benchmarking
|
|
17
|
51
|
January 19, 2026
|
|
Transformers `do_sample=False` vs SamplingParams `temperature=0` give different results
|
|
1
|
321
|
November 15, 2025
|
|
vLLM 0.10.1 benchmark does not free memory
|
|
13
|
124
|
November 10, 2025
|
|
vllm bench serve: not all requests are successful. What's the reason?
|
|
5
|
189
|
October 23, 2025
|
|
How can I disable the model forward pass to measure host-only (CPU) overhead?
|
|
5
|
59
|
October 21, 2025
|
|
vllm bench serve: order of "generated_texts"
|
|
16
|
110
|
October 6, 2025
|
|
Logprobs output from vllm bench serve
|
|
6
|
175
|
September 27, 2025
|
|
Running vllm bench serve from CPU-only node
|
|
3
|
593
|
August 29, 2025
|
|
Num request running stays on 1
|
|
3
|
219
|
August 29, 2025
|
|
Mixedbread reranker on vLLM `/score`: scores differ vs local Mixedbread; small payload = same order/different scores, large payload = different order
|
|
1
|
53
|
August 15, 2025
|
|
Vllm bench serve + Bearer API key + HTTPS
|
|
1
|
323
|
August 7, 2025
|
|
The following two methods produce very different results
|
|
50
|
1344
|
July 25, 2025
|
|
High-Throughput kernel on single-node
|
|
1
|
143
|
June 23, 2025
|
|
vLLM Engine Metrics
|
|
20
|
318
|
June 11, 2025
|
|
vLLM benchmark host with self-signed certificate
|
|
1
|
222
|
June 4, 2025
|
|
ShareGPT implementation
|
|
1
|
603
|
May 22, 2025
|