On V1, a request was sent to the GPU, but its memory usage did not decrease and it kept increasing rapidly.
[loggers.py:123] Engine 000: Avg prompt throughput: 600.0 tokens/s, Avg generation throughput: 77.2 tokens/s, Running: 5 reqs, Waiting: 0 reqs, GPU KV cache usage: 11.7%, Prefix cache hit rate: 99.2%
[loggers.py:123] Engine 000: Avg prompt throughput: 1199.9 tokens/s, Avg generation throughput: 204.3 tokens/s, Running: 6 reqs, Waiting: 0 reqs, GPU KV cache usage: 24.6%, Prefix cache hit rate: 99.2%
[loggers.py:123] Engine 000: Avg prompt throughput: 1598.0 tokens/s, Avg generation throughput: 190.2 tokens/s, Running: 12 reqs, Waiting: 0 reqs, GPU KV cache usage: 38.8%, Prefix cache hit rate: 90.4%
[loggers.py:123] Engine 000: Avg prompt throughput: 1498.5 tokens/s, Avg generation throughput: 361.3 tokens/s, Running: 13 reqs, Waiting: 0 reqs, GPU KV cache usage: 50.8%, Prefix cache hit rate: 89.1%
[loggers.py:123] Engine 000: Avg prompt throughput: 1699.8 tokens/s, Avg generation throughput: 404.1 tokens/s, Running: 9 reqs, Waiting: 0 reqs, GPU KV cache usage: 70.0%, Prefix cache hit rate: 90.2%
[loggers.py:123] Engine 000: Avg prompt throughput: 2297.1 tokens/s, Avg generation throughput: 422.2 tokens/s, Running: 10 reqs, Waiting: 0 reqs, GPU KV cache usage: 94.9%, Prefix cache hit rate: 92.5%
[loggers.py:123] Engine 000: Avg prompt throughput: 800.0 tokens/s, Avg generation throughput: 255.0 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 99.3%, Prefix cache hit rate: 93.1%