Performance Issue While Requests Queuing

@thiner How did you measure generation speed?
If the generation speed you observed is that of a single request, then what you are seeing is expected. As the vLLM server processes more requests concurrently, the generation speed of each individual request drops, because the compute capacity is shared across all the requests being processed in parallel.
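To make the distinction concrete, here is a minimal sketch of the two ways "generation speed" can be measured. It does not call vLLM at all. The `RequestTiming` record, the helper names, and all the numbers are hypothetical and only illustrate the arithmetic; you would fill in the timings from your own client-side measurements.

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Hypothetical client-side record for one completed request."""
    start_s: float       # time the request was sent (seconds)
    end_s: float         # time the last token arrived (seconds)
    output_tokens: int   # number of generated tokens


def per_request_speed(t: RequestTiming) -> float:
    """Tokens per second as seen by a single request."""
    return t.output_tokens / (t.end_s - t.start_s)


def aggregate_throughput(timings: list[RequestTiming]) -> float:
    """Total tokens per second across all requests in the measured window."""
    window_s = max(t.end_s for t in timings) - min(t.start_s for t in timings)
    return sum(t.output_tokens for t in timings) / window_s


# Made-up numbers: one request alone vs. four requests running concurrently.
solo = [RequestTiming(start_s=0.0, end_s=10.0, output_tokens=500)]
batch = [RequestTiming(start_s=0.0, end_s=20.0, output_tokens=500) for _ in range(4)]

print("solo per-request tok/s: ", per_request_speed(solo[0]))
print("batch per-request tok/s:", per_request_speed(batch[0]))
print("batch aggregate tok/s:  ", aggregate_throughput(batch))
```

With these illustrative numbers, each request in the batch looks slower than the solo request, while the server as a whole generates more tokens per second. If your measurement is the per-request figure, a drop under load is consistent with the behavior described above.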