Text generation doesn't stop

I’m running vllm serve v0.8.2 with Open WebUI. It runs well, but I noticed in the logs that vLLM keeps generating tokens even though the request is already done.

Here’s my configuration:

  • vllm/vllm-openai:v0.8.2 image
  • Llama-3.1-8B-Instruct
  • NVIDIA RTX 5000 ADA 16GB
  • running with the following options (a full command sketch follows the list)
    --trust-remote-code
    --dtype float16
    -q bitsandbytes
    --load-format bitsandbytes
    --max-model-len 61000
    --chat-template /vllm-workspace/examples/tool_chat_template_llama3.1_json.jinja
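Put together, the full invocation looks roughly like this (a sketch only: the port mapping, Hugging Face cache mount, and exact model path are assumptions, not copied verbatim from my setup):

    # Rough sketch of the docker invocation; port, volume mount, and model
    # path are assumptions, the remaining flags match the list above.
    docker run --gpus all -p 8000:8000 \
      -v ~/.cache/huggingface:/root/.cache/huggingface \
      vllm/vllm-openai:v0.8.2 \
      --model meta-llama/Llama-3.1-8B-Instruct \
      --trust-remote-code \
      --dtype float16 \
      -q bitsandbytes \
      --load-format bitsandbytes \
      --max-model-len 61000 \
      --chat-template /vllm-workspace/examples/tool_chat_template_llama3.1_json.jinja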

I’m using the open-webui package from GitHub.
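To check whether this is the UI dropping the connection without the request being aborted, I can reproduce it outside Open WebUI with a streaming request that I cut off early, then watch whether the vLLM log keeps reporting generation for that request. A rough sketch (port and model name are assumptions):

    # Start a streaming chat completion, kill the connection after 5 s,
    # then watch the vLLM logs to see whether generation for this request
    # stops or keeps running. Port and model name are assumptions.
    curl -N --max-time 5 http://localhost:8000/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
            "model": "meta-llama/Llama-3.1-8B-Instruct",
            "stream": true,
            "max_tokens": 2048,
            "messages": [{"role": "user", "content": "Write a long story."}]
          }'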

Could you share a sample of the logs showing that vLLM is still generating?