I’m running vLLM serve v0.8.2 with Open WebUI. It runs well, but I noticed in the logs that vLLM keeps generating tokens even though the request is already done.
Here’s my configuration:
- vllm/vllm-openai:v0.8.2 image
- Llama-3.1-8B-Instruct
- NVIDIA RTX 5000 ADA 16GB
- running with the following options (full launch command sketched below):
--trust-remote-code
--dtype float16
-q bitsandbytes
--load-format bitsandbytes
--max-model-len 61000
--chat-template /vllm-workspace/examples/tool_chat_template_llama3.1_json.jinja
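
Putting those options together, the launch command looks roughly like this (the exact port mapping and cache mount on my host may differ slightly; the serve flags are the ones listed above):

```bash
docker run --gpus all -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:v0.8.2 \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --trust-remote-code \
  --dtype float16 \
  -q bitsandbytes \
  --load-format bitsandbytes \
  --max-model-len 61000 \
  --chat-template /vllm-workspace/examples/tool_chat_template_llama3.1_json.jinja
```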
On the frontend side I’m using the open-webui package from GitHub.