How does vllm serve write inference results to the log?

By default, vLLM’s OpenAI-compatible server logs only the incoming requests, not the model’s inference results. To log model outputs in real time, you currently have to modify the code yourself: add logging statements to the chat completion generators (chat_completion_full_generator and/or chat_completion_stream_generator) in vllm/entrypoints/openai/serving_chat.py. There is no built-in configuration option or environment variable that turns on output logging, though there is an open pull request to add this feature in the future.
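
As an illustration only, here is a minimal sketch of the kind of logging you could add, assuming the non-streaming generator builds an OpenAI-style response object with `choices` and the streaming generator lets you accumulate the delta text. The names `response`, `request_id`, and `accumulated_text` are placeholders, not vLLM's actual variables, and the exact code differs between vLLM versions.

```python
# Sketch of helper functions you could call from the generators in
# vllm/entrypoints/openai/serving_chat.py. This is not the upstream
# implementation; adapt the names to the version of vLLM you are patching.
from vllm.logger import init_logger

logger = init_logger(__name__)  # serving_chat.py already defines a module-level logger


def log_full_response(request_id, response):
    """Log every generated message from a non-streaming chat completion.

    Call this in chat_completion_full_generator right before the response
    object is returned to the client.
    """
    for choice in response.choices:
        logger.info("request %s output: %s", request_id, choice.message.content)


def log_streamed_text(request_id, accumulated_text):
    """Log the concatenated output of a streaming chat completion.

    In chat_completion_stream_generator, append each delta's text to a
    buffer as it is yielded, then call this once the stream finishes.
    """
    logger.info("request %s streamed output: %s", request_id, accumulated_text)
```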

For more flexible or advanced logging (such as custom formats or destinations), you can configure the vLLM logger using environment variables and a custom logging configuration file, but this still will not capture model outputs unless you make the code change described above. See the official logging configuration guide for details on customizing the loggers; output logging itself requires code changes in the current state of the project.
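
For reference, the sketch below writes a dictConfig-style JSON logging configuration that routes the `vllm` logger to a file, which you would then point vLLM at via the documented VLLM_CONFIGURE_LOGGING and VLLM_LOGGING_CONFIG_PATH environment variables. The file paths (/tmp/vllm_logging.json, /tmp/vllm_server.log) and the format string are placeholder choices; adjust them to your setup. Note again that this only changes how existing log records are formatted and routed.

```python
# Sketch: generate a logging config file for vLLM's logger, then start the
# server with, for example:
#   VLLM_CONFIGURE_LOGGING=1 VLLM_LOGGING_CONFIG_PATH=/tmp/vllm_logging.json \
#       vllm serve <model>
import json

logging_config = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "vllm": {
            "format": "%(asctime)s %(levelname)s %(name)s: %(message)s",
        },
    },
    "handlers": {
        "file": {
            "class": "logging.FileHandler",
            "filename": "/tmp/vllm_server.log",  # placeholder log destination
            "formatter": "vllm",
            "level": "INFO",
        },
    },
    "loggers": {
        "vllm": {
            "handlers": ["file"],
            "level": "INFO",
            "propagate": False,
        },
    },
}

with open("/tmp/vllm_logging.json", "w") as f:
    json.dump(logging_config, f, indent=2)
```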
