How does vllm serve write inference results to the log?

By default, vLLM’s OpenAI-compatible server logs only the incoming requests, not the model’s inference results. To log model outputs in real time, you currently have to modify the code yourself: add logging statements to the chat completion generators (chat_completion_full_generator and/or chat_completion_stream_generator) in vllm/entrypoints/openai/serving_chat.py. There is no built-in configuration option or environment variable that turns on output logging, though there is an open pull request to add this feature in the future.
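
As an illustration only, here is a minimal sketch of the kind of logging you could add, assuming the non-streaming generator builds an OpenAI-style response object with `choices` and the streaming generator lets you accumulate the delta text. The names `response`, `request_id`, and `accumulated_text` are placeholders, not vLLM's actual variables, and the exact code differs between vLLM versions.

```python
# Sketch of helper functions you could call from the generators in
# vllm/entrypoints/openai/serving_chat.py. This is not the upstream
# implementation; adapt the names to the version of vLLM you are patching.
from vllm.logger import init_logger

logger = init_logger(__name__)  # serving_chat.py already defines a module-level logger


def log_full_response(request_id, response):
    """Log every generated message from a non-streaming chat completion.

    Call this in chat_completion_full_generator right before the response
    object is returned to the client.
    """
    for choice in response.choices:
        logger.info("request %s output: %s", request_id, choice.message.content)


def log_streamed_text(request_id, accumulated_text):
    """Log the concatenated output of a streaming chat completion.

    In chat_completion_stream_generator, append each delta's text to a
    buffer as it is yielded, then call this once the stream finishes.
    """
    logger.info("request %s streamed output: %s", request_id, accumulated_text)
```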

For more flexible or advanced logging (such as custom formats or destinations), you can configure the vLLM logger using environment variables and a custom logging configuration file, but this still will not capture model outputs unless you make the code change described above. See the official logging configuration guide for details on customizing the loggers; output logging itself requires code changes in the current state of the project.
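
For reference, the sketch below writes a dictConfig-style JSON logging configuration that routes the `vllm` logger to a file, which you would then point vLLM at via the documented VLLM_CONFIGURE_LOGGING and VLLM_LOGGING_CONFIG_PATH environment variables. The file paths (/tmp/vllm_logging.json, /tmp/vllm_server.log) and the format string are placeholder choices; adjust them to your setup. Note again that this only changes how existing log records are formatted and routed.

```python
# Sketch: generate a logging config file for vLLM's logger, then start the
# server with, for example:
#   VLLM_CONFIGURE_LOGGING=1 VLLM_LOGGING_CONFIG_PATH=/tmp/vllm_logging.json \
#       vllm serve <model>
import json

logging_config = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "vllm": {
            "format": "%(asctime)s %(levelname)s %(name)s: %(message)s",
        },
    },
    "handlers": {
        "file": {
            "class": "logging.FileHandler",
            "filename": "/tmp/vllm_server.log",  # placeholder log destination
            "formatter": "vllm",
            "level": "INFO",
        },
    },
    "loggers": {
        "vllm": {
            "handlers": ["file"],
            "level": "INFO",
            "propagate": False,
        },
    },
}

with open("/tmp/vllm_logging.json", "w") as f:
    json.dump(logging_config, f, indent=2)
```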
