How does vllm serve record input?

`vllm serve` receives input through its OpenAI-compatible API server, started with the `vllm serve` command. Incoming request data (prompt strings or structured chat messages) is parsed and handed to the underlying AsyncLLMEngine, which performs tokenization, scheduling, and model execution. Input is held in memory only for the duration of the inference request; the documentation does not describe any persistent logging or storage of input beyond in-memory request handling.
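To make the request path concrete, here is a minimal sketch of the kind of OpenAI-compatible request body `vllm serve` receives; the model name, message content, and `max_tokens` value are illustrative placeholders, not values taken from the vLLM documentation:

```python
import json

# Hypothetical request body for an OpenAI-compatible /v1/chat/completions
# endpoint like the one `vllm serve` exposes. All field values here are
# placeholders for illustration.
payload = {
    "model": "my-model",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 16,
}

# The client sends this as the JSON body of an HTTP POST; the server parses
# it, extracts the messages, and passes the prompt to AsyncLLMEngine for
# tokenization, scheduling, and generation.
body = json.dumps(payload)
print(body)
```

You could POST this body to a running server (e.g. `http://localhost:8000/v1/chat/completions`) with any HTTP client; the server itself keeps the parsed input only in memory while serving the request.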

If you are asking how input is handled for inference, the LLMEngine (and its async wrapper, AsyncLLMEngine) manages input processing, including tokenization and batching, before passing the tokens to the model for generation. For more detail, see the Architecture Overview and Quickstart documentation. The data sources do not mention `vllm serve` persisting or logging input data to disk by default.
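The tokenize-then-batch step described above can be sketched with a toy stand-in; this is an illustrative simplification of the pattern, not vLLM's actual tokenizer or scheduler:

```python
# Toy sketch of the engine's input-processing pattern (NOT vLLM's real
# implementation): each prompt string is tokenized to a sequence of ids,
# and the sequences are collected into a batch for the model.

def tokenize(prompt: str) -> list[int]:
    # Stand-in "tokenizer": map each character to its code point.
    # A real engine would use the model's trained tokenizer instead.
    return [ord(ch) for ch in prompt]

def batch_prompts(prompts: list[str]) -> list[list[int]]:
    # Collect token sequences so the model can process them together.
    return [tokenize(p) for p in prompts]

batched = batch_prompts(["Hi", "OK"])
print(batched)  # two token sequences, one per prompt
```

In vLLM itself, this stage lives inside the engine: tokenized sequences become requests that the scheduler interleaves into batches across inference steps; the token ids exist only in memory during serving.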

Would you like more detail on the internal input processing or on how to customize input handling?

Sources:

- No "Received request" after vllm v0.10.2