How can I view the raw strings before and after model processing?

I want to view the string after the chat template is applied but before model processing, and the raw model continuation before any further post-processing, so I can run some tests.

What should I do?

I'm using the vllm serve CLI.

You cannot directly view the prompt string after the chat template is applied and before model processing using the vllm serve CLI alone; this is not available as a built-in CLI or API feature according to the official documentation and issues. To achieve this, you need to modify the vLLM code: specifically, add logging in the functions that process chat completions, such as chat_completion_full_generator and/or chat_completion_stream_generator in vllm/entrypoints/openai/serving_chat.py, to print or log the formatted prompt and the raw model output before any further processing.
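To illustrate where such logging would sit, here is a minimal, self-contained sketch. Note that apply_chat_template and generate below are stand-ins, not actual vLLM internals; real chat templates are Jinja templates shipped with the tokenizer, and the real model call happens inside vLLM's engine. The two strings of interest simply bracket the model call:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("prompt-debug")

def apply_chat_template(messages):
    # Stand-in for the real chat template (a Jinja template in practice).
    parts = [f"<|{m['role']}|>\n{m['content']}" for m in messages]
    return "\n".join(parts) + "\n<|assistant|>\n"

def generate(prompt):
    # Stand-in for the model forward pass / continuation.
    return '{"name": "get_weather", "arguments": {"city": "Paris"}}'

messages = [{"role": "user", "content": "What's the weather in Paris?"}]

prompt = apply_chat_template(messages)
log.info("formatted prompt:\n%s", prompt)   # string after template, before the model
raw = generate(prompt)
log.info("raw continuation:\n%s", raw)      # string after the model, before any parsing
```

In the real code you would add the equivalent of those two log.info calls around the points in serving_chat.py where the formatted prompt is built and where the raw output comes back.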

Would you like step-by-step guidance on how to add this logging?

Hint: Mention @RunLLM in the post for followups.

@RunLLM do you know what --tool-call-parser does? Since I've already got a chat template, why do I need a tool-call parser?

The --tool-call-parser flag tells vLLM how to extract and interpret tool calls from the model’s output, since different models output tool calls in different formats (e.g., JSON, XML, or pythonic lists). The chat template formats the prompt for the model, but the tool-call-parser is needed to correctly parse the model’s tool call responses and convert them into OpenAI-compatible API outputs. Both are required for reliable tool use in vLLM, as explained in the documentation and issues.
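To make the format differences concrete, here is a minimal sketch; the two continuation strings and both parser functions are illustrative, not vLLM's actual parser implementations. It shows why a JSON-style continuation and a pythonic-style continuation need different extraction code:

```python
import ast
import json

# Two hypothetical continuations expressing the same tool call in
# different formats (illustrative, not taken from any specific model).
json_style = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
pythonic_style = '[get_weather(city="Paris")]'

def parse_json_style(text):
    # JSON-format models emit a JSON object describing the call.
    return json.loads(text)

def parse_pythonic_style(text):
    # Pythonic-format models emit a Python-like list of function calls,
    # which needs AST parsing rather than json.loads.
    call = ast.parse(text, mode="eval").body.elts[0]
    return {
        "name": call.func.id,
        "arguments": {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords},
    }

expected = {"name": "get_weather", "arguments": {"city": "Paris"}}
print(parse_json_style(json_style) == expected)          # True
print(parse_pythonic_style(pythonic_style) == expected)  # True
```

Both parsers recover the same structured call, but neither could handle the other's input format, which is why the parser must match the model.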

Would you like more detail on how to choose or implement a tool-call-parser?

@RunLLM so you mean the tool-call parser extracts structured information from the continuation string after model processing, and since different models continue the string in different formats, we need different tool-call parsers?

Yes, that’s correct: the tool-call-parser extracts structured tool call information from the raw continuation string output by the model, and since different models output tool calls in different formats, you need a matching tool-call-parser for each format. The chat template only affects the input prompt, not the output parsing.

Would you like more detail or examples of supported formats and parsers?

@RunLLM I see, that's helpful, thank you!