How can I view the raw strings before and after model processing?

I want to view the string after the chat template is applied but before model processing, and the raw model continuation before any further post-processing, so I can run some tests.

What should I do?

I'm using the vllm serve CLI.

You cannot directly view the prompt string after the chat template is applied and before model processing using the vllm serve CLI alone; this is not available as a built-in CLI or API feature according to the official documentation and issues. To achieve this, you need to modify the vLLM code: specifically, add logging in the functions that process chat completions, such as chat_completion_full_generator and/or chat_completion_stream_generator in vllm/entrypoints/openai/serving_chat.py, to print or log the formatted prompt and the raw model output before any further processing.
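To illustrate where such logging would sit, here is a minimal, self-contained sketch. Note that apply_chat_template and generate below are stand-ins, not actual vLLM internals; real chat templates are Jinja templates shipped with the tokenizer, and the real model call happens inside vLLM's engine. The two strings of interest simply bracket the model call:

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("prompt-debug")

def apply_chat_template(messages):
    # Stand-in for the real chat template (a Jinja template in practice).
    parts = [f"<|{m['role']}|>\n{m['content']}" for m in messages]
    return "\n".join(parts) + "\n<|assistant|>\n"

def generate(prompt):
    # Stand-in for the model forward pass / continuation.
    return '{"name": "get_weather", "arguments": {"city": "Paris"}}'

messages = [{"role": "user", "content": "What's the weather in Paris?"}]

prompt = apply_chat_template(messages)
log.info("formatted prompt:\n%s", prompt)   # string after template, before the model
raw = generate(prompt)
log.info("raw continuation:\n%s", raw)      # string after the model, before any parsing
```

In the real code you would add the equivalent of those two log.info calls around the points in serving_chat.py where the formatted prompt is built and where the raw output comes back.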

Would you like step-by-step guidance on how to add this logging?

Hint: Mention @RunLLM in the post for followups.

@RunLLM do you know what --tool-call-parser does? Since I've already got a chat template, why do I need a tool-call parser?

The --tool-call-parser flag tells vLLM how to extract and interpret tool calls from the model’s output, since different models output tool calls in different formats (e.g., JSON, XML, or pythonic lists). The chat template formats the prompt for the model, but the tool-call-parser is needed to correctly parse the model’s tool call responses and convert them into OpenAI-compatible API outputs. Both are required for reliable tool use in vLLM, as explained in the documentation and issues.
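To make the format differences concrete, here is a minimal sketch; the two continuation strings and both parser functions are illustrative, not vLLM's actual parser implementations. It shows why a JSON-style continuation and a pythonic-style continuation need different extraction code:

```python
import ast
import json

# Two hypothetical continuations expressing the same tool call in
# different formats (illustrative, not taken from any specific model).
json_style = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
pythonic_style = '[get_weather(city="Paris")]'

def parse_json_style(text):
    # JSON-format models emit a JSON object describing the call.
    return json.loads(text)

def parse_pythonic_style(text):
    # Pythonic-format models emit a Python-like list of function calls,
    # which needs AST parsing rather than json.loads.
    call = ast.parse(text, mode="eval").body.elts[0]
    return {
        "name": call.func.id,
        "arguments": {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords},
    }

expected = {"name": "get_weather", "arguments": {"city": "Paris"}}
print(parse_json_style(json_style) == expected)          # True
print(parse_pythonic_style(pythonic_style) == expected)  # True
```

Both parsers recover the same structured call, but neither could handle the other's input format, which is why the parser must match the model.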

Would you like more detail on how to choose or implement a tool-call-parser?

@RunLLM so you mean the tool-call parser extracts structured information from the continuation string after model processing, and since different models continue the string in different formats, we need different tool-call parsers?

Yes, that’s correct: the tool-call-parser extracts structured tool call information from the raw continuation string output by the model, and since different models output tool calls in different formats, you need a matching tool-call-parser for each format. The chat template only affects the input prompt, not the output parsing.

Would you like more detail or examples of supported formats and parsers?

@RunLLM I see, that's helpful, thank you!