Customized Model Generation Verification

excespo · July 14, 2025, 2:41am

Background: I customized a model based on the original Qwen2 model by monkey-patching some of its forward functions. My vLLM output is now quite different from the HuggingFace output, and I have not been able to find the bug just by reviewing the code. I would like to debug it by checking the intermediate hidden states per layer, but that does not seem straightforward in vLLM (in HuggingFace, output_hidden_states=True works).

Does anybody know an effective way to debug such a case? Thanks for responding XD

RunLLM · July 14, 2025, 2:41am

vLLM does not natively support returning intermediate hidden states for each layer in the same way as HuggingFace’s output_hidden_states=True option. This limitation is due to performance concerns and architectural choices; vLLM is optimized for fast inference and does not accumulate or expose per-layer hidden states by default. There have been community discussions and PRs proposing this feature, but maintainers have not merged such functionality into mainline vLLM, and recommend using HuggingFace Transformers for this use case instead. If you need to debug by inspecting hidden states, you may need to modify vLLM’s model code directly (e.g., by adding hooks or custom print statements), or temporarily switch to HuggingFace for debugging purposes. See the discussion in vllm-project/vllm#20624 and vllm-project/vllm#6165.
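
Once per-layer activations have been dumped from both sides (for example from HuggingFace's output_hidden_states=True and from print statements or hooks added to the vLLM model code), a small comparison helper can point at the first layer where the two runs disagree. The following is only a minimal sketch: the function names, the flat-list-of-floats layout, and the tolerance value are all assumptions for illustration, not part of vLLM or Transformers.

```python
import math


def max_abs_diff(ref, test):
    """Largest element-wise absolute difference between two flat vectors.

    Returns NaN as soon as any element difference is NaN, so a NaN in
    either run is never silently ignored.
    """
    if len(ref) != len(test):
        raise ValueError(f"shape mismatch: {len(ref)} vs {len(test)}")
    worst = 0.0
    for x, y in zip(ref, test):
        d = abs(x - y)
        if math.isnan(d):
            return math.nan
        worst = max(worst, d)
    return worst


def first_divergent_layer(ref_states, test_states, atol=1e-2):
    """Return (layer_index, diff) for the first layer whose max abs diff
    exceeds atol (or is NaN), or None if every compared layer matches.

    ref_states / test_states: one flat list of floats per layer, e.g. the
    hidden state of the last token flattened to a Python list (assumed
    layout). atol is a hypothetical starting point; half-precision kernels
    usually differ slightly even when the model code is correct.
    """
    for i, (ref, test) in enumerate(zip(ref_states, test_states)):
        diff = max_abs_diff(ref, test)
        if math.isnan(diff) or diff > atol:
            return i, diff
    return None


# Toy data standing in for real dumps: layers 0 and 1 agree, layer 2 does not.
hf_states = [[0.1, 0.2], [0.5, -0.3], [1.0, 2.0]]
vllm_states = [[0.1, 0.2], [0.5, -0.3], [1.0, 2.5]]

result = first_divergent_layer(hf_states, vllm_states)
print(result)  # (2, 0.5)
```

The first index reported is usually the place to look in the patched forward function. Note that zip stops at the shorter list, so both dumps should contain the same number of layers.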

Would you like more detail on how to patch vLLM to print or extract hidden states for debugging? Or do you want advice on alternative debugging strategies?
