[Bug]: Wrong lora mapping during prompt logprobs computing

Hi all,
The current implementation uses the same get_logits and lora_mapping for both prompt logprob computation and token sampling, which can lead to a matrix dimension mismatch or an incorrect prompt_logprobs calculation (unexpected implicit broadcasting occurs in layers.py#L1150-L1152 when processing a single request).
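
To make the failure mode concrete, here is a minimal, self-contained PyTorch sketch of the hazard, not vLLM's actual code (all tensor names and shapes are illustrative assumptions): a LoRA mapping sized for sampling has one row per sequence, while prompt logprobs need one row per prompt token, and with a single request the mismatched shapes broadcast silently instead of raising an error.

```python
import torch

vocab, hidden_dim, rank = 8, 4, 2
num_prompt_tokens = 5          # one request with 5 prompt positions

hidden = torch.randn(num_prompt_tokens, hidden_dim)  # hidden states for all prompt tokens
base_w = torch.randn(vocab, hidden_dim)              # base LM head weight
lora_a = torch.randn(rank, hidden_dim)               # LoRA A matrix
lora_b = torch.randn(vocab, rank)                    # LoRA B matrix

# Sampling-time mapping: one LoRA row per sequence (here: 1 row, the last token).
# Prompt-logprob computation instead needs one row per prompt token (5 rows).
lora_delta_for_sampling = (hidden[-1:] @ lora_a.T) @ lora_b.T  # shape [1, vocab]

base_logits = hidden @ base_w.T                                # shape [5, vocab]

# With a single request, [5, vocab] + [1, vocab] broadcasts silently:
# every prompt position gets the LoRA delta of the last token,
# so prompt_logprobs are wrong instead of failing loudly.
wrong_logits = base_logits + lora_delta_for_sampling
print(wrong_logits.shape)  # torch.Size([5, 8]) -- no error, but incorrect values

# With more than one sequence in the batch (e.g. [5, vocab] + [2, vocab]),
# the shapes no longer broadcast and the bug surfaces as a RuntimeError.
```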

I have submitted a PR to fix it: [Bugfix] set correct lora mapping when compute prompt logprobs by pokerfaceSad · Pull Request #16694 · vllm-project/vllm · GitHub.
Could anyone review this PR?

The issue your PR addresses is the incorrect reuse of get_logits and lora_mapping across prompt logprob computation and token sampling, which can cause a matrix dimension mismatch or incorrect results due to unexpected implicit broadcasting. Your PR fixes this by setting the correct LoRA mapping when computing prompt logprobs.

For a review, reach out to the maintainers or contributors who have previously worked on related issues or components. You might tag specific contributors or maintainers in your PR comments, or join the vLLM developer Slack and discuss your PR in the #pr-reviews channel, as suggested in the PR description.

Would you like more details on how to proceed with getting your PR reviewed?
