[Bug]: Wrong lora mapping during prompt logprobs computing

Hi all,
The current implementation uses the same get_logits and lora_mapping for both prompt logprob computation and token sampling, which can lead to a matrix dimension mismatch or an incorrect prompt_logprobs calculation (unexpected implicit broadcasting occurs in layers.py#L1150-L1152 when processing a single request).
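
To make the failure mode concrete, here is a minimal, self-contained PyTorch sketch of the hazard, not vLLM's actual code (all tensor names and shapes are illustrative assumptions): a LoRA mapping sized for sampling has one row per sequence, while prompt logprobs need one row per prompt token, and with a single request the mismatched shapes broadcast silently instead of raising an error.

```python
import torch

vocab, hidden_dim, rank = 8, 4, 2
num_prompt_tokens = 5          # one request with 5 prompt positions

hidden = torch.randn(num_prompt_tokens, hidden_dim)  # hidden states for all prompt tokens
base_w = torch.randn(vocab, hidden_dim)              # base LM head weight
lora_a = torch.randn(rank, hidden_dim)               # LoRA A matrix
lora_b = torch.randn(vocab, rank)                    # LoRA B matrix

# Sampling-time mapping: one LoRA row per sequence (here: 1 row, the last token).
# Prompt-logprob computation instead needs one row per prompt token (5 rows).
lora_delta_for_sampling = (hidden[-1:] @ lora_a.T) @ lora_b.T  # shape [1, vocab]

base_logits = hidden @ base_w.T                                # shape [5, vocab]

# With a single request, [5, vocab] + [1, vocab] broadcasts silently:
# every prompt position gets the LoRA delta of the last token,
# so prompt_logprobs are wrong instead of failing loudly.
wrong_logits = base_logits + lora_delta_for_sampling
print(wrong_logits.shape)  # torch.Size([5, 8]) -- no error, but incorrect values

# With more than one sequence in the batch (e.g. [5, vocab] + [2, vocab]),
# the shapes no longer broadcast and the bug surfaces as a RuntimeError.
```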

I have submitted a PR to fix it: [Bugfix] set correct lora mapping when compute prompt logprobs by pokerfaceSad · Pull Request #16694 · vllm-project/vllm · GitHub.
Could anyone review this PR?

The issue your PR addresses is the incorrect reuse of get_logits and lora_mapping across prompt logprob computation and token sampling, which can cause a matrix dimension mismatch or incorrect results due to unexpected implicit broadcasting. Your PR fixes this by setting the correct LoRA mapping when computing prompt logprobs.

For a review, reach out to the maintainers or contributors who have previously worked on related issues or components. You might tag specific contributors or maintainers in your PR comments, or join the vLLM developer Slack and discuss your PR in the #pr-reviews channel, as suggested in the PR description.

Would you like more details on how to proceed with getting your PR reviewed?
