Has anyone used InternVL3 for OCR tasks on vLLM? In our tests, the same InternVL3 model performs significantly worse on vLLM than on LMDeploy. Both are launched from the command line with parameters kept as consistent as possible, and we've checked the `chat_template` setting and confirmed it loads correctly. Are there any other possible causes?
Yes, this issue has been reported: InternVL3 models (e.g., InternVL3-14B) showed significantly lower accuracy on vLLM than on LMDeploy, even with the same model and matching parameters. The root cause was traced to an incorrect chat template in the Hugging Face tokenizer config, which the model authors have since fixed. If you have already verified that the chat template is correct and up to date, the available sources document no other explicit cause for an accuracy gap of this size on OCR tasks with InternVL3; no additional vLLM-specific settings or bugs are reported after the chat template fix, and no further troubleshooting steps are detailed.
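Beyond checking that the template file is up to date, one way to rule out template problems is to render the chat template locally and compare the exact prompt string each serving stack feeds the model. The sketch below renders a simplified ChatML-style placeholder template with Jinja2; it is not InternVL3's real template, so substitute the `chat_template` string from your model's `tokenizer_config.json`.

```python
# Hedged sketch: render a chat template locally so the resulting prompt can be
# diffed against what vLLM / LMDeploy actually send to the model.
# The template string below is a simplified placeholder, NOT InternVL3's real
# template -- load yours from tokenizer_config.json instead.
from jinja2 import Template

chat_template = (
    "{% for m in messages %}"
    "<|im_start|>{{ m['role'] }}\n{{ m['content'] }}<|im_end|>\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
)

messages = [{"role": "user", "content": "Read the text in this image."}]

# Render with a generation prompt, mirroring what a server does before decoding.
prompt = Template(chat_template).render(
    messages=messages, add_generation_prompt=True
)
print(prompt)
```

If the rendered prompt differs between the two stacks (extra newlines, missing image placeholder tokens, a wrong role header), that alone can explain a large OCR accuracy gap.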
Would you like a step-by-step breakdown of the issue history and troubleshooting process?
We are aware of that issue and have confirmed that we already downloaded the updated chat_template.