vLLM inference problem: the results are completely wrong

longkong39 · February 4, 2026, 8:46am

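Why is there such a large gap between inference with vLLM and inference with the modelscope library? The reply I get from vLLM is completely wrong.
vLLM inference code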

from vllm import LLM, SamplingParams

llm = LLM(model="models/Qwen/Qwen2.5-1.5B-Instruct")

sampling_params = SamplingParams(
    temperature=0.7,
    top_p=0.8,
    top_k=20,
    max_tokens=256
)

outputs = llm.generate("你好")
for output in outputs:
    print(output.outputs[0].text)

Result:

，我需要预订一张去纽约的机票。请问有什么好的建议吗？
(In English: ", I need to book a flight to New York. Do you have any good suggestions?")

modelscope inference code

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "models/Qwen/Qwen2.5-1.5B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "你好"
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)

Result:

你好！有什么我可以帮助你的吗？
(In English: "Hello! How can I help you?")

RunLLM · February 4, 2026, 8:46am

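The main cause is that with vLLM you pass "你好" directly as the prompt without applying the model's chat template, whereas your modelscope (transformers) code formats the input with tokenizer.apply_chat_template. Instruction-tuned and chat models such as Qwen need their input wrapped in the chat template. Without it, the model treats the text as something to continue rather than as a user turn, so it cannot interpret the context correctly and produces abnormal or irrelevant output.
According to the vLLM documentation and related issues, you should first process the prompt with the tokenizer's apply_chat_template and then pass the result to vLLM's generate method, or use the llm.chat interface directly.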
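As a rough illustration of the two options just described, here is a minimal sketch. The helper names (build_messages, chat_once, generate_with_template, main) are made up for this example and are not part of vLLM. The model path and sampling values are copied from the question. It assumes a vLLM version that provides LLM.chat and LLM.get_tokenizer.

```python
def build_messages(user_text):
    """Wrap raw user text in the message format that chat templates expect."""
    return [{"role": "user", "content": user_text}]


def chat_once(llm, user_text, sampling_params=None):
    """Option 1: LLM.chat applies the model's chat template for us."""
    outputs = llm.chat(build_messages(user_text), sampling_params=sampling_params)
    return outputs[0].outputs[0].text


def generate_with_template(llm, user_text, sampling_params=None):
    """Option 2: apply the chat template by hand, then call LLM.generate."""
    tokenizer = llm.get_tokenizer()
    text = tokenizer.apply_chat_template(
        build_messages(user_text),
        tokenize=False,              # return a string, not token ids
        add_generation_prompt=True,  # end with the assistant turn header
    )
    outputs = llm.generate([text], sampling_params=sampling_params)
    return outputs[0].outputs[0].text


def main():
    # Imported here so the helpers above can be imported without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model="models/Qwen/Qwen2.5-1.5B-Instruct")
    # Unlike the code in the question, the sampling parameters are passed on.
    params = SamplingParams(temperature=0.7, top_p=0.8, top_k=20, max_tokens=256)
    print(chat_once(llm, "你好", params))
    print(generate_with_template(llm, "你好", params))
```

Calling main() on a machine with vLLM and the model available should give chat-style replies from both paths. Note also that the code in the question builds sampling_params but never passes it to generate, so vLLM's default sampling settings were used there.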

Would you like a detailed code example and an explanation of how this works?


longkong39 · February 4, 2026, 8:56am

That was indeed the cause. Thank you very much for the quick reply!
