Is the DeepSeek R1-0528 function call chat template supported for the BFCL multi-turn benchmark?

I tried to run the BFCL_v3_MultiTurn benchmark in FC mode with DeepSeek R1-0528.
However, my results fall well short of DeepSeek's official outcome, which reports 37.0% accuracy on BFCL_v3_MultiTurn.
Is the DeepSeek R1-0528 function call chat template supported for the BFCL multi-turn benchmark?

DeepSeek R1-0528 is supported for function call (FC) benchmarks in vLLM, and there is an official function call chat template for it. However, multiple users have reported that, while single-turn function call accuracy is high, the model fails most BFCL_v3_MultiTurn cases, with accuracy as low as 4–6%, far below DeepSeek's official results. This is a known issue and has been observed in both vLLM and community evaluations, even when using the recommended template and parser (see the related discussions and issue reports).
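For reference, an FC-mode evaluation of this kind typically starts by serving the model through vLLM's OpenAI-compatible server with tool calling enabled. The sketch below shows roughly what that launch looks like; the specific parser names and the chat-template path are assumptions and should be verified against the documentation of your installed vLLM version:

```shell
# Sketch: serve DeepSeek R1-0528 with function calling enabled in vLLM.
# The parser names (deepseek_v3 / deepseek_r1) and the template path are
# assumptions -- check them against your vLLM version before running.
vllm serve deepseek-ai/DeepSeek-R1-0528 \
  --enable-auto-tool-choice \
  --tool-call-parser deepseek_v3 \
  --reasoning-parser deepseek_r1 \
  --chat-template examples/tool_chat_template_deepseekr1.jinja \
  --tensor-parallel-size 8
```

Once the server is up, the BFCL harness can be pointed at the OpenAI-compatible endpoint for the FC-mode runs; the multi-turn accuracy gap described above has been reported even with a setup like this.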

In summary: DeepSeek R1-0528's function call template is supported for BFCL multi-turn benchmarks in vLLM, but multi-turn accuracy is currently poor and does not match DeepSeek's official outcomes. This appears to be a limitation of the current vLLM template/parser or of the model's multi-turn behavior, not a lack of support. Would you like a step-by-step breakdown of the setup or more detail on the known issues?
