Hello,
Is it possible to do tool calling with the offline inference paradigm, e.g. via the `LLM` class?
I noticed that guided decoding supports a JSON schema, but with multiple tools each tool obviously has its own separate JSON schema.
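For context, here is roughly what I had in mind: merging the per-tool schemas into one union schema with `anyOf`, so guided decoding constrains the output to any single tool call. This is only a sketch; the `GuidedDecodingParams` usage reflects my reading of the guided decoding docs, and the model name and tool schemas are placeholders:

```python
# Sketch: constrain offline generation to a union of tool-call schemas.
# Tool schemas and model name are made-up placeholders.
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

get_weather = {
    "type": "object",
    "properties": {
        "name": {"const": "get_weather"},
        "arguments": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
    "required": ["name", "arguments"],
}
search_web = {
    "type": "object",
    "properties": {
        "name": {"const": "search_web"},
        "arguments": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    "required": ["name", "arguments"],
}

# One schema that allows either tool call: the model must emit JSON
# matching exactly one branch of the union.
combined_schema = {"anyOf": [get_weather, search_web]}

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(
    max_tokens=256,
    guided_decoding=GuidedDecodingParams(json=combined_schema),
)
outputs = llm.generate(["What's the weather in Paris?"], params)
print(outputs[0].outputs[0].text)
```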
Alternatively, is it possible to start and stop the online-inference vLLM server from within a Python/Jupyter notebook?
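For that second option, I imagine something like launching `vllm serve` as a subprocess and terminating it when done. Again just a sketch; the port, model name, and the readiness polling against the OpenAI-compatible `/v1/models` endpoint are my own choices:

```python
# Sketch: run the vLLM server as a subprocess from a notebook and
# stop it when finished. Model and port are placeholders.
import subprocess
import time
import urllib.request

proc = subprocess.Popen(
    ["vllm", "serve", "meta-llama/Llama-3.1-8B-Instruct", "--port", "8000"]
)

# Poll until the server answers (give up after ~5 minutes).
for _ in range(60):
    try:
        urllib.request.urlopen("http://localhost:8000/v1/models", timeout=5)
        break
    except OSError:
        time.sleep(5)

# ... talk to the server with any OpenAI-compatible client ...

proc.terminate()  # stop the server when done
proc.wait()
```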
Thanks and best regards
Personally, I separate my ‘server’ part from my ‘client’ part (even on the same machine, by calling localhost).
I write the entire ‘client’ in bash.
There are at least two parts you should be aware of for tool calling (not counting any tool-use guidance written into the system prompt):
- The ‘tools’ definitions delivered to the LLM, which tell it what tools it can ‘speak’ and in what format.
- Your client-side code for detecting whether the LLM ‘spoke’ that way (it may call many tools at once), looping through the tool calls, and optionally returning results to the LLM; a sketch of such a loop follows below.
It is pretty complex, though. You might also ask a big and recent enough LLM about this (it generally knows at least the tool-call schema it was trained on).
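To make those two parts concrete, here is a minimal Python sketch of the client-side loop. Everything in it is an assumption: the plain-JSON tool-call format, the example tool, and the hypothetical `ask_llm` helper that sends the message list to the server; real chat templates vary per model:

```python
# Minimal sketch of the client-side tool loop. The tool-call wire
# format (plain JSON), the example tool, and ask_llm() are assumptions.
import json

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in implementation

TOOLS = {"get_weather": get_weather}

def run_turn(ask_llm, messages, max_rounds=5):
    """Ask the LLM; if it 'speaks' tool JSON, run the tools and loop."""
    for _ in range(max_rounds):
        reply = ask_llm(messages)      # hypothetical helper -> str
        try:
            calls = json.loads(reply)  # did the LLM emit tool JSON?
        except json.JSONDecodeError:
            return reply               # plain text: the final answer
        if isinstance(calls, dict):
            calls = [calls]            # the model may call many tools at once
        for call in calls:
            result = TOOLS[call["name"]](**call["arguments"])
            messages.append({"role": "tool", "content": result})
    return reply
```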