Hello,
Is it possible to do tool calling with the offline inference paradigm, e.g. via the `LLM` class?
I noticed that guided decoding supports a JSON schema, but with multiple tools each tool obviously has its own separate JSON schema.
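For context, here is roughly what I had in mind: merging the per-tool schemas into one union schema with `anyOf`, so guided decoding constrains the output to any single tool call. This is only a sketch; the `GuidedDecodingParams` usage reflects my reading of the guided decoding docs, and the model name and tool schemas are placeholders:

```python
# Sketch: constrain offline generation to a union of tool-call schemas.
# Tool schemas and model name are made-up placeholders.
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

get_weather = {
    "type": "object",
    "properties": {
        "name": {"const": "get_weather"},
        "arguments": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
    "required": ["name", "arguments"],
}
search_web = {
    "type": "object",
    "properties": {
        "name": {"const": "search_web"},
        "arguments": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    "required": ["name", "arguments"],
}

# One schema that allows either tool call: the model must emit JSON
# matching exactly one branch of the union.
combined_schema = {"anyOf": [get_weather, search_web]}

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
params = SamplingParams(
    max_tokens=256,
    guided_decoding=GuidedDecodingParams(json=combined_schema),
)
outputs = llm.generate(["What's the weather in Paris?"], params)
print(outputs[0].outputs[0].text)
```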
Alternatively, is it possible to start and stop the online-inference vLLM server from within a Python/Jupyter notebook?
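For that second option, I imagine something like launching `vllm serve` as a subprocess and terminating it when done. Again just a sketch; the port, model name, and the readiness polling against the OpenAI-compatible `/v1/models` endpoint are my own choices:

```python
# Sketch: run the vLLM server as a subprocess from a notebook and
# stop it when finished. Model and port are placeholders.
import subprocess
import time
import urllib.request

proc = subprocess.Popen(
    ["vllm", "serve", "meta-llama/Llama-3.1-8B-Instruct", "--port", "8000"]
)

# Poll until the server answers (give up after ~5 minutes).
for _ in range(60):
    try:
        urllib.request.urlopen("http://localhost:8000/v1/models", timeout=5)
        break
    except OSError:
        time.sleep(5)

# ... talk to the server with any OpenAI-compatible client ...

proc.terminate()  # stop the server when done
proc.wait()
```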
Thanks and best regards
Personally, I separate my ‘server’ part from my ‘client’ part (even on the same machine, by calling localhost).
I write the entire ‘client’ in bash.
There are at least two parts you should be aware of for tool calling (not counting any tool-use guidance written into the system prompt):
- The ‘tools’ definitions delivered to the LLM, which tell it what tools it can ‘speak’ and in what format.
- Your client-side code for detecting whether the LLM ‘spoke’ that way (it may call many tools at once), looping through the tool calls, and optionally returning results to the LLM; a sketch of such a loop follows below.
It is pretty complex, though. You might also ask a big and recent enough LLM about this (it generally knows at least the tool-call schema it was trained on).
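To make those two parts concrete, here is a minimal Python sketch of the client-side loop. Everything in it is an assumption: the plain-JSON tool-call format, the example tool, and the hypothetical `ask_llm` helper that sends the message list to the server; real chat templates vary per model:

```python
# Minimal sketch of the client-side tool loop. The tool-call wire
# format (plain JSON), the example tool, and ask_llm() are assumptions.
import json

def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stand-in implementation

TOOLS = {"get_weather": get_weather}

def run_turn(ask_llm, messages, max_rounds=5):
    """Ask the LLM; if it 'speaks' tool JSON, run the tools and loop."""
    for _ in range(max_rounds):
        reply = ask_llm(messages)      # hypothetical helper -> str
        try:
            calls = json.loads(reply)  # did the LLM emit tool JSON?
        except json.JSONDecodeError:
            return reply               # plain text: the final answer
        if isinstance(calls, dict):
            calls = [calls]            # the model may call many tools at once
        for call in calls:
            result = TOOLS[call["name"]](**call["arguments"])
            messages.append({"role": "tool", "content": result})
    return reply
```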