Tool calling using Offline Inference?

Hello,
is it possible to use tool calling with the offline inference paradigm, e.g. via the LLM class?
I noticed that guided decoding supports a JSON schema, but with multiple tools each tool obviously has its own separate schema.
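For context, here is roughly the workaround I'm imagining: merge the per-tool schemas into a single schema with `anyOf` and pass that through guided decoding. This is just a sketch, assuming a recent vLLM where `SamplingParams` accepts a `GuidedDecodingParams` (the structured-output API has moved around between versions); the tool names are made up, and whether a top-level `anyOf` is accepted may depend on the guided-decoding backend:

```python
# Sketch: constrain offline generation to one of several tool-call shapes
# by combining the per-tool JSON schemas under "anyOf".
from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

get_weather = {
    "type": "object",
    "properties": {
        "name": {"enum": ["get_weather"]},
        "arguments": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
    "required": ["name", "arguments"],
}
search_web = {
    "type": "object",
    "properties": {
        "name": {"enum": ["search_web"]},
        "arguments": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    "required": ["name", "arguments"],
}

# One schema covering all tools: output must match one of the two shapes.
combined = {"anyOf": [get_weather, search_web]}

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # placeholder model
params = SamplingParams(
    max_tokens=256,
    guided_decoding=GuidedDecodingParams(json=combined),
)
out = llm.generate("Call a tool to find the weather in Berlin.", params)
print(out[0].outputs[0].text)
```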

Alternatively, is it possible to start and stop the online-inference vLLM server from within Python / a Jupyter notebook?
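For example, something like this sketch, which launches `vllm serve` as a subprocess, polls the server's `/health` endpoint until it is ready, and terminates it afterwards (model name and port are placeholders):

```python
# Sketch: start and stop the vLLM OpenAI-compatible server from a notebook.
import subprocess
import time

import requests

server = subprocess.Popen(
    ["vllm", "serve", "meta-llama/Llama-3.1-8B-Instruct", "--port", "8000"]
)

# Wait until the server reports healthy before sending requests.
for _ in range(120):  # allow a few minutes for the model to load
    try:
        if requests.get("http://localhost:8000/health", timeout=1).ok:
            break
    except requests.ConnectionError:
        pass
    time.sleep(2)

# ... issue requests via the OpenAI client against http://localhost:8000/v1 ...

server.terminate()  # stop the server when done
server.wait()
```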

Thanks and best regards


Personally, I separate the ‘server’ part from the ‘client’ part (even on the same machine, talking over localhost).
I write the entire client side in bash.

Setting aside any tool-specific guidance written into the system prompt, there are at least two parts you need to handle for tool calling (see the sketch after this list):

  1. The ‘tools’ definitions delivered to the LLM, so it knows what ‘language’ (tools) it can speak, and how.
  2. Your client-side code for detecting when the LLM ‘speaks’ that way (possibly several tool calls at once), looping over the calls, and optionally returning the results to the LLM.
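Here is a minimal sketch of that loop in Python, using the OpenAI client against a local vLLM server. The tool name and its implementation are placeholders, and, if I remember the flags correctly, the server has to be started with tool parsing enabled (e.g. `--enable-auto-tool-choice` plus a `--tool-call-parser` matching your model) for `tool_calls` to be populated:

```python
# Sketch of the client side: deliver tool definitions, detect tool calls,
# run them, and feed the results back for a final answer.
import json

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def get_weather(city: str) -> str:  # stand-in implementation
    return f"Sunny in {city}"

available = {"get_weather": get_weather}

messages = [{"role": "user", "content": "What's the weather in Berlin?"}]
resp = client.chat.completions.create(
    model="my-model",  # whatever name the server is serving under
    messages=messages,
    tools=tools,
)
msg = resp.choices[0].message

# Part 2: detect whether the model 'spoke' a tool call, possibly several.
if msg.tool_calls:
    messages.append(msg)
    for call in msg.tool_calls:
        fn = available[call.function.name]
        result = fn(**json.loads(call.function.arguments))
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": result,
        })
    # Return the tool results to the model for a final answer.
    resp = client.chat.completions.create(
        model="my-model", messages=messages, tools=tools
    )

print(resp.choices[0].message.content)
```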

It gets fairly complex, though. You'd do well to ask a big and recent enough LLM about the details; at the very least, it knows the tool-call schema it was itself trained on.