@RunLLM
I’ve started the server, and it exposes an interface like this:
GET:
Health
Get Server Load Metrics
Ping
Show Available Models
Show Version
Retrieve Responses
Metrics
POST:
Ping
Tokenize
Detokenize
Create Responses
Cancel Responses
Create Chat Completion
Create Completion
Create Embedding
Create Pooling
Create Classify
Create Score
Create Score V1
Create Transcriptions
Create Translations
Do Rerank
Do Rerank V1
Do Rerank V2
Scale Elastic Ep
Is Scaling Elastic Ep
Invocations
Can I use structured outputs with this server?