Tell me about the current status of the tokenize endpoint in vllm, thanks!
The /tokenize endpoint in vLLM is available and supports both prompt and chat-completions style inputs, returning token IDs, token count, and max model length. It also supports detokenization via the /detokenize endpoint, and can optionally return token strings with the return_token_strs parameter. Example usage and response formats are documented in recent PRs and the official docs. For chat-style input, the endpoint applies the chat template as in inference, ensuring tokenization matches generation behavior. See PR #5923, PR #5054, and API docs.
However, the endpoint currently only supports text-based tokenization and does not handle images or other multimodal inputs; for such cases, you must use the appropriate HuggingFace processor locally. There is ongoing discussion about extending /tokenize to support multimodal data, but this is not yet implemented (issue #20778). Would you like more detail or code examples?
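To make the request/response shapes concrete, here is a minimal sketch of the /tokenize and /detokenize payloads. The field names (`tokens`, `count`, `max_model_len`, `return_token_strs`) follow the docs and PRs cited above, but treat the exact shapes as assumptions; the model name and values are placeholders:

```python
# Sketch of /tokenize and /detokenize request payloads (field names assumed
# from the docs/PRs above; model name and token values are placeholders).

def build_tokenize_payload(model, prompt, return_token_strs=False):
    # return_token_strs=True asks the server to also return token strings
    payload = {"model": model, "prompt": prompt}
    if return_token_strs:
        payload["return_token_strs"] = True
    return payload

def build_detokenize_payload(model, tokenize_response):
    # /detokenize accepts the token IDs that /tokenize returned
    return {"model": model, "tokens": tokenize_response["tokens"]}

# Illustrative /tokenize response shape (values are made up):
tokenize_response = {"tokens": [128000, 9906], "count": 2, "max_model_len": 8192}
detok_payload = build_detokenize_payload(
    "meta-llama/Meta-Llama-3-8B-Instruct", tokenize_response
)
```

POST these dicts as JSON to /tokenize and /detokenize respectively (e.g. with requests.post against a running vLLM server).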
Sources:
Hint: Mention @RunLLM in the post for followups.
@RunLLM Give me examples or test examples in vLLM, and also tell me: what is the endpoint? It's /tokenize, not /v1/tokenize, right?
Yes, the endpoint is /tokenize (not /v1/tokenize) for vLLM’s tokenizer API. This endpoint is used for tokenizing text or chat messages and is separate from the OpenAI-compatible /v1 routes to avoid confusion (PR #5054).
For examples, see the vLLM test file test_tokenization.py, which demonstrates POST requests to /tokenize with both prompt and chat-style inputs. Here is a minimal example:
import requests

response = requests.post(
    "http://localhost:8000/tokenize",
    json={"model": "meta-llama/Meta-Llama-3-8B-Instruct", "prompt": "test"},
)
print(response.json())
This returns the token IDs, their count, and max_model_len. For chat-style input, send a messages array instead of prompt.
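As a sketch of the chat-style variant, the request body below uses the OpenAI-style messages format. The add_generation_prompt field is an assumption here (it would control whether the chat template appends the assistant turn, mirroring inference-time templating); verify it against the tokenizer API docs:

```python
# Sketch of a chat-style /tokenize request body. messages follows the OpenAI
# chat format; add_generation_prompt is assumed, not confirmed by this thread.

def build_chat_tokenize_payload(model, messages, add_generation_prompt=True):
    return {
        "model": model,
        "messages": messages,
        "add_generation_prompt": add_generation_prompt,
    }

payload = build_chat_tokenize_payload(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    [{"role": "user", "content": "Hello!"}],
)
```

POST this dict as JSON to /tokenize, just like the prompt-based example above; the chat template is applied server-side before tokenization.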