Does vLLM support token truncation?

RunLLM · September 10, 2025, 8:25am

Yes, vLLM supports token truncation, but the support depends on which API you use. In the OpenAI-compatible completion and embedding APIs, the truncate_prompt_tokens parameter truncates the prompt to its last k tokens (left truncation). For example, passing truncate_prompt_tokens=10 to the embedding API makes vLLM use only the last 10 tokens of the prompt.

This parameter is not supported across all of vLLM's offline and engine APIs. In some cases, such as the offline LLM engine or AsyncLLM, truncation is not implemented yet, so you need to truncate the prompt manually before passing it to vLLM. Right truncation (keeping the first k tokens) is not natively supported; this parameter only performs left truncation. The following example shows the parameter with the OpenAI embedding API:

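# Example for the OpenAI embedding API
# Assumption: a vLLM server is listening at http://localhost:8000/v1
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

async def main():
    response = await client.post(
        path="embeddings",
        cast_to=object,
        body={
            "model": "your-model-name",
            "input": "your input text",
            "truncate_prompt_tokens": 10,  # keep only the last 10 prompt tokens
        },
    )
    print(response)

asyncio.run(main())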
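The same parameter should also be accepted by the completion API, as described above. The following is a minimal sketch rather than an official client: the helper names build_completion_body and post_completion, the base URL http://localhost:8000/v1, and the max_tokens value are assumptions chosen for illustration.

```python
import json
import urllib.request
from typing import Any, Dict


def build_completion_body(model: str, prompt: str, keep_last: int) -> Dict[str, Any]:
    """Build a completions request body that keeps only the last `keep_last` prompt tokens."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": 32,  # illustrative value, not taken from the thread
        "truncate_prompt_tokens": keep_last,  # left truncation, applied by the server
    }


def post_completion(body: Dict[str, Any], base_url: str = "http://localhost:8000/v1") -> Dict[str, Any]:
    """Send the body to the completions endpoint (assumed server address)."""
    request = urllib.request.Request(
        url=f"{base_url}/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


body = build_completion_body("your-model-name", "your input text", keep_last=10)
# post_completion(body)  # uncomment once a server is running at base_url
```

Building the body separately from sending it makes it easy to inspect exactly which fields reach the server.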

For offline usage, truncate_prompt_tokens is not available, so you must tokenize and truncate the prompt yourself before passing it to the engine. In unsupported contexts, vLLM either raises an error or ignores the parameter. For more details and limitations, see the linked sources.
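Manual truncation for the offline case could look roughly like the sketch below. It works on a plain list of token IDs, so it does not depend on any particular tokenizer; the helper name truncate_token_ids and its side argument are hypothetical, not part of vLLM.

```python
from typing import List


def truncate_token_ids(token_ids: List[int], max_tokens: int, side: str = "left") -> List[int]:
    """Return at most `max_tokens` token IDs.

    side="left" drops tokens from the start and keeps the last `max_tokens`,
    mirroring what truncate_prompt_tokens does on the server.
    side="right" keeps the first `max_tokens` instead.
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    if len(token_ids) <= max_tokens:
        return list(token_ids)  # already short enough, return a copy
    if side == "left":
        return list(token_ids[-max_tokens:])
    return list(token_ids[:max_tokens])


# Token IDs would normally come from the model's tokenizer; these are made up.
example_ids = [101, 7, 8, 9, 10, 11, 102]
last_three = truncate_token_ids(example_ids, 3)
first_three = truncate_token_ids(example_ids, 3, side="right")
```

The truncated IDs can then be passed to the engine as a pre-tokenized prompt instead of text. Left truncation may drop a leading special token such as BOS, so check whether your model expects one to be added back.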

Would you like more detail or examples for a specific API or use case?

Sources:
