How to set a default for sampling params like "truncate_prompt_tokens" for all vLLM embedding requests

RocketRider · April 12, 2026, 12:29pm

How could I set the sampling param “truncate_prompt_tokens” to a default value like 16000?
https://docs.vllm.ai/en/v0.6.4/dev/sampling_params.html

Is there any way to set it for all requests?

I want all incoming embedding calls to be truncated by default instead of failing.

RunLLM · April 12, 2026, 12:29pm

vLLM currently has no global or default setting for truncate_prompt_tokens that applies to all requests; you must set it per request. Passing truncate_prompt_tokens (e.g., 16000) as a parameter in each embedding or pooling request truncates over-long prompts instead of failing. A global default has been discussed but is not implemented. The recommended approach is to set truncate_prompt_tokens in each request, or via PoolerConfig if and when that is supported server-side. See the discussion in PR 24235 and issue 13489.

Would you like more detail or code examples for setting this parameter per request?
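
Since there is no server-side default, one workaround is to apply the default on the client, so every call carries the parameter. The following is a minimal sketch, not an official vLLM recipe. It assumes the server exposes the OpenAI-compatible /v1/embeddings endpoint and accepts truncate_prompt_tokens as an extra field in the request body. The URL, the model name, and the helper names (build_embedding_payload, embed) are placeholders made up for illustration.

```python
import json
import urllib.request

# Assumptions (not from the thread): URL and model name are placeholders.
VLLM_URL = "http://localhost:8000/v1/embeddings"
MODEL = "my-embedding-model"
DEFAULT_TRUNCATE = 16000  # the default value asked about in the question


def build_embedding_payload(texts, truncate_prompt_tokens=DEFAULT_TRUNCATE, model=MODEL):
    """Build the JSON body for an embeddings request.

    The truncation field is added unless the caller passes None explicitly.
    """
    payload = {"model": model, "input": texts}
    if truncate_prompt_tokens is not None:
        payload["truncate_prompt_tokens"] = truncate_prompt_tokens
    return payload


def embed(texts, **kwargs):
    """POST the payload to the server and return one embedding per input."""
    body = json.dumps(build_embedding_payload(texts, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        VLLM_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return [item["embedding"] for item in data["data"]]
```

Calling embed(["some very long text"]) then sends truncate_prompt_tokens=16000 without the caller having to remember it, and embed(texts, truncate_prompt_tokens=512) overrides the default for a single call. With the official openai Python client, the same field can usually be passed through extra_body. Note that the chosen value still has to fit within the model's maximum context length.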


