vLLM currently has no global or default setting for truncate_prompt_tokens that applies to all requests; it must be set per request. Passing truncate_prompt_tokens (e.g., 16000) as a parameter in each embedding or pooling request makes the server truncate an over-long prompt instead of failing. A global default has been discussed but not implemented, so the recommended approach is to set truncate_prompt_tokens in each request, or via PoolerConfig if and when that is supported server-side. See the discussion in PR 24235 and issue 13489.
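As a minimal sketch of the per-request approach, the snippet below builds an embeddings request body that always carries truncate_prompt_tokens and posts it to the server. It assumes vLLM's OpenAI-compatible server is reachable at the address shown and serves the `/v1/embeddings` route; the server address, the model name, and the helper names `build_embedding_payload` and `embed` are placeholders chosen for illustration, not part of vLLM.

```python
import json
import urllib.request

# Assumptions for illustration only: adjust to your deployment.
BASE_URL = "http://localhost:8000"   # hypothetical server address
MODEL = "my-embedding-model"         # hypothetical model name


def build_embedding_payload(texts, max_tokens=16000, model=MODEL):
    """Return the JSON body for an embeddings request with truncation set."""
    # Accept either a single string or an iterable of strings.
    inputs = [texts] if isinstance(texts, str) else list(texts)
    return {
        "model": model,
        "input": inputs,
        # Sent with every request, because there is no server-wide default.
        "truncate_prompt_tokens": max_tokens,
    }


def embed(texts, max_tokens=16000, base_url=BASE_URL):
    """POST the payload to /v1/embeddings and return the parsed JSON reply."""
    body = json.dumps(build_embedding_payload(texts, max_tokens)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Routing every call through one helper like this is a client-side stand-in for the missing global default: the limit lives in a single place even though it is still sent per request. If you use the official openai Python client instead, the same field can likely be supplied through its `extra_body` argument.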
Would you like more detail on setting this parameter per request?