Thinking token limit setting

vLLM currently has no built-in command-line argument or documented API parameter for setting a “thinking token limit” (e.g., max_thinking_tokens) for reasoning models such as Qwen3, so it cannot be configured in the docker compose file or the server command. Limiting the number of thinking tokens is a requested feature. It can be implemented at the application level with a custom logits processor, but vLLM v1 does not yet support it natively as a server flag or config parameter. See discussion and workarounds in this issue.

If you need to enforce a thinking token limit today, you have to implement a custom logits processor in your own application code; it cannot be set through docker compose or the vLLM server command. Would you like more detail on how to implement this workaround?
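As a rough illustration of the logits-processor idea, here is a minimal sketch in plain Python. It assumes the processor is called once per decoding step with the token IDs generated so far and the logits for the next token, and that you know the ID of the model's end-of-thinking token (for example `</think>` in Qwen3; look it up with the model's tokenizer). The class name `ThinkingBudgetProcessor` and its parameters are hypothetical and not part of vLLM. How a processor is registered with vLLM depends on the version, so that wiring is left out, and a real processor would typically operate on a tensor rather than a list.

```python
import math
from typing import List, Sequence


class ThinkingBudgetProcessor:
    """Hypothetical sketch: force the end-of-thinking token once a budget is used up."""

    def __init__(self, think_end_id: int, max_thinking_tokens: int) -> None:
        # think_end_id: ID of the end-of-thinking token (assumption: a single token).
        self.think_end_id = think_end_id
        self.max_thinking_tokens = max_thinking_tokens

    def __call__(self, token_ids: Sequence[int], logits: List[float]) -> List[float]:
        # Thinking already closed: leave the distribution untouched.
        if self.think_end_id in token_ids:
            return logits
        # Still within budget: leave the distribution untouched.
        if len(token_ids) < self.max_thinking_tokens:
            return logits
        # Budget exhausted: mask every token except the end-of-thinking token.
        forced = [-math.inf] * len(logits)
        forced[self.think_end_id] = 0.0
        return forced


if __name__ == "__main__":
    # Toy vocabulary of 4 tokens; token 2 plays the role of the end-of-thinking token.
    processor = ThinkingBudgetProcessor(think_end_id=2, max_thinking_tokens=3)
    step_logits = [0.1, 0.2, 0.3, 0.4]
    print(processor([0, 1], step_logits))     # under budget, unchanged
    print(processor([0, 1, 3], step_logits))  # budget reached, only token 2 allowed
```

Forcing the end token abruptly can cut the reasoning off mid-sentence. A softer variant would gradually boost the end token's logit as the budget approaches, at the cost of a less strict limit.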
