@RunLLM how to set the priority for batch jobs in vLLM
To set the priority for batch jobs in vLLM, use the priority parameter in your request. For the OpenAI-compatible API, include priority in the extra_body field of your request JSON. Lower values mean higher priority (i.e., processed earlier). You must also start the vLLM server with --scheduling-policy priority to enable priority scheduling. For example:
```python
from openai import OpenAI

# Assumes a vLLM server started with --scheduling-policy priority;
# adjust base_url for your deployment (api_key is unused by vLLM)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="your-model",
    messages=[{"role": "user", "content": "Prompt"}],
    extra_body={"priority": 10},  # lower value = scheduled earlier
)
```
This will assign a priority of 10 to the request. Note: Priority scheduling is only effective if enabled, and a lower priority value means higher scheduling precedence. For more details, see the discussion and usage in PR #5958 and related issue comments.
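For reference, the server side might be launched like this (the model name is a placeholder; substitute your own):

```shell
# Start the OpenAI-compatible vLLM server with priority scheduling enabled
vllm serve your-model --scheduling-policy priority
```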
Would you like more detail or code examples for other interfaces?
Hint: Mention @RunLLM in the post for followups.
@RunLLM I mean how to set the value? Is there a guide on this?
There is no official guide specifying how to choose or set the exact value for the priority parameter; it is user-defined. The only rule is that lower values are treated as higher priority (processed earlier), and ties are broken by arrival time. You can assign any integer value based on your application’s needs (e.g., 0 for highest, 10 for lower, etc.)—the scale and meaning are up to you. See PR #5958 and API doc.
Would you like more detail or best practices for designing a priority scheme?
@RunLLM yes, what are the default values, and how do I make sure that my batch jobs do not affect interactive requests?
By default, the priority value is 0 if not specified. Lower values mean higher priority (e.g., -1 is higher priority than 0, and 10 is lower priority). To ensure your batch jobs do not affect interactive requests, assign higher (less urgent) priority values to batch jobs (e.g., priority=10 or higher), and use lower (more urgent) values (e.g., priority=0 or priority=-1) for interactive requests. The scheduler will always process lower-value (higher-priority) requests first when using --scheduling-policy priority (docs, PR #5958).
For best practices:
- Define a clear mapping (e.g., `priority=0` for interactive, `priority=10` for batch, `priority=20` for background).
- Always start vLLM with `--scheduling-policy priority`.
- Optionally, use a Service Level Agreement (SLA) tiering approach (see SLA-Tiered Scheduling RFC) for more granular control, though this is still under discussion.
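As a minimal sketch of such a mapping (the tier names and helper are illustrative, not part of vLLM):

```python
# Illustrative priority tiers; in vLLM, a lower value is scheduled earlier
PRIORITY_TIERS = {
    "interactive": 0,
    "batch": 10,
    "background": 20,
}

def extra_body_for(tier: str) -> dict:
    """Build the extra_body payload for a request of the given tier."""
    return {"priority": PRIORITY_TIERS[tier]}

print(extra_body_for("batch"))  # {'priority': 10}
```

You would then pass `extra_body=extra_body_for("batch")` (or `"interactive"`, etc.) in each `client.chat.completions.create(...)` call, keeping the tier definitions in one place.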
Would you like a step-by-step example or more on SLA-tiered scheduling?
what is the --scheduling-policy priority? @RunLLM
The --scheduling-policy priority flag tells vLLM to schedule requests based on their assigned priority value, with lower values being processed first. If two requests have the same priority, the one that arrived earlier is processed first. The default policy is fcfs (first-come-first-served), but setting --scheduling-policy priority enables priority-based scheduling (docs).
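A minimal sketch of that ordering rule (plain Python, not vLLM internals): requests effectively sort by `(priority, arrival_order)`, so lower priority values go first and ties fall back to arrival order.

```python
# Each tuple is (priority, arrival_order, request_id) -- an illustration
# of the comparison the priority policy applies, not actual vLLM code.
requests = [
    (10, 0, "batch-A"),       # arrived first, but low urgency
    (0, 1, "interactive-B"),  # higher priority (lower value)
    (0, 2, "interactive-C"),  # same priority, arrived later
]

# Sorting by (priority, arrival_order) yields the scheduling order
order = [rid for _, _, rid in sorted(requests)]
print(order)  # ['interactive-B', 'interactive-C', 'batch-A']
```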
Would you like more detail on how to use this flag or how it affects scheduling?