@CarlosR759
Based on my understanding, with vLLM, the speculative decoding (draft) model needs to share the same GPUs as the base model. There isn't a configuration that places the draft model on one GPU and the base model on another.
Concretely, the following is possible:
- Base model with TP=2, draft model sharing those GPUs:
| GPU 0 | GPU 1 |
|---|---|
| Base Model (first half) | Base Model (second half) |
| Draft Model | |
- The following is not possible yet:
| GPU 0 | GPU 1 |
|---|---|
| Draft Model | Base Model |
Moreover, the draft model currently has to run without tensor parallelism, meaning `draft_tensor_parallel_size` must be set to 1.
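As a rough sketch, the supported layout above would be configured along these lines. The model names are placeholders, and the exact argument names (`speculative_model`, `num_speculative_tokens`, `speculative_draft_tensor_parallel_size`) vary across vLLM versions, so please check the engine-arguments reference for the release you're running:

```python
from vllm import LLM, SamplingParams

# Base model sharded TP=2 across GPU 0 and GPU 1; the draft model is
# co-located on the same GPUs and must not itself be sharded.
llm = LLM(
    model="meta-llama/Llama-2-70b-hf",             # placeholder base model
    tensor_parallel_size=2,                        # base model split over 2 GPUs
    speculative_model="meta-llama/Llama-2-7b-hf",  # placeholder draft model
    num_speculative_tokens=5,                      # tokens proposed per step
    speculative_draft_tensor_parallel_size=1,      # draft model runs without TP
)

outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
```

There is no argument that pins the draft model to a different GPU than the base model, which is exactly the limitation described above.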