Which software components vLLM inference needs

Can you list all the software components/packages that vLLM inference needs with an NVIDIA RTX PRO card, when using a clean Ubuntu 24.04 or Debian 13 install and the vLLM Docker images?

For example:
Git, Python, NVIDIA drivers (which version), what else?

For vLLM inference with an NVIDIA RTX PRO card on a clean Ubuntu 24.04 or Debian 13 system using the official vLLM Docker images, you mainly need:

  • NVIDIA GPU driver: the latest driver that supports your GPU and CUDA 12.8+ (e.g., driver version 570.x or newer for Blackwell-based cards such as the RTX PRO 6000 or RTX 5090).
  • NVIDIA Container Toolkit: required so Docker containers can access the GPU.
  • Docker: any recent version that supports the NVIDIA runtime.

The vLLM Docker image itself bundles all other dependencies (Python, vLLM, CUDA libraries, PyTorch, etc.), so you do not need to install Git, Python, or the CUDA toolkit on the host. Just ensure the NVIDIA driver, Docker, and the container toolkit are installed and configured. Would you like a step-by-step installation guide?
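Once those three host pieces are in place, a quick sanity check (a sketch; each check degrades gracefully if a component is missing) confirms them before you pull the image:

```shell
# Host sanity checks before running the vLLM container.

if command -v nvidia-smi >/dev/null 2>&1; then
  # Prints GPU model and driver version, e.g. "NVIDIA RTX PRO 6000, 570.xx"
  nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
else
  echo "nvidia-smi not found - install the NVIDIA driver first"
fi

if command -v docker >/dev/null 2>&1; then
  docker --version
else
  echo "docker not found"
fi

if command -v nvidia-ctk >/dev/null 2>&1; then
  nvidia-ctk --version
else
  echo "nvidia-ctk not found - install nvidia-container-toolkit"
fi

echo "host checks complete"
```

The final proof is a CUDA container: `docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu24.04 nvidia-smi` (tag may vary) should print the same GPU table from inside the container.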

Hint: Mention @RunLLM in the post for followups.

yes give me step by step instructions

and make ansible playbook out of it

Step-by-step instructions to prepare a clean Ubuntu 24.04 or Debian 13 system for vLLM inference with an NVIDIA RTX PRO card using Docker:

  1. Install NVIDIA GPU Driver (latest, e.g., 570.x+ for Blackwell/RTX PRO).
  2. Install Docker (latest stable).
  3. Install NVIDIA Container Toolkit for Docker GPU access.
  4. (Optional) Add your user to the docker group for non-root usage.
  5. Pull and run the vLLM Docker image.

Below is an Ansible playbook that automates these steps (excluding a reboot, which is required after driver install):

---
- name: Setup vLLM inference environment on Ubuntu/Debian
  hosts: all
  become: yes

  tasks:
    - name: Install required packages
      apt:
        name:
          - curl
          - ca-certificates
          - gnupg
          - lsb-release
        state: present
        update_cache: yes

    - name: Add NVIDIA Container Toolkit repository
      # Follows NVIDIA's current install docs: single distro-independent
      # deb repo, signed-by keyring so apt accepts it.
      shell: |
        curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
        curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
      args:
        executable: /bin/bash
        creates: /etc/apt/sources.list.d/nvidia-container-toolkit.list

    # Ubuntu ships versioned driver packages (e.g. nvidia-driver-570);
    # on Debian 13 install the "nvidia-driver" package from non-free instead.
    - name: Install NVIDIA driver (latest)
      apt:
        name: nvidia-driver-570
        state: present
        update_cache: yes

    - name: Install Docker
      apt:
        name:
          - docker.io
        state: present
        update_cache: yes

    - name: Install NVIDIA Container Toolkit
      apt:
        name: nvidia-container-toolkit
        state: present
        update_cache: yes

    - name: Configure Docker to use NVIDIA runtime
      shell: |
        nvidia-ctk runtime configure --runtime=docker
      args:
        creates: /etc/docker/daemon.json

    - name: Restart Docker
      service:
        name: docker
        state: restarted

    - name: Add user to docker group (optional)
      user:
        # ansible_user_id resolves to root under become; prefer the login user.
        name: "{{ ansible_user | default(ansible_user_id) }}"
        groups: docker
        append: yes

    - name: Pull vLLM Docker image
      # Requires the community.docker collection and the Docker SDK for
      # Python on the target host; otherwise run "docker pull" via command.
      community.docker.docker_image:
        name: vllm/vllm-openai
        tag: latest
        source: pull

    - name: Print post-install instructions
      debug:
        msg: |
          Reboot the system to load the NVIDIA driver.
          After reboot, run:
          docker run --runtime=nvidia --gpus all --ipc=host -p 8000:8000 vllm/vllm-openai:latest --model <your-model>
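For reference, the `nvidia-ctk runtime configure --runtime=docker` task rewrites `/etc/docker/daemon.json` to register the NVIDIA runtime, typically like this (your file may carry additional keys):

```json
{
  "runtimes": {
    "nvidia": {
      "args": [],
      "path": "nvidia-container-runtime"
    }
  }
}
```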
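Once the container is up, it serves the OpenAI-compatible API on port 8000; a quick smoke test might look like this (the model name is whatever you passed as `--model`, and the fallback message covers the case where the server is not yet up):

```shell
# Query the OpenAI-compatible completions endpoint of the vLLM container.
curl -s http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "<your-model>", "prompt": "Hello", "max_tokens": 16}' \
  || echo "vLLM server not reachable on localhost:8000"
```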

Would you like a more detailed breakdown or customizations for your environment?
