A LoRA file can be loaded with diffusers.QwenImagePipeline.load_lora_weights, but how does vLLM-Omni load this LoRA file?
vLLM-Omni loads LoRA adapters for diffusion models using PEFT-format LoRA directories (containing adapter_config.json and adapter_model.safetensors), not via diffusers.QwenImagePipeline.load_lora_weights. You can load a LoRA in vLLM-Omni by specifying the LoRA path with --lora-path (pre-loaded at startup) or --lora-request-path (per-request) in the inference script or API call, as shown in the official documentation and example scripts. The LoRA must be in PEFT format, not the format used by diffusers’ load_lora_weights.
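As a quick sanity check before loading, a small stdlib-only helper can verify that a directory follows the PEFT layout described above. This is a sketch: the function name and messages are my own, not part of vLLM-Omni or PEFT.

```python
import json
import os

def check_peft_lora_dir(path):
    """Check that a directory follows the PEFT LoRA adapter layout.

    Returns (ok, message). Requires adapter_config.json and
    adapter_model.safetensors to exist, and peft_type == "LORA"
    in the config.
    """
    required = ["adapter_config.json", "adapter_model.safetensors"]
    missing = [f for f in required if not os.path.exists(os.path.join(path, f))]
    if missing:
        return False, "missing files: %s" % missing
    with open(os.path.join(path, "adapter_config.json")) as fh:
        cfg = json.load(fh)
    if cfg.get("peft_type") != "LORA":
        return False, "unexpected peft_type: %r" % cfg.get("peft_type")
    return True, "target_modules: %s" % cfg.get("target_modules")
```

If this passes but loading still fails, the mismatch is more likely in target_modules or weight key naming than in the directory layout.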
If your LoRA file is not in PEFT format, it must be converted before use with vLLM-Omni. Would you like more detail on the conversion process or usage examples?
Sources:
Hint: Mention @RunLLM in the post for followups.
Hi @RunLLM,
I’ve been experimenting with LoRAs for Qwen-Image and Z-Image-Turbo. I’ve generated LoRA .safetensors files using AI Toolkit, but then converting them to the PEFT format that vLLM-Omni requires has proven to be quite complex and difficult to automate.
Could you share what you consider the recommended workflow for training LoRAs that are natively compatible with vLLM-Omni? Specifically:
- Should the LoRA be trained with certain target_modules to match vLLM's supported modules?
- Are there preferred frameworks or tools for training LoRAs directly in a PEFT-compatible format for diffusion models?
- Any tips to avoid the manual conversion step from Diffusers' format to PEFT?
Thanks in advance!
The recommended workflow for vLLM-Omni is to train LoRAs directly in PEFT format using the PEFT library, targeting modules that match vLLM’s supported modules (e.g., q_proj, k_proj, v_proj, o_proj, etc.). This avoids manual conversion from Diffusers’ .safetensors format. For diffusion models, ensure your training pipeline saves adapters with both adapter_config.json and adapter_model.safetensors, as required by PEFT and vLLM-Omni (see example structure).
Preferred frameworks are Hugging Face PEFT (for LoRA) and Diffusers (for diffusion models), but you must use PEFT’s save_pretrained to ensure compatibility. Set target_modules to match the modules vLLM-Omni expects; you can check supported modules in vLLM-Omni’s documentation or by inspecting the model (discussion). This workflow eliminates the need for post-hoc conversion. Would you like more detail on configuring PEFT for diffusion LoRA training?
Thanks for the clarification @RunLLM ! To make sure I understand: for Qwen-Image, the recommended way is to train the LoRA directly in PEFT format, using the PEFT library, and set target_modules to match vLLM-Omni’s supported modules like q_proj, k_proj, v_proj, o_proj, img_mlp, txt_mlp, etc. Then save with save_pretrained so the LoRA comes out with both adapter_config.json and adapter_model.safetensors.
Is this correct for Qwen-Image specifically, or is there any nuance in how its transformer blocks should be handled during PEFT training?
Yes, your understanding is correct: for Qwen-Image, you should train the LoRA directly in PEFT format using the PEFT library, set target_modules to match vLLM-Omni’s supported modules (such as q_proj, k_proj, v_proj, o_proj, img_mlp, txt_mlp, etc.), and save the adapter with save_pretrained to produce both adapter_config.json and adapter_model.safetensors, as required by vLLM-Omni (example structure).
There is no special nuance for Qwen-Image beyond ensuring that your target_modules align with the actual linear layers in its transformer blocks that vLLM-Omni expects to patch. You may want to double-check the model’s architecture or vLLM-Omni’s source for the precise list of supported modules (see discussion). Would you like a step-by-step example for PEFT LoRA training with Qwen-Image?
@RunLLM Yes, I would like a step-by-step example, thanks
Certainly! Here’s a step-by-step example for training a LoRA adapter for Qwen-Image in PEFT format, compatible with vLLM-Omni:
1. Install Required Libraries
   Make sure you have Hugging Face Transformers, Diffusers, and PEFT installed:

   ```shell
   pip install transformers diffusers peft
   ```

2. Prepare Your Dataset and Base Model
   Choose your dataset and the Qwen-Image base model (e.g., Qwen/Qwen-Image).

3. Set Up the PEFT LoraConfig
   Configure LoRA with target_modules matching vLLM-Omni's supported modules (e.g., q_proj, k_proj, v_proj, o_proj, img_mlp, txt_mlp):

   ```python
   from peft import LoraConfig

   lora_config = LoraConfig(
       r=8,
       lora_alpha=16,
       target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "img_mlp", "txt_mlp"],
       lora_dropout=0.05,
       bias="none",
       # task_type is omitted here: "CAUSAL_LM" applies to text generation
       # models, not to a diffusion transformer
   )
   ```

4. Load the Base Model and Prepare for LoRA

   ```python
   # Qwen-Image is a diffusion model, so load its transformer via Diffusers
   # rather than transformers.AutoModelForCausalLM (which is for text models)
   from diffusers import QwenImagePipeline

   pipe = QwenImagePipeline.from_pretrained("Qwen/Qwen-Image")
   base_model = pipe.transformer
   ```

5. Wrap the Model with PEFT

   ```python
   from peft import get_peft_model

   lora_model = get_peft_model(base_model, lora_config)
   ```

6. Train the LoRA Adapter
   Use your preferred training loop (e.g., Hugging Face Trainer, custom loop) to train the lora_model on your dataset.

7. Save the LoRA Adapter in PEFT Format
   After training, save the adapter:

   ```python
   lora_model.save_pretrained("my_qwenimage_lora")
   ```

   This will create a directory with adapter_config.json and adapter_model.safetensors.

8. Use with vLLM-Omni
   Pass the resulting directory to vLLM-Omni using --lora-path or --lora-request-path as described in the official documentation.
Would you like a full training script or more details on any step?
@RunLLM Hi! I noticed that LoRAs generated with AI Toolkit in .safetensors can be loaded in ComfyUI, but vLLM-Omni won’t load the same file unless it’s in PEFT format. Could you explain why there’s this difference between the two frameworks?
ComfyUI and Diffusers use their own LoRA format (typically a single .safetensors file with specific key naming and structure), while vLLM-Omni requires LoRA adapters in the PEFT format, which includes both adapter_config.json and adapter_model.safetensors, and follows a different internal structure and naming convention. vLLM-Omni relies on the PEFT standard for compatibility and dynamic loading, which is not directly compatible with the format used by ComfyUI or Diffusers (see example structure, design discussion).
This difference exists because each framework expects LoRA weights to be organized and referenced in a way that matches its own loading and patching logic. As a result, direct interchange of LoRA files between ComfyUI/Diffusers and vLLM-Omni is not possible without conversion.
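To make the difference concrete, here is a rough heuristic for guessing which convention a checkpoint weight key follows. The patterns are illustrative assumptions based on common PEFT and kohya/ComfyUI key styles, not an exhaustive specification:

```python
def classify_lora_key(key):
    """Rough heuristic guessing the LoRA checkpoint convention of a weight key.

    The patterns below are illustrative only; real checkpoints vary by tool.
    """
    # PEFT saves keys such as "base_model.model.<module>.lora_A.weight"
    if ".lora_A." in key or ".lora_B." in key:
        return "peft"
    # kohya/ComfyUI-style single files often use "lora_unet_..." prefixes
    # with ".lora_down."/".lora_up." weight suffixes
    if key.startswith("lora_unet_") or ".lora_down." in key or ".lora_up." in key:
        return "diffusers/kohya"
    return "unknown"
```

Running this over the keys of a safetensors file (e.g., via safetensors' load utilities) gives a quick read on which loader a given file was produced for.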
Thanks for the clarification about the format differences.
In my case, I fine-tuned a LoRA using Diffusers + PEFT, and I saved it using save_pretrained(). This generates both adapter_model.safetensors and adapter_config.json (PEFT format), so in theory it should be compatible with vLLM-Omni.
However, when I load this LoRA into vLLM-Omni, the output is just an image with random noise, as if the adapter is not being applied correctly or there is a mismatch in how the weights are interpreted.
Has anyone experienced something similar when training with Diffusers + PEFT and deploying in vLLM-Omni? Could this be related to naming conventions, target modules, or expected architecture alignment during training?
@RunLLM would appreciate any guidance on this.
Yes, this issue is known: even if you use Diffusers + PEFT and save with save_pretrained (producing adapter_model.safetensors and adapter_config.json), vLLM-Omni may not apply the LoRA correctly if there is a mismatch in target_modules, naming conventions, or architecture alignment. The most common causes are:
- target_modules mismatch: vLLM-Omni expects target_modules to match the exact module names it supports for the specific model (e.g., q_proj, k_proj, v_proj, o_proj, etc.). If your LoRA was trained on different or more generic module names, the weights may not be mapped/applied (see discussion).
- Naming conventions: The internal key names in adapter_model.safetensors must match what vLLM-Omni expects. If Diffusers/PEFT uses a different prefix or structure, vLLM-Omni may silently skip the weights.
- Architecture alignment: The LoRA must be trained on a model architecture that matches the one used in vLLM-Omni. Even small differences (e.g., in block names or layer wrappers) can cause the adapter to be ignored or misapplied.
If you see only random noise, it usually means the LoRA weights are not being injected at all. Double-check that your target_modules and model architecture during training exactly match vLLM-Omni’s expectations. You may also want to inspect the logs for warnings about ignored or unmatched modules.
Would you like a checklist or code snippet to help verify module alignment?
Thanks — sharing my adapter config in case it helps identify the issue.
I trained the LoRA using Diffusers + PEFT and saved it with save_pretrained(). This is the generated adapter_config.json:
```json
{
  "auto_mapping": {
    "base_model_class": "ZImageTransformer2DModel",
    "parent_library": "diffusers.models.transformers.transformer_z_image"
  },
  "lora_alpha": 16,
  "lora_dropout": 0.05,
  "r": 16,
  "target_modules": [
    "to_v",
    "ff.net.0.proj",
    "to_out.0",
    "ff.net.2",
    "to_k",
    "to_q"
  ],
  "peft_type": "LORA",
  "inference_mode": true
}
```
From what you explained, I suspect the issue may be in target_modules alignment. My LoRA targets:
- to_q
- to_k
- to_v
- to_out.0
- ff.net.0.proj
- ff.net.2
but it seems vLLM-Omni may expect something closer to q_proj, k_proj, v_proj, o_proj, or a different module structure entirely.
Also, my base model class is ZImageTransformer2DModel, so there might be an architecture mismatch or different internal naming compared to what vLLM-Omni expects.
The behavior I see (pure random noise output) suggests the weights are not being injected at all.
If you have a checklist or snippet to verify module alignment or inspect which modules vLLM-Omni actually patches at runtime, that would be very helpful.
@RunLLM any guidance would be appreciated.
You are correct: the issue is almost certainly due to target_modules and internal naming mismatches. vLLM-Omni expects target_modules to match the exact names of the linear layers it wraps for LoRA injection, which may differ from Diffusers’ conventions (e.g., to_q vs. q_proj). If the names do not match, vLLM-Omni will silently skip the LoRA weights, resulting in random noise output (see discussion).
Checklist to verify module alignment:
- Inspect vLLM-Omni's supported modules:
  You can check which modules are LoRA-patchable by running:

  ```python
  from vllm_omni.lora.utils import get_supported_lora_modules

  model = ...  # your loaded base model in vLLM-Omni
  print(get_supported_lora_modules(model))
  ```

  This will print the list of module names vLLM-Omni expects for LoRA injection.
- Compare with your adapter_config.json:
  Ensure every entry in target_modules matches one of the names printed above.
- Check for architecture differences:
  If your base model class (ZImageTransformer2DModel) is not the same as the one vLLM-Omni uses internally, the module paths may differ, causing further mismatches.
- Logs:
  Run vLLM-Omni with debug logging enabled to see if it reports any warnings about unmatched or ignored LoRA modules.
If you find mismatches, you may need to retrain your LoRA with target_modules set to the exact names vLLM-Omni expects, or adjust the PEFT config and weight keys accordingly.
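The comparison step in the checklist can also be sketched without importing vLLM-Omni at all: given the module names dumped from the deployed model and the adapter's target_modules, list the targets that match nothing. Suffix matching mirrors how PEFT resolves target_modules, but treat this as an approximation of vLLM-Omni's own matching logic:

```python
def unmatched_targets(target_modules, model_module_names):
    """Return entries of target_modules that match no module in the model.

    A target such as "q_proj" matches a module name equal to it or ending
    in ".q_proj" (suffix matching, the way PEFT resolves target_modules).
    """
    unmatched = []
    for target in target_modules:
        hit = any(
            name == target or name.endswith("." + target)
            for name in model_module_names
        )
        if not hit:
            unmatched.append(target)
    return unmatched
```

A non-empty result for your adapter's target_modules against the deployed model's module names would confirm that the LoRA weights have nothing to attach to, which matches the pure-noise symptom.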
Would you like a code snippet to print all linear submodules in your vLLM-Omni model for manual inspection?
Thanks — that makes sense.
Unfortunately I can’t import or run vLLM-Omni locally in my environment to inspect the model or check the supported modules myself.
Could you please run:
```python
from vllm_omni.lora.utils import get_supported_lora_modules

model = ...  # your loaded base model in vLLM-Omni
print(get_supported_lora_modules(model))
```
and share the list of supported module names for the model you’re using?
That would help me retrain the LoRA with the correct target_modules or adjust the config accordingly.
@RunLLM thanks again for the help!
The exact list of supported LoRA module names in vLLM-Omni depends on the specific model architecture, but for diffusion models (such as Stable Diffusion 3.5 or similar transformer-based image models), vLLM-Omni typically expects module names like: q_proj, k_proj, v_proj, o_proj, up_proj, down_proj, gate_proj, and sometimes img_mlp, txt_mlp, depending on the model (see example, example structure).
If your current target_modules are to_q, to_k, to_v, to_out.0, ff.net.0.proj, ff.net.2, you should try mapping them to the closest vLLM-Omni equivalents: q_proj, k_proj, v_proj, o_proj, up_proj, down_proj, gate_proj, etc. For best results, retrain your LoRA with these as target_modules, or manually edit adapter_config.json and the weight keys to match.
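If retraining is not an option, a post-hoc rename of the attention keys can be sketched as below. The to_q → q_proj mapping is an assumption that must be verified against the module names vLLM-Omni actually patches, and the feed-forward layers (ff.net.0.proj, ff.net.2) are deliberately left out because they do not map 1:1 onto up_proj/down_proj/gate_proj:

```python
# Hypothetical mapping from Diffusers-style attention names to the
# q_proj-style names discussed above; verify against your deployment.
NAME_MAP = {
    "to_q": "q_proj",
    "to_k": "k_proj",
    "to_v": "v_proj",
    "to_out.0": "o_proj",
}

def remap_keys(state_dict):
    """Rewrite LoRA weight keys using NAME_MAP; unmatched keys pass through."""
    out = {}
    for key, value in state_dict.items():
        new_key = key
        for old, new in NAME_MAP.items():
            if old in new_key:
                new_key = new_key.replace(old, new)
                break
        out[new_key] = value
    return out
```

The same renames would also need to be applied to the target_modules list in adapter_config.json so the config and the weight keys stay consistent.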
Would you like a more detailed mapping suggestion for your specific module names?