I have a Python script that takes an instantiated PyTorch model and:

- Adds some layers to it (replaces some layers with custom layers built from standard ops)
- Changes the weights and biases of each layer
I made the assumption that "key" layers like attention will not be modified; only new layers will be added between them.
I would like to integrate such code with vLLM.

What are your recommendations for integration? I know that vLLM does not handle already-instantiated models.
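For context, the kind of model surgery described above might look like this minimal sketch in plain PyTorch (the `CustomBlock` class, the layer-selection rule, and the rescaling factor are all assumptions for illustration, not part of the original script):

```python
import torch
import torch.nn as nn

class CustomBlock(nn.Module):
    """Hypothetical new layer built from standard ops (Linear + GELU)."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.act = nn.GELU()

    def forward(self, x):
        return self.act(self.proj(x))

def insert_custom_blocks(module, dim):
    """Wrap every nn.Linear in Sequential(original, CustomBlock),
    leaving other layers (e.g. attention) untouched."""
    for name, child in list(module.named_children()):
        if isinstance(child, nn.Linear):
            setattr(module, name, nn.Sequential(child, CustomBlock(dim)))
        else:
            insert_custom_blocks(child, dim)

def rescale_linear_weights(model, factor=0.5):
    """Example in-place weight/bias edit on every nn.Linear."""
    with torch.no_grad():
        for m in model.modules():
            if isinstance(m, nn.Linear):
                m.weight.mul_(factor)
                if m.bias is not None:
                    m.bias.mul_(factor)

# Toy stand-in for the instantiated model being modified.
model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8))
insert_custom_blocks(model, dim=8)
rescale_linear_weights(model)
out = model(torch.randn(2, 8))
```

The same pattern (recursing over `named_children` and calling `setattr`) applies to a real transformer, with a stricter predicate to skip attention modules.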
vLLM does not support loading an already-instantiated PyTorch model directly; it expects a model directory on disk or a Hugging Face Hub identifier, not a Python object. To integrate your modified model with vLLM:

1. Save the model (with its custom layers and updated weights) in a Hugging Face-compatible format, including `config.json` and the weight files.
2. Load it in vLLM by pointing at the model directory path.
3. If the resulting architecture is not natively supported, adapt the model to vLLM's requirements and register it as a custom model, following the vLLM documentation on adding and registering new models (implementing the required interfaces and weight-loading logic).

See this issue and the basic model integration guide.
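The save-then-load-by-path step can be sketched as follows. A tiny randomly-initialized `LlamaForCausalLM` stands in for your real modified model (the config values are arbitrary placeholders):

```python
import os
import tempfile
from transformers import LlamaConfig, LlamaForCausalLM

# Tiny stand-in model; in practice this is your modified model,
# which must still subclass PreTrainedModel for save_pretrained to work.
config = LlamaConfig(
    hidden_size=64,
    intermediate_size=128,
    num_hidden_layers=2,
    num_attention_heads=4,
    num_key_value_heads=4,
    vocab_size=256,
)
model = LlamaForCausalLM(config)

# ... apply your layer insertions / weight edits here ...

# Write config.json + safetensors weights to a directory.
save_dir = tempfile.mkdtemp()
model.save_pretrained(save_dir)

# vLLM can then load it by path (sketch, not run here):
#   from vllm import LLM
#   llm = LLM(model=save_dir)
```

If your edits change the architecture so it no longer matches the declared config class, you would also need a custom config and model class registered with vLLM, per its "adding a new model" docs.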
If your modifications are minor and the model remains compatible with Hugging Face Transformers, you can use vLLM's Transformers backend: ensure your model directory has the correct structure and config, and vLLM can fall back to running the model through Transformers. For more complex changes (e.g., a new attention mechanism), you must implement the model in vLLM's codebase and register it. See the supported models list and the custom model integration guide for details. Would you like a step-by-step breakdown of the integration process?
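A hedged sketch of the Transformers-backend route (the `model_impl` and `trust_remote_code` engine arguments are taken from the vLLM docs; the model path is a placeholder, and the actual `LLM` calls are left commented since they need a GPU and the saved model):

```python
# Engine arguments to force vLLM's Transformers backend instead of a
# native vLLM implementation of the architecture.
engine_args = {
    "model": "/path/to/my-modified-model",  # directory from save_pretrained
    "model_impl": "transformers",           # use the Transformers backend
    "trust_remote_code": True,              # needed if custom code ships with the model
}

# Sketch of the actual load + generate (not executed here):
# from vllm import LLM, SamplingParams
# llm = LLM(**engine_args)
# outputs = llm.generate(["Hello, world"], SamplingParams(max_tokens=8))
```

This path avoids writing a native vLLM model class, at the cost of losing some of vLLM's architecture-specific optimizations.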