How to register custom out-of-tree models with Transformers Auto Classes for running AsyncLLMEngine in the OPENAI_API_SERVER usage context, without modifying vLLM source code?
Context
I’m currently doing this by calling my registration method from within vllm/model_executor/models/registry.py, which works, but I’d like to avoid modifying vLLM source code so that vLLM version bumps in deployments go faster.
As I understand it, the V1 engine spawns worker processes, and Transformers registration done in the top-level app does not carry over to those workers, which then error out while interpreting the HF config.
To register custom out-of-tree (OOT) models with Transformers Auto Classes for use with AsyncLLMEngine and the OpenAI API server, without modifying vLLM source code, use vLLM’s plugin system. Plugins are loaded in every process vLLM spawns, which is exactly what distributed and multiprocessing setups, including the V1 engine and AsyncLLMEngine, require. Put your registration logic in a plugin entrypoint and install the plugin package so vLLM can discover and run it automatically at startup. This is the officially recommended approach, and it avoids the pitfall you hit: top-level registration does not propagate to worker processes, so custom models fail to load in distributed contexts. See vLLM’s Plugin System documentation and Model Registration guide for details.
Example plugin registration code (in your plugin’s entrypoint, e.g. `register()`):

```python
def register():
    from vllm import ModelRegistry
    from transformers import AutoConfig
    from your_code import YourModelForCausalLM, YourConfigClass

    # Register the config with Transformers so HF config parsing succeeds
    # in every worker process; exist_ok=True makes re-registration a no-op.
    AutoConfig.register("your_model_type", YourConfigClass, exist_ok=True)
    # Register the model implementation with vLLM's model registry.
    ModelRegistry.register_model("YourModelForCausalLM", YourModelForCausalLM)
```
Then declare an entry point under the `vllm.general_plugins` group in your package’s setup.py (or pyproject.toml) and install the package in your environment. vLLM will load the plugin in every process it starts, including workers spawned by the API server or AsyncLLMEngine, with no changes to vLLM source code.
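A minimal packaging sketch for the entry-point declaration, assuming a hypothetical plugin package named `my_vllm_plugin` whose `__init__.py` exposes the `register()` function above (package and module names are illustrative):

```python
# setup.py for a hypothetical plugin package "my_vllm_plugin"
from setuptools import setup, find_packages

setup(
    name="my-vllm-plugin",
    version="0.1.0",
    packages=find_packages(),
    entry_points={
        # vLLM discovers and calls this in every process it spawns.
        # Format: "<plugin name> = <module path>:<callable>"
        "vllm.general_plugins": [
            "register_my_models = my_vllm_plugin:register",
        ],
    },
)
```

After `pip install -e .` (or a regular install), starting the OpenAI API server, e.g. `vllm serve <model>`, will run `register()` in the main process and in every worker, so the HF config and model class are known everywhere.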