I’m new to vLLM. I’m currently working on extending a multimodal model within vLLM, and I would like to add support for a custom modality (for example ECG (electrocardiogram) signals).
Is it possible to introduce an entirely new modality that is not part of the existing set (i.e., not image/video/audio), provided that I supply a compatible encoder and projection layer?
In other words, can I register and handle a new input type (e.g., ECG tensors), implement custom parsing/validation, and pass the resulting embeddings into the multimodal pipeline in the same way that vision/audio/video modalities are currently integrated?
If so, are there recommended patterns or extension points in vLLM for defining custom modalities beyond the built-in ones?
Any guidance or examples would be greatly appreciated.
Yes, you can introduce a new custom modality (such as ECG signals) in vLLM by supplying a compatible encoder, projection layer, and the processing logic around them. vLLM’s multimodal support is designed to be extensible: you define a new modality by updating your model’s forward() to accept the extra keyword arguments carrying the modality data, implementing a custom get_multimodal_embeddings() method that runs your encoder and projection, and registering a processor for your modality. The returned embeddings must match the placeholder tokens in the prompt (one embedding vector per placeholder position), and you merge them with the text embeddings just as the built-in modalities do. See the Multi-Modal Support guide for the required steps and extension points.
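To make the "merge with text embeddings" step concrete, here is a minimal, framework-free sketch: embeddings produced for the custom modality are spliced into the text embedding sequence wherever a placeholder token appears. The placeholder id (`ECG_TOKEN_ID`), the helper name, and the toy embeddings are all hypothetical; vLLM's actual merge utility operates on tensors, but the logic is the same.

```python
ECG_TOKEN_ID = 32000  # hypothetical placeholder token id for the ECG modality


def merge_multimodal_embeddings(token_ids, text_embeds, ecg_embeds):
    """Replace each placeholder position with the next ECG embedding.

    token_ids:   flat list of prompt token ids
    text_embeds: one embedding (list of floats) per token
    ecg_embeds:  one embedding per ECG placeholder in the prompt
    """
    n_placeholders = sum(t == ECG_TOKEN_ID for t in token_ids)
    if n_placeholders != len(ecg_embeds):
        raise ValueError("placeholder count must match ECG embedding count")
    ecg_iter = iter(ecg_embeds)
    return [
        next(ecg_iter) if tok == ECG_TOKEN_ID else emb
        for tok, emb in zip(token_ids, text_embeds)
    ]


# Toy usage: one ECG placeholder between two text tokens.
merged = merge_multimodal_embeddings(
    [1, ECG_TOKEN_ID, 2],
    [[0.1], [0.0], [0.2]],   # text embeddings (placeholder slot is dummy)
    [[9.9]],                 # projected ECG embedding
)
```

The mismatch check mirrors what vLLM enforces for the built-in modalities: the number of embeddings your encoder returns must equal the number of placeholder tokens the processor inserted.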
Specifically, you should implement a custom processor for your modality, register it in the multimodal registry, and ensure your model class inherits from SupportsMultiModal. This allows vLLM to parse, validate, and process your new input type in the same way as image/audio/video. There are no restrictions on the modality name or type, as long as you provide the necessary logic for encoding and merging embeddings. For more details, refer to the official documentation.
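A rough sketch of that wiring is below. The names `MULTIMODAL_REGISTRY`, `register_processor`, and `SupportsMultiModal` mirror vLLM's public API, but the exact signatures vary across versions, so stand-in stubs replace the real imports here to keep the sketch self-contained; in real code you would import them from `vllm.multimodal` and `vllm.model_executor.models.interfaces` and subclass the documented processor base classes instead. The `EcgProcessor` and `EcgLanguageModel` classes are hypothetical.

```python
class SupportsMultiModal:
    """Stand-in for vLLM's SupportsMultiModal marker interface."""


class _Registry:
    """Stand-in for vLLM's MULTIMODAL_REGISTRY (pattern only)."""

    def __init__(self):
        self.processors = {}

    def register_processor(self, processor_cls):
        def wrap(model_cls):
            # Associate the processor with the model class.
            self.processors[model_cls.__name__] = processor_cls
            return model_cls
        return wrap


MULTIMODAL_REGISTRY = _Registry()


class EcgProcessor:
    """Parses and validates raw ECG input into model kwargs."""

    def apply(self, ecg_signal):
        # Hypothetical validation: expect a (leads, samples) nested list.
        if not ecg_signal or not isinstance(ecg_signal[0], list):
            raise ValueError("expected ECG as (leads, samples)")
        return {"ecg_input": ecg_signal}


@MULTIMODAL_REGISTRY.register_processor(EcgProcessor)
class EcgLanguageModel(SupportsMultiModal):
    """Model class tagged as multimodal, with the ECG hook point."""

    def get_multimodal_embeddings(self, ecg_input):
        # Your ECG encoder + projection layer would run here, returning
        # one embedding per ECG placeholder token in the prompt.
        ...
```

The key ideas the real API shares with this sketch: the processor is registered against the model class (vLLM uses a decorator for this), and inheriting the `SupportsMultiModal` interface is what tells vLLM to route multimodal kwargs into your model.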
Would you like a step-by-step example or code snippet for registering a custom modality?