VUA - an open-source library for LLM inference engines to offload KV caches to external storage

Hi all!

I wanted to share that VAST Data’s VUA project is now open source, and hopefully it will be of use to the community here. VUA integrates with popular AI inference engines and expands KV-cache space to a third tier of persistent, shared NVMe storage, giving effectively unbounded context capacity beyond GPU and CPU memory. You can use VUA as a plugin for vLLM 0.8.5 and up; check out our GitHub repo. A rough configuration sketch follows below.
We also wrote a lengthy blog post about it.
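For anyone curious what the vLLM side of this looks like, here is a minimal sketch of wiring an external KV-cache connector into vLLM through its KV-transfer config. The connector name "VUAConnector" and its options are placeholders for illustration only; the actual plugin identifier and configuration keys are documented in the VUA GitHub repo.

```python
# Minimal sketch: pointing vLLM at an external KV-cache connector.
# "VUAConnector" is a hypothetical name used only for illustration; see the
# VUA repo for the real plugin identifier and its configuration options.
from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    kv_transfer_config=KVTransferConfig(
        kv_connector="VUAConnector",  # assumed connector class name
        kv_role="kv_both",            # both store and reuse cached KV blocks
    ),
)

# Requests that share a prefix with earlier traffic should be able to pull
# their KV blocks back from the external tier instead of recomputing prefill.
outputs = llm.generate(
    ["Summarize the following document: ..."],
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```

The same wiring can be done when launching a server, e.g. `vllm serve <model> --kv-transfer-config '{"kv_connector": "VUAConnector", "kv_role": "kv_both"}'` (again, the connector name is a placeholder).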

If you’re interested, the team behind it would love to share and show it off during a community meeting or similar.

Thank you for sharing! However, based on the data sources available to me, there is no official documentation or mention of a “VUA” plugin or integration for vLLM, nor any reference to VAST Data’s VUA project in the vLLM documentation or GitHub repositories, so I cannot confirm or provide details about VUA’s compatibility or usage with vLLM from official sources.

If you would like to contribute or propose an integration, the vLLM community encourages contributions and discussions via GitHub issues, pull requests, or community meetings, as described in the contributing guide. Would you like more information on how to propose or document third-party integrations for vLLM?