VUA - an open-source library for LLM inference engines to offload KV caches to external storage

Hi all!

I wanted to share that VAST Data’s VUA project is now open source, and hopefully it will be of use to the community here. VUA integrates with popular AI inference engines and expands KV-cache space to a third tier of persistent, shared NVMe storage, giving effectively unbounded context capacity beyond GPU and CPU memory. You can use VUA as a plugin for vLLM 0.8.5 and up; check out our GitHub repo. A rough configuration sketch follows below.
We also wrote a lengthy blog post about it.
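For anyone curious what the vLLM side of this looks like, here is a minimal sketch of wiring an external KV-cache connector into vLLM through its KV-transfer config. The connector name "VUAConnector" and its options are placeholders for illustration only; the actual plugin identifier and configuration keys are documented in the VUA GitHub repo.

```python
# Minimal sketch: pointing vLLM at an external KV-cache connector.
# "VUAConnector" is a hypothetical name used only for illustration; see the
# VUA repo for the real plugin identifier and its configuration options.
from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    kv_transfer_config=KVTransferConfig(
        kv_connector="VUAConnector",  # assumed connector class name
        kv_role="kv_both",            # both store and reuse cached KV blocks
    ),
)

# Requests that share a prefix with earlier traffic should be able to pull
# their KV blocks back from the external tier instead of recomputing prefill.
outputs = llm.generate(
    ["Summarize the following document: ..."],
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```

The same wiring can be done when launching a server, e.g. `vllm serve <model> --kv-transfer-config '{"kv_connector": "VUAConnector", "kv_role": "kv_both"}'` (again, the connector name is a placeholder).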

If you’re interested, the team behind it would love to share and show it off during a community meeting or similar.

Thank you for sharing! However, based on the data sources available to me, there is no official documentation or mention of a “VUA” plugin or integration for vLLM, nor any reference to VAST Data’s VUA project in the vLLM documentation or GitHub repositories, so I cannot confirm or provide details about VUA’s compatibility or usage with vLLM from official sources.

If you would like to contribute or propose an integration, the vLLM community encourages contributions and discussions via GitHub issues, pull requests, or community meetings, as described in the contributing guide. Would you like more information on how to propose or document third-party integrations for vLLM?