I have an RTX 50xx GPU in an Ubuntu box, and I'm pretty new to the LLM stack, top to bottom. With so many new-to-me pieces in play, I've been using Docker to keep most of them in order so that vllm can serve on my machine.
Two, maybe three times now in the last few months, I've gone to run "vllm serve" in my container and all of a sudden it doesn't work anymore. It seems that automatic system updates update the host Linux, and those updates break the container's ability to run on top of the updated OS.
The first time this happened, I plodded through the apt automatic-update logs and "reverse engineered" from the log text what I might downgrade to make vllm serve once again inside my existing, "it worked yesterday" Docker container. Plodding through update logs and issuing downgrades seems like a massively hacky way to go. There's got to be a better way (or many better ways, I'd bet).
The second time this happened, I found an updated container that worked with NVIDIA's updated drivers. That one felt like a normal, required maintenance update that needed to be done. Nevertheless, waking up to vllm not serving in the container (now for a second time) got me thinking the whole vllm stack + container was somewhat of a house of cards, liable to fall down at any time with no warning. (Except I'm sure there are warnings and I just don't know where to look, and I might not even recognize which warnings mean "your containerized vllm is fixing to break when items X, Y, and Z update on MM/DD, which is coming up in 4 days.") This feels like reviewing auto-update logs, except in advance, so "previewing prelogs," lol.
I know about version locking/freezing with apt, but oof, what all would I lock to keep containerized vllm from failing to serve? I have no clue, but I'd bet it would be dozens of holds, and once locked, at least a few of those pinned versions would create headaches, and even vulnerabilities, elsewhere.
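For concreteness, this is the kind of apt pin I mean. It's only a sketch: the file name, the package globs, and the 570 driver series are all placeholders for whatever is actually installed on a given box, not something I've confirmed works for vllm specifically.

```
Explanation: hypothetical pin; package globs and the 570 series are placeholders
Package: nvidia-driver-570* libnvidia-*
Pin: version 570.*
Pin-Priority: 1001
```

A file like this under /etc/apt/preferences.d/ would keep apt on the current driver series even when a newer one appears, which is exactly the "dozens of locks" tradeoff I'm worried about.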
Considering the above too-long preface: what strategies are there for preventing the changes that stop vllm from working when it worked "yesterday" (or 3 days ago, if it's been 3 days since I last tried)? In other words, "when it worked last time and I, the human, didn't change anything."
Do we have utilities available that can compare and contrast versions across the OS, the container, and the GPU driver stack (or any other ~critical component that's easy to change out from under you)?
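Even without a dedicated utility, I imagine something like the following home-grown check. The `nvidia-smi --query-gpu=driver_version` flags are real; the container image name and the placeholder version strings are assumptions for illustration, and the helper function is hypothetical, not part of any tool.

```shell
#!/bin/sh
# Sketch: diff the host's NVIDIA driver version against the container's.

# Hypothetical helper: print OK or MISMATCH for one labeled pair of versions.
compare_versions() {
  # $1 = label, $2 = host value, $3 = container value
  if [ "$2" = "$3" ]; then
    echo "$1: OK ($2)"
  else
    echo "$1: MISMATCH host=$2 container=$3"
  fi
}

# In real use, these would come from the host and the container, e.g.:
#   host_drv=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader)
#   ctr_drv=$(docker run --rm --gpus all vllm/vllm-openai:latest \
#               nvidia-smi --query-gpu=driver_version --format=csv,noheader)
# Placeholder values stand in here so the sketch runs without a GPU:
host_drv="570.124.06"
ctr_drv="570.124.06"

compare_versions "driver" "$host_drv" "$ctr_drv"
```

Run on a schedule (or before each "vllm serve"), a mismatch line would be exactly the early warning I'm asking about.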
Disclosure: during the last vLLM office hours, I'm pretty sure I heard, to paraphrase, that everything but torch is now on normal release versions. So if I do a vllm build with the "use existing torch" flag, I could probably dump Docker (for now, until the next new GPU is released ready for saving princesses but not ready for us) and wrangle vllm on the OS instead of in a container. That's likely what I'm going to do. But the next latest-and-greatest must-have GPU can't be far off, so I figured I should ask: how do we keep containers in line with the OS and its auto-updates? I kind of thought containers were supposed to isolate us from that in general, but I guess because LLM processing is so hardware-bound, containers either pass that work through to the OS or, if it's not a pass-through scenario, at least require exact versions of things (such as NVML, today's first presenting issue for me).
If the most relevant strategy here is "turn off your OS auto-updates," I'm game for that. But if that's the case, what's the strategy for reviewing and applying updates manually without breaking the container's ability to run on the updated OS?
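A middle ground I've wondered about, rather than turning auto-updates fully off: keep unattended-upgrades running but blacklist just the driver packages. This is a sketch, assuming the stock Ubuntu config file path, and the "nvidia-" / "libnvidia-" entries are my guesses at patterns that would match the relevant packages:

```
// /etc/apt/apt.conf.d/50unattended-upgrades (assumed default Ubuntu path)
// Packages matching these patterns would be skipped by unattended-upgrades
// and left for manual, deliberate upgrades alongside a new container image.
Unattended-Upgrade::Package-Blacklist {
    "nvidia-";
    "libnvidia-";
};
```

Security updates for everything else would still flow, and the driver bump becomes a planned event instead of a surprise.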
Thank you for all your guidance, RunBot.