I have a fancy (to me) and new Nvidia 5070 Ti that I put into a very old Dell XPS 8900 with a Core i7-6700. I maxed out my computer RAM to 64GB of DDR4-something and replaced the slow old SATA drives with M.2 2280 SSDs.
I realize my mistake with the 5070 Ti: only 16GB of VRAM. The 5090 with 32GB would have been the better option, on paper at least.
I figured the GPU is critical for AI & vLLM, and my old CPU probably didn't matter too much at this very early point of my experimentation with AI, LLMs and such. But then I compiled PyTorch, lol (trying to compile with TORCH_USE_CUDA_DSA). 15+ hours! Smaller compiles of other projects can run for hours. It feels like the 1980s all over again.
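After all that compiling, this is the kind of sanity check I run to confirm the card is actually being picked up. It's a minimal sketch using only standard torch.cuda calls, nothing specific to my source build:

```python
# Quick sanity check that PyTorch actually sees the 5070 Ti.
import torch

print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("Device:", props.name)
    print("VRAM (GB):", round(props.total_memory / 1024**3, 1))
    print("Compute capability:", torch.cuda.get_device_capability(0))
```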
Looking at these new mini-boxes coming out ~soon, we have at a basic level the Nvidia Spark + reseller variants, the AMD Strix Halo 395+, and the Apple M4 Max.
I am not sure if vLLM would even run on the Apple M4 Max. It looks like the best overall box to me, with the best memory bandwidth by far, but I don't get the sense that vLLM or anything non-Apple can really make use of it. Then I recalled that Linux has run on Apple hardware for a long time, so maybe the vLLM ARM releases might run on a Mac? I have yet to see any mention of anybody's MacBook, so I'm not sure Apples are usable at all with non-Apple AI. Anybody know if that works?
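For what it's worth, plain PyTorch does have a Metal (MPS) backend on Apple Silicon, even if the vLLM story on Macs is a separate question. I don't own a Mac, so treat this as a sketch of the standard API rather than something I've actually run:

```python
# Minimal check of PyTorch's Apple-Silicon (Metal / MPS) backend.
import torch

if torch.backends.mps.is_available():
    x = torch.randn(1024, 1024, device="mps")
    y = x @ x  # runs on the M-series GPU via Metal
    print("MPS works, result on:", y.device)
else:
    print("MPS backend not available on this machine")
```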
Strix Halo 395+ and Nvidia Spark boxes seem to be relatively similar from what I can tell, other than the price. (Can you even buy an Nvidia DIGITS box? Its announced price was a little better than the Strix Halo 395+, but now that it has been renamed Spark it's about $1,000 higher.)
Is it even worth getting one of these? Or does it make more sense to use online services like Modal? I think $3,000 to $4,000 would go a long, long way on Modal or something. Plus I see a lot of folks talking about running on such-and-such systems that are beyond the financial reach of most people, even those with high-end IT-related incomes, unless maybe they are IT DINKs who live in small houses and don't blow their cash on fancy cars, etc., lol. So I figure people mean "online" when they talk about the very-very-expensive GPU they're running on. Is online the optimal way to go?
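Here's my back-of-envelope math on how far a mini-box budget would go rented online. The hourly rates below are placeholder assumptions I made up for illustration, not quotes from any provider, so check current pricing:

```python
# Back-of-envelope: how many GPU-hours does a mini-box budget buy online?
budget = 3500  # roughly the mini-box price range I mentioned
assumed_rates = {"mid-range GPU (~$1/hr, assumed)": 1.00,
                 "big GPU (~$4/hr, assumed)": 4.00}

for label, rate in assumed_rates.items():
    hours = budget / rate
    print(f"{label}: ~{hours:,.0f} hours (~{hours / 8:,.0f} 8-hour days)")
```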
OTOH, using an online service brings its own bag of tricks. I tried to farm out my 15+ hour PyTorch build to Modal (as a Modal first-timer). After working through innumerable mistakes I'd made, and finally getting probably ~90% of the way to Modal compiling for me, I realized that even if Modal would compile it, would the result even match my install at home (without me learning and doing a ton more Modal Image setup)? And would the Modal-compiled output even make it back to my home machine without a lot more figuring-stuff-out? Each iteration of that figuring-out either means waiting on a full build on Modal, which could be 10 to 20 minutes (from what I gather; please nobody be mad at me if 1 or 2 minutes is the real number), or waiting on me to figure out how to persist the built output on Modal and use it as my starting point for pulling the built results back to my system (and maybe the "my system" part was just a bad idea to begin with).
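For anyone else fumbling down the same path, this is roughly the shape of what I was trying to set up: a Modal function that writes its output into a persisted Volume you can pull down afterwards. It's a minimal sketch, with placeholder names (the app/volume names, the GPU type) and the actual PyTorch build commands elided:

```python
# Sketch: run a long build on Modal and persist the output in a Volume.
import modal

app = modal.App("pytorch-build")  # placeholder name
vol = modal.Volume.from_name("pytorch-build-output", create_if_missing=True)

image = modal.Image.debian_slim().apt_install("git", "cmake", "ninja-build")

@app.function(image=image, gpu="A10G", volumes={"/output": vol}, timeout=6 * 60 * 60)
def build():
    import subprocess
    # ...clone and build PyTorch here, then copy the wheel into /output...
    subprocess.run(["bash", "-lc", "echo 'build steps go here' > /output/placeholder.txt"],
                   check=True)
    vol.commit()  # persist what was written to the Volume
```

Afterwards the files should be retrievable locally with something like `modal volume get pytorch-build-output <remote-path> <local-dir>`. Whether the resulting wheel actually matches your local CUDA/Python environment is a separate problem, which was my real worry.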
During office hours I asked if vLLM would work on a Spark box by the time it works on GeForce 5000s. The question was taken more as a timeline thing, but what I meant was more "if this GPU I own works ~seamlessly with vLLM, can I infer from 5070 Ti = good that Spark would also = good?" I.e., the architectures should theoretically be the ~same (at least the CUDA-related parts) between the GeForce and the Spark, both of them being Blackwell. Or might I buy a Spark on release day assuming it'll work because the 50x0 GPU works, but then struggle with the Spark box for months trying to con it into working before support for it is officially there?
Another way to put it: "On some Date X, whatever real date X may be, given PyTorch & vLLM release versions supporting my 5070 Ti and running normally without my having to get under the bonnet too much (beyond any peculiarities in my own local box/environment), architecturally speaking should I expect the Spark to also be supported by release versions on that same Date X?" Or maybe the "integrated memory" might require more work from the very smart folks at PyTorch and vLLM, leaving this dummy (me) with a fancy new thing that doesn't do anything (yet) for some period of time, even though the 5070 Ti is working pretty seamlessly and well?
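The closest I've come to answering this myself is checking whether a given PyTorch build ships kernels for the compute capability of whatever GPU is in the box. The calls below are standard PyTorch; the example values in the comments are my assumptions about Blackwell parts, and PTX forward-compatibility can make a "missing" arch still work, so this is only a rough check:

```python
# Does this PyTorch build ship kernels for the GPU in this box?
import torch

cap = torch.cuda.get_device_capability(0)   # e.g. (12, 0) on my 5070 Ti (assumed)
built_for = torch.cuda.get_arch_list()      # e.g. ['sm_80', 'sm_90', 'sm_120', ...]
print("Device compute capability:", cap)
print("This build has kernels for:", built_for)
print("Exact match:", f"sm_{cap[0]}{cap[1]}" in built_for)
```

My open question is whether the Spark's chip reports the same (or a compatible) capability as the GeForce Blackwell cards, or needs its own target that release wheels might not include on day one.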
Wow that went all over the place. Sorry.
Quick recap: I'm seeking advice for AI/ML/etc. n00bs who have crappy old computers. Should we buy one of the mini-AI-boxes coming out this year, or use online services? And if we buy a new mini-AI-box, should we expect to wait some period of time for PyTorch & vLLM to support it (as happened with the Blackwell GPUs and, I bet, probably with all prior GPU architecture changes too)?