Has anyone successfully deployed deepseek-v4-flash on 8xL40s?
There is no evidence in the provided data sources that DeepSeek-V4-Flash has been successfully deployed on 8x L40S GPUs. Several issues report failures or missing support on comparable non-Hopper GPUs, including Ada cards (L20, RTX 4090) and the Ampere-based A800, mainly because DeepGEMM and other dependencies required by DeepSeek-V4-Flash lack kernels for those architectures. The official roadmap and user discussions indicate that only Hopper (H100/H200), Blackwell (B200), and some ROCm/AMD setups are supported in vLLM; Ada (L40S, L20, RTX 4090) and Ampere (A800) are not currently supported for this model. Attempts to run on these GPUs fail with errors about unsupported architectures or missing kernels, and the referenced issues and documentation contain no reports of a successful L40S deployment.
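If it helps, here is a rough sketch for checking which architecture family your GPUs belong to before attempting a deployment. It maps CUDA compute capability to a family name (8.9 is Ada, 8.0 to 8.7 is Ampere, 9.x is Hopper, 10.x is data-center Blackwell) and compares it against the families this answer reports as supported. The names `arch_family`, `is_reported_supported`, and `REPORTED_SUPPORTED` are made up for illustration, and the supported set is an assumption taken from the summary above, not from vLLM's code, so treat the result as a hint only.

```python
from typing import Tuple

# Assumption: taken from the answer above (Hopper and Blackwell reported as
# supported), NOT read from vLLM or DeepGEMM. Adjust as support changes.
REPORTED_SUPPORTED = {"Hopper", "Blackwell"}


def arch_family(capability: Tuple[int, int]) -> str:
    """Map a CUDA compute capability (major, minor) to an architecture family."""
    major, minor = capability
    if major == 8 and minor == 9:
        return "Ada"        # L40S, L20, RTX 4090
    if major == 8:
        return "Ampere"     # A100, A800, RTX 30xx
    if major == 9:
        return "Hopper"     # H100, H200
    if major == 10:
        return "Blackwell"  # B200
    return "unknown"


def is_reported_supported(capability: Tuple[int, int]) -> bool:
    """True if the family is in the set this thread reports as supported."""
    return arch_family(capability) in REPORTED_SUPPORTED


if __name__ == "__main__":
    try:
        import torch  # optional: only needed to query local GPUs
    except ImportError:
        torch = None

    if torch is not None and torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            cap = torch.cuda.get_device_capability(i)
            name = torch.cuda.get_device_name(i)
            print(i, name, cap, arch_family(cap), is_reported_supported(cap))
    else:
        print("No CUDA device visible; call arch_family((8, 9)) manually.")
```

On an 8x L40S node, every device should report capability (8, 9), which this sketch classifies as Ada and therefore outside the reported-supported set.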
Would you like more detail on the specific errors or alternative supported hardware?