vllm serve /dfs/data/model/bge-m3/ --task embed --trust-remote-code --max-model-len 8192

Starting the server with the command above in a CPU-only environment fails with an error:

Traceback (most recent call last):
  File "/dfs/data/miniconda/envs/Qwen2-New/bin/vllm", line 8, in <module>
    sys.exit(main())
  File …

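vLLM currently has limited support for CPU inference. There is no official prebuilt CPU wheel or image, so you must compile the CPU backend from source to run on CPU, and some features and models may still be incomplete. The "Failed to infer device type" and "NotImplementedError" errors you encountered occur because the CPU backend was not compiled correctly or platform detection failed. You need to build and install from source with VLLM_TARGET_DEVICE=cpu python setup.py install. Installing directly with pip or using the official docker image will not work, and the service will fail to start in a CPU environment. The official documentation and the related issues cover this in detail. In addition, some issues report that even after adding --devi…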
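After a source build, it can help to confirm which vLLM distribution the active environment actually picks up before retrying vllm serve. The following is a minimal sketch using only the Python standard library. The helper names (looks_like_cpu_build, describe_install) are hypothetical, and the check rests on an assumption not stated in this thread: that a CPU source build carries a local version label containing "cpu" (for example a version ending in "+cpu"). Treat the result as a hint, not proof that the CPU backend works.

```python
from importlib import metadata


def looks_like_cpu_build(version: str) -> bool:
    """Heuristic (assumption): a CPU build has 'cpu' in the local version label after '+'."""
    _, _, local = version.partition("+")
    return "cpu" in local.lower()


def describe_install(dist_name: str = "vllm") -> dict:
    """Report whether dist_name is installed and whether it looks like a CPU build."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # Nothing installed under this name in the active environment.
        return {"installed": False, "version": None, "cpu_build": False}
    return {
        "installed": True,
        "version": version,
        "cpu_build": looks_like_cpu_build(version),
    }


if __name__ == "__main__":
    # Run inside the same conda/virtualenv that provides the vllm executable.
    print(describe_install())
```

If "installed" is true but "cpu_build" is false, the environment may still be using a pip-installed GPU wheel instead of the source build, which would be consistent with the device-detection error described above.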

Okay, thank you for your answer.

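You're welcome! If you would like more detail on compiling, deploying, or troubleshooting vLLM on CPU, feel free to ask at any time. Good luck running your model!

Sources:

vLLM official Troubleshooting documentation
Issue: vllm serve error with a small model on CPU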

Does vllm support inference or service startup of CPU small model?

Hardware Support

RunLLM May 30, 2025, 6:45am 4

