Can Support Qwen3-VL or Qwen2.5 VL 72B on Vllm-ascend 0.11.0?

yukikawa · November 6, 2025, 4:50pm

Can Support Qwen3-VL 8B or Qwen2.5 VL 72B on Vllm-ascend 0.11.0 docker container?

system response following:
5_VLForConditionalGeneration.
(EngineCore_DP0 pid=26554) WARNING 11-06 16:44:53 [registry.py:581] Model architecture DeepseekV2ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV2ForCausalLM.
(EngineCore_DP0 pid=26554) WARNING 11-06 16:44:53 [registry.py:581] Model architecture DeepseekV3ForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_v2:CustomDeepseekV3ForCausalLM.
(EngineCore_DP0 pid=26554) WARNING 11-06 16:44:53 [registry.py:581] Model architecture DeepSeekMTPModel is already registered, and will be overwritten by the new model class vllm_ascend.models.deepseek_mtp:CustomDeepSeekMTP.
(EngineCore_DP0 pid=26554) WARNING 11-06 16:44:53 [registry.py:581] Model architecture Qwen3MoeForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_moe:CustomQwen3MoeForCausalLM.
(EngineCore_DP0 pid=26554) WARNING 11-06 16:44:53 [registry.py:581] Model architecture Qwen3NextForCausalLM is already registered, and will be overwritten by the new model class vllm_ascend.models.qwen3_next:CustomQwen3NextForCausalLM.
(EngineCore_DP0 pid=26554) INFO 11-06 16:44:53 [core.py:77] Initializing a V1 LLM engine (v0.11.0rc3) with config: model=‘/data1/Qwen3_vl_8B_instruct’, speculative_config=None, tokenizer=‘/data1/Qwen3_vl_8B_instruct’, skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.bfloat16, max_seq_len=262144, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=npu, structured_outputs_config=StructuredOutputsConfig(backend=‘auto’, disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=‘’), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=42, served_model_name=/data1/Qwen3_vl_8B_instruct, enable_prefix_caching=True, chunked_prefill_enabled=True, pooler_config=None, compilation_config={“level”:3,“debug_dump_path”:“”,“cache_dir”:“”,“backend”:“”,“custom_ops”:[“all”],“splitting_ops”:[“vllm.unified_attention”,“vllm.unified_attention_with_output”,“vllm.mamba_mixer2”,“vllm.mamba_mixer”,“vllm.short_conv”,“vllm.linear_attention”,“vllm.plamo2_mamba_mixer”,“vllm.gdn_attention”,“vllm.unified_ascend_attention_with_output”,“vllm.mla_forward”],“use_inductor”:false,“compile_sizes”:,“inductor_compile_config”:{“enable_auto_functionalized_v2”:false},“inductor_passes”:{},“cudagraph_mode”:1,“use_cudagraph”:true,“cudagraph_num_of_warmups”:1,“cudagraph_capture_sizes”:[512,504,488,480,464,456,448,432,424,408,400,392,376,368,352,344,336,320,312,296,288,280,264,256,240,232,216,208,200,184,176,160,152,144,128,120,104,96,88,72,64,48,40,32,16,8,2,1],“cudagraph_copy_inputs”:false,“full_cuda_graph”:false,“use_inductor_graph_partition”:false,“pass_config”:{},“max_capture_size”:512,“local_cache_dir”:null}
[INFO:swift] Successfully registered /usr/local/python3.11.13/lib/python3.11/site-packages/swift/llm/dataset/data/dataset_info.json.
(EngineCore_DP0 pid=26554) INFO 11-06 16:45:14 [parallel_state.py:1208] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0, EP rank 0
Traceback (most recent call last):
File “/usr/local/python3.11.13/lib/python3.11/site-packages/swift/cli/infer.py”, line 5, in
infer_main()
File “/usr/local/python3.11.13/lib/python3.11/site-packages/swift/llm/infer/infer.py”, line 301, in infer_main
return SwiftInfer(args).main()
^^^^^^^^^^^^^^^^
File “/usr/local/python3.11.13/lib/python3.11/site-packages/swift/llm/infer/infer.py”, line 39, in init
self.infer_engine = self.get_infer_engine(args, self.template)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/local/python3.11.13/lib/python3.11/site-packages/swift/llm/infer/infer.py”, line 86, in get_infer_engine
return infer_engine_cls(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/usr/local/python3.11.13/lib/python3.11/site-packages/swift/llm/infer/infer_engine/vllm_engine.py”, line 140, in init
self._prepare_engine()
File “/usr/local/python3.11.13/lib/python3.11/site-packages/swift/llm/infer/infer_engine/vllm_engine.py”, line 150, in _prepare_engine
engine = llm_engine_cls.from_engine_args(self.engine_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/vllm-workspace/vllm/vllm/v1/engine/llm_engine.py”, line 177, in from_engine_args
return cls(vllm_config=vllm_config,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/vllm-workspace/vllm/vllm/v1/engine/llm_engine.py”, line 114, in init
self.engine_core = EngineCoreClient.make_client(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/vllm-workspace/vllm/vllm/v1/engine/core_client.py”, line 80, in make_client
return SyncMPClient(vllm_config, executor_class, log_stats)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File “/vllm-workspace/vllm/vllm/v1/engine/core_client.py”, line 602, in init
super().init(
File “/vllm-workspace/vllm/vllm/v1/engine/core_client.py”, line 448, in init
with launch_core_engines(vllm_config, executor_class,
File “/usr/local/python3.11.13/lib/python3.11/contextlib.py”, line 144, in exit
next(self.gen)
File “/vllm-workspace/vllm/vllm/v1/engine/utils.py”, line 732, in launch_core_engines
wait_for_engine_startup(
File “/vllm-workspace/vllm/vllm/v1/engine/utils.py”, line 785, in wait_for_engine_startup
raise RuntimeError("Engine core initialization failed. "
RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
[ERROR] TBE Subprocess[task_distribute] raise error, main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error, main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error, main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error, main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error, main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error, main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error, main process disappeared!
[ERROR] TBE Subprocess[task_distribute] raise error, main process disappeared!
[ERROR] 2025-11-06-16:50:34 (PID:26188, Device:0, RankID:-1) ERR99999 UNKNOWN applicaiton exception
/usr/local/python3.11.13/lib/python3.11/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 30 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d ’

RunLLM · November 6, 2025, 4:50pm

Qwen3-VL-8B and Qwen2.5-VL-72B are officially supported by vLLM, as shown in the supported models documentation. However, there is no mention in the provided data sources of vllm-ascend or Ascend hardware support—vLLM is documented and tested primarily on CUDA-based (NVIDIA) and ROCm (AMD) platforms. The official Docker images and recipes are for CUDA/ROCm, and there is no explicit reference to vllm-ascend or Ascend-specific containers or compatibility.

Therefore, while these models are supported on standard vLLM (CUDA/ROCm) containers, support for Qwen3-VL-8B or Qwen2.5-VL-72B on a vllm-ascend 0.11.0 Docker container is not documented or guaranteed in the provided sources. Would you like more detail on supported hardware or deployment steps?

Sources:

Supported Models - vLLM

Hint: Mention @RunLLM in the post for followups.

wangxiyuan · November 20, 2025, 7:33am

yes, Qwen3-VL 8B or Qwen2.5 VL 72B works with 0.11.0rc1.

The error you point here looks from swift lib. Can you describe more that how you use vllm-ascend? I’m not quite sure swift works well with vllm-ascend.

Topic		Replies	Views
How can vllm ascend support qwen3-vl-235b? Ascend Support	2	281	October 16, 2025
我能在Ascend310B芯片上通过vllm-ascend插件部署Qwen2.5-vl吗？ Ascend Support	3	200	October 15, 2025
Question about vLLM and vLLM Ascend verisoning policy Ascend Support	4	346	April 1, 2025
最新开源的Qwen3.6的moe模型，vllm-ascend支持吗？ Ascend Support	1	261	April 17, 2026
Question about vllm-ascend performance on server with 8*910B3 Ascend Support	5	684	May 28, 2025

Can Support Qwen3-VL or Qwen2.5 VL 72B on Vllm-ascend 0.11.0?

Related topics