My vLLM log output is below. Please help me assemble the corresponding `vllm serve` command.
Initializing a V1 LLM engine (v0.11.0) with config: model='/mnt/workspace/model/base/Qwen3-VL-4B-Instruct', speculative_config=None, tokenizer='/mnt/workspace/model/base/Qwen3-VL-4B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.bfloat16, max_seq_len=10240, download_dir=None, load_format=safetensors, tensor_parallel_size=1, pipeline_parallel_size=1, data_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto, device_config=cuda, structured_outputs_config=StructuredOutputsConfig(backend='auto', disable_fallback=False, disable_any_whitespace=False, disable_additional_properties=False, reasoning_parser=''), observability_config=ObservabilityConfig(show_hidden_metrics_for_version=None, otlp_traces_endpoint=None, collect_detailed_traces=None), seed=0, served_model_name=/mnt/workspace/model/base/Qwen3-VL-4B-Instruct, enable_prefix_caching=False, chunked_prefill_enabled=True, pooler_config=None, compilation_config={"level":3,"debug_dump_path":"","cache_dir":"","backend":"","custom_ops":[],"splitting_ops":["vllm.unified_attention","vllm.unified_attention_with_output","vllm.mamba_mixer2","vllm.mamba_mixer","vllm.short_conv","vllm.linear_attention","vllm.plamo2_mamba_mixer","vllm.gdn_attention","vllm.sparse_attn_indexer"],"use_inductor":true,"compile_sizes":[],"inductor_compile_config":{"enable_auto_functionalized_v2":false},"inductor_passes":{},"cudagraph_mode":[2,1],"use_cudagraph":true,"cudagraph_num_of_warmups":1,"cudagraph_capture_sizes":[4,2,1],"cudagraph_copy_inputs":false,"full_cuda_graph":false,"use_inductor_graph_partition":false,"pass_config":{},"max_capture_size":4,"local_cache_dir":null}
vllm_kwargs: {'dtype': 'bfloat16', 'gpu_memory_utilization': 0.25, 'kv_cache_dtype': 'auto', 'load_format': 'safetensors', 'max_model_len': 10240, 'max_num_seqs': 3, 'tensor_parallel_size': 1, 'block_size': 16, 'enable_chunked_prefill': True, 'enable_prefix_caching': False, 'limit_mm_per_prompt': {'image': 1}, 'mm_processor_kwargs': {'max_pixels': 262144}}
non-default args: {'load_format': 'safetensors', 'dtype': 'bfloat16', 'max_model_len': 10240, 'block_size': 16, 'enable_prefix_caching': False, 'gpu_memory_utilization': 0.25, 'max_num_seqs': 3, 'disable_log_stats': True, 'limit_mm_per_prompt': {'image': 1}, 'mm_processor_kwargs': {'max_pixels': 262144}, 'enable_chunked_prefill': True, 'model': '/mnt/workspace/model/base/Qwen3-VL-4B-Instruct'}
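Mapping the `non-default args` line above onto `vllm serve` command-line flags, one possible assembly looks like the sketch below. This assumes the vLLM v0.11 CLI flag spellings (underscores become hyphens); settings that are already defaults in the log (`kv_cache_dtype=auto`, `tensor_parallel_size=1`) are omitted, and `--no-enable-prefix-caching` is spelled out because the V1 engine may enable prefix caching by default, which would not match `enable_prefix_caching=False`. Verify flag names against `vllm serve --help` for your installed version.

```shell
# Sketch of a vllm serve command reconstructed from the logged non-default args.
# Flag names assume the vLLM v0.11 CLI; check `vllm serve --help` before use.
vllm serve /mnt/workspace/model/base/Qwen3-VL-4B-Instruct \
  --load-format safetensors \
  --dtype bfloat16 \
  --max-model-len 10240 \
  --block-size 16 \
  --no-enable-prefix-caching \
  --gpu-memory-utilization 0.25 \
  --max-num-seqs 3 \
  --disable-log-stats \
  --enable-chunked-prefill \
  --limit-mm-per-prompt '{"image": 1}' \
  --mm-processor-kwargs '{"max_pixels": 262144}'
```

The two JSON-valued flags (`--limit-mm-per-prompt`, `--mm-processor-kwargs`) are single-quoted so the shell passes the braces through intact; `served_model_name` is left unset, since the log shows it defaulting to the model path.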