Deployment parameters for qwen3.5-4b?

hackyroot · March 24, 2026, 7:05pm

@hustxiayang The deployment parameters depend on the metric you’re targeting. Are you focusing on TTFT, end-to-end latency or something else?
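
For anyone unsure how the two metrics differ, here is a minimal sketch that times one streamed response. It assumes the response is available as a plain Python iterable of text chunks; `measure_stream_latency` and `fake_stream` are hypothetical names made up for illustration, not part of vLLM or any client library.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple


def measure_stream_latency(
    stream: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, float, int]:
    """Return (ttft_seconds, end_to_end_seconds, chunk_count) for one stream."""
    start = clock()
    first_chunk_at = None
    chunks = 0
    for _ in stream:
        if first_chunk_at is None:
            # TTFT: time from starting to consume the stream to the first chunk.
            first_chunk_at = clock()
        chunks += 1
    end = clock()  # end-to-end latency is measured once the stream is exhausted
    if first_chunk_at is None:
        raise ValueError("stream produced no chunks")
    return first_chunk_at - start, end - start, chunks


def fake_stream() -> Iterator[str]:
    """Hypothetical stand-in for a real streaming response (assumption)."""
    time.sleep(0.05)  # simulated prefill delay before the first chunk
    for word in ["hello", "from", "the", "model"]:
        yield word
        time.sleep(0.01)  # simulated per-chunk decode delay


if __name__ == "__main__":
    ttft, e2e, n = measure_stream_latency(fake_stream())
    print(f"TTFT: {ttft:.3f}s, end-to-end: {e2e:.3f}s, chunks: {n}")
```

Roughly speaking, TTFT is dominated by prefill and queueing, while end-to-end latency also includes the whole decode phase, which is why the two can call for different settings.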

DamodarYekkuluri · March 30, 2026, 3:26am

@hustxiayang What is your hardware configuration? Is it H100 or something else?

Oxi84 · April 19, 2026, 2:50am

How did you actually get Qwen3.5-4B to work at all? It does not work for me because of the qwen_35_text architecture and a few more params in the config. It is the same error as here: Can not deploy SFT Qwen3.5-9B model · Issue #44541 · huggingface/transformers · GitHub
