@hustxiayang The deployment parameters depend on the metric you’re targeting. Are you focusing on TTFT, end-to-end latency or something else?
@hustxiayang What is your hardware configuration? Is it H100 or something else?
How did you actually get Qwen3.5-4B to work at all? It does not work for me because of the qwen_35_text architecture and a few more params in the config. It is the same error as here: Can not deploy SFT Qwen3.5-9B model · Issue #44541 · huggingface/transformers · GitHub