The message `flashinfer.jit: [Autotuner]: OOM detected, falling back to default tactic` means FlashInfer's kernel autotuner ran out of GPU memory while benchmarking candidate kernels, so it fell back to a less memory-intensive (but possibly slower) default kernel. This is a warning, not a fatal error: serving continues, but you may see lower throughput or higher latency. To avoid it, free up GPU memory for autotuning by reducing `--max-num-batched-tokens`, lowering `--gpu-memory-utilization`, or using a smaller `--max-model-len`, as recommended in the model usage guide and vLLM's memory troubleshooting docs.
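As a concrete illustration, here is one way those flags might be combined in a `vllm serve` invocation. The model name and the specific values are placeholders, not recommendations; start from your current settings and adjust downward until the warning disappears:

```shell
# Sketch: lower memory pressure so FlashInfer's autotuner has headroom.
# Model name and numeric values are illustrative -- tune for your GPU.
vllm serve <your-model> \
  --gpu-memory-utilization 0.85 \   # reserve some VRAM beyond the KV cache
  --max-model-len 8192 \            # cap context length to shrink KV cache
  --max-num-batched-tokens 4096     # limit per-step batch size
```

Lowering `--gpu-memory-utilization` is usually the least disruptive first step, since it does not change what requests the server can accept, only how aggressively it claims VRAM.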
Would you like more detail on how to tune these parameters for your hardware?