Hiring: ML Infrastructure Engineer (LLM Inference Focus) | Bay Area Preferred

Hello All,

We’re actively looking for ML Infrastructure experts with deep hands-on experience in:

  • CUDA and GPU-level optimization
  • LLM inference (serving, latency tuning, KV cache optimization)
  • vLLM, TensorRT-LLM, FlashAttention, ExLlama, etc.
  • Strictly language models (no vision or audio)

:wrench: Ideal candidates are fluent in large-scale model deployment and low-latency inference, with a passion for optimizing performance at the token level.

:round_pushpin: Bay Area preferred (hybrid roles available)
:envelope_with_arrow: Interested? Reach out at mahesh.muthusamy@phizenix.com
:stopwatch: Immediate hire — please share or refer if you know someone great!

Thanks,
Mahesh
Phizenix Inc. | WBENC & WOSB Certified | Bay Area