Hello All,
We’re actively looking for ML Infrastructure experts with deep hands-on experience in:
- CUDA and GPU-level optimization
- LLM inference (serving, latency tuning, KV cache optimization)
- vLLM, TensorRT-LLM, FlashAttention, Exllama, etc.
- Focused strictly on language models (no vision or audio)
Ideal candidates are fluent in large-scale model deployment and low-latency inference, with a passion for optimizing performance at the token level.
Bay Area preferred (hybrid roles available)
Interested? Reach out at mahesh.muthusamy@phizenix.com
Immediate hire: please share or refer if you know someone great!
Thanks,
Mahesh
Phizenix Inc. | WBENC & WOSB Certified | Bay Area