Hiring: ML Infrastructure Engineer (LLM Inference Focus) | Bay Area Preferred

Hello All,

We’re actively looking for ML Infrastructure experts with deep hands-on experience in:

  • CUDA and GPU-level optimization
  • LLM inference (serving, latency tuning, KV cache optimization)
  • vLLM, TensorRT-LLM, FlashAttention, ExLlama, etc.
  • Strictly language models (no vision or audio)

:wrench: Ideal candidates are fluent in large-scale model deployment and low-latency inference, with a passion for optimizing performance at the token level.

:round_pushpin: Bay Area preferred (hybrid roles available)
:envelope_with_arrow: Interested? Reach out at mahesh.muthusamy@phizenix.com
:stopwatch: Immediate hire — please share or refer if you know someone great!

Thanks,
Mahesh
Phizenix Inc. | WBENC & WOSB Certified | Bay Area