Allen Institute for AI
Seattle, WA · $146,880 - $220,320
Lead Software Engineer, AI Infrastructure
About the Role
You are a visionary leader who occupies the space between high-level software orchestration and low-level system performance. You are motivated by the idea that world-class infrastructure should be a catalyst for public good, not a proprietary secret. You understand that in the world of frontier AI, the software and the hardware are a single, inseparable organism.
Responsibilities
- Strategic Leadership: Develop the roadmap for managing large-scale HPC systems
- Full-Stack Ownership: Lead the design and delivery of critical systems, from the Beaker job scheduler to the execution runtime
- System Automation: Build innovative tooling and software-defined infrastructure
- Performance Optimization: Conduct root-cause analysis on complex distributed system failures
- Mentorship & Culture: Foster a high-performance culture by reviewing code and design documents
Requirements
- 10+ years of professional experience developing business-critical software and operating large-scale compute infrastructure
- Deep Linux Expertise: Expert-level knowledge of Linux internals and container runtimes
- Distributed Systems Mastery: Experience designing, debugging, and optimizing high-scale distributed systems
- HPC Foundations: Experience with Kubernetes or Slurm, high-performance networking (InfiniBand), and collective communication libraries (NCCL)
Benefits
- Medical, dental, vision coverage
- 401k plan
- $125/month for commuting or internet expenses
- $200/month for fitness and wellbeing expenses
- 20 vacation days, 10 sick days, 7 personal days