A cover letter and resume are essential to help the hiring team understand your experience. In your cover letter, please explain how this role aligns with your career aspirations and skill set. Submit the cover letter and resume as a single file, with the cover letter as the first page.
Infrastructure & Cloud Architecture
- Lead the Michigan Online and CAI cloud infrastructure roadmap, including hands-on ownership of production AWS environments supporting customer-facing platforms at scale.
- Architect and evolve a cloud-native, auto-scaling infrastructure capable of supporting millions of global learners. Champion the shift toward immutability and self-healing systems.
- Design, implement, and operate platform-wide observability and monitoring systems (e.g., metrics, logging, tracing) for production services, including alerting, incident response, and post-incident remediation.
- Define and maintain standards for authentication, authorization, and API integrations.
Operational & Service Management
- Design and implement the Tier 2 incident and service management model, including SLAs, escalation paths, and operational readiness.
- Lead modernization of ticketing workflows across CAI technical teams, ensuring clear ownership, prioritization, and transparency.
Systems, Storage & Resilience
- Oversee enterprise storage, archival infrastructure, and data lifecycle management.
- Partner on cloud security, backup, disaster recovery, and resilience planning, including readiness and testing.
- Support capacity planning and cost-aware infrastructure decisions.
Team Leadership & Collaboration
- Lead and manage an infrastructure team, providing technical direction, coaching, and performance oversight while remaining hands-on in architecture, escalation, and reliability engineering.
- Serve as Infrastructure Squad Lead, accountable for roadmap execution and operational outcomes.
- Partner closely with TIS, DevOps, Data, and Learning Experience Designers to align infrastructure capabilities with platform and learning needs.
- Coordinate with ITS and external vendors on integrations, cloud services, and platform support.
- Bachelor's degree in Computer Science, Information Systems, Computer Engineering, or a related field.
- 7+ years of experience in infrastructure, platform engineering, or site reliability roles supporting production systems.
- 3+ years of experience formally managing infrastructure or platform engineering staff, including performance reviews, delivery accountability, and staff development.
- Demonstrated experience leading an infrastructure or platform team with formal responsibility for roadmap execution, backlog prioritization, and operational outcomes.
- Hands-on experience owning and operating containerized platforms (Kubernetes and/or OpenShift) in production environments, including deployment, scaling, reliability, and lifecycle management.
- Strong working knowledge of cloud infrastructure (AWS preferred), including networking, identity, and cost considerations.
- Proven ownership of production reliability, including monitoring, incident response, and continuous improvement.
- Experience with infrastructure automation and standardization (e.g., infrastructure as code, reusable patterns, self-service enablement).
- Practical experience partnering with security and identity teams to operationalize non-functional requirements (access control, vulnerability remediation, audit readiness).
- Familiarity with hybrid environments (cloud and on-prem) and their operational tradeoffs.
- Experience working in product-oriented or agile operating models, where infrastructure is delivered as a reusable platform.
- Strong communication skills, with the ability to translate technical risks and tradeoffs for non-technical stakeholders.
- Commitment to documentation, knowledge transfer, and building sustainable operating practices.
Candidates must have legal authorization to work in the United States.
This position is hybrid, with a minimum of 4 days in the office per week, Monday through Thursday, and the option of remote work on Fridays. On occasion, you may be required to work on-site on Fridays, as mandated by our center's policy and domain leadership or by your job requirements.
The salary for this position will be based on the selected candidate's education and experience.
Excellent benefits are available. For details, see http://benefits.umich.edu/