Original listing text, shown exactly as published by the company.
What you'll be doing day to day
- Design, implement, and operate cloud-native infrastructure for production workloads.
- Build and maintain CI/CD pipelines for applications and ML systems.
- Develop and manage infrastructure-as-code to ensure consistency, scalability, and repeatability.
- Improve deployment reliability through automation, testing, and rollout strategies.
- Support AI/ML workloads, including model deployment, scaling, and monitoring.
- Integrate AI-assisted tools into DevOps workflows to improve:
  - Alert quality and noise reduction
  - Incident triage and root-cause analysis
  - Capacity planning and performance optimization
- Design and maintain observability solutions using metrics, logs, and traces.
- Participate in on-call rotations and lead incident response for complex issues.
- Continuously improve system resilience through proactive automation.
- Partner with application, platform, and data teams to deliver reliable systems.
- Provide technical guidance and code reviews for DevOps team members.
- Contribute to architecture discussions and influence technical direction within your domain.
- Document standards, runbooks, and best practices.
What you'll bring
- Strong Hands-On Expertise
  - Ability to own systems from design through production support, including designing and setting up end-to-end CI/CD pipelines.
  - Practical understanding of distributed systems and cloud infrastructure.
  - Bias toward automation, observability, and operational excellence.
- Growth-Oriented Leadership
  - Willingness to mentor and support teammates.
  - Comfort collaborating across functions and advocating for better practices.
  - Ability to communicate trade-offs clearly and constructively.
- Strong knowledge of containerization and orchestration (Docker, Kubernetes).
- Experience with monitoring and observability tools (Prometheus, Grafana, ELK, Datadog).
- Solid understanding of networking, security principles, and high-availability architectures.
- Strong problem-solving skills and ability to work in complex, distributed environments.
Skills and experience we’re looking for
Must-haves
- 5-8+ years of progressive experience designing and implementing large-scale, automated infrastructure and CI/CD pipelines.
- Strong experience with cloud platforms (AWS, GCP, or Azure).
- Hands-on expertise in:
  - Cloud-native architectures and microservices
  - Infrastructure as Code (Terraform, IaC 2.0, etc.)
  - Containers and Kubernetes
  - CI/CD systems (GitHub Actions, GitLab CI, Jenkins, ArgoCD)
- Strong knowledge of CI/CD best practices, GitOps workflows, and deployment strategies (blue/green, canary).
- Proficiency in scripting or programming (Python, Bash, Go, etc.).
- Demonstrated ability to write custom scripts for deployment, monitoring and system tasks.
- Experience with monitoring, logging, and alerting tools.
- Exposure to AI/ML workloads.
Nice-to-haves
- 3+ years in healthcare technology or other highly regulated SaaS environments (financial services, government).
- Experience with AI-driven monitoring or AIOps tools.
- Knowledge of cloud cost optimization strategies.
- Advanced certifications: AWS Certified DevOps Engineer-Professional, Microsoft Certified DevOps Engineer Expert, Certified Kubernetes Administrator (CKA)
- Experience with security automation and DevSecOps practices.