A hybrid DevOps & Infrastructure role at Level AI.
The Senior SRE will be positioned at the intersection of backend engineering, infrastructure operations, and FinOps. The role is explicitly broader than a traditional DevOps engineer and explicitly more hands-on than a pure architect.
Infrastructure cost efficiency and FinOps. Own the continued reduction of Kubernetes overprovisioning, drive right-sizing programs, and maintain the cost telemetry that backend teams use to make decisions.
GPU throughput optimization. Run a structured experimentation program on on-premise GPU clusters, partnering with AI service owners. The program is led by Engineering leadership, with this role providing the experimental bandwidth.
Backend enablement, not ownership absorption. Build the tooling, dashboards, and processes that let backend teams from other groups own their own cost and reliability budgets. The deliverable is leverage, not headcount-shaped work.
Reliability instrumentation. As the infra team owns most of the instrumentation across new and offline flows, this role takes a central seat in making sure that surface area is captured properly for both cost-at-scale and reliability.
Selective security workstreams. Take on a defined slice of the active security work so that senior DevOps engineers are not the single point of execution for security-adjacent platform changes.
This role explicitly requires 4-5 years of hands-on systems experience. We are not looking for someone who will lean entirely on AI tooling to discover what to do; we are looking for someone who already knows what to ask, and can use AI tooling as a force multiplier on top of that judgement.
Backend engineering depth: production experience in Python, Go/Rust, comfortable owning services end to end, able to read and reason about backend code across teams.
Kubernetes at scale: scheduler behavior, resource requests/limits, HPA/VPA, node pool design, cost-aware autoscaling (Cast AI, Karpenter, or equivalent).
Cloud and on-premise infrastructure: GCP fluency, IaC (Terraform), CI/CD, and comfort operating in hybrid setups including on-prem GPU clusters.
GPU workload understanding: familiarity with throughput profiling, batching, KV-cache behavior, inference server tuning, and GPU utilization metrics.
Observability and reliability: metrics, traces, logs, SLOs, and the discipline to instrument systems properly rather than reactively.
FinOps mindset: demonstrated history of converting infrastructure choices into measurable cost outcomes.
Security baseline: able to take on platform-security workstreams without requiring constant handoff to the DevOps team.
Level AI
DevOps & Infrastructure
Level AI develops artificial intelligence solutions designed to enhance customer service operations. Its platform provides AI-powered tools for contact centers, focusing on real-time agent assistance, conversational intelligence, and automated quality assurance.