WHAT YOU'LL DO
- AI-Agent Architecture and Governance: Design foundational patterns and guardrails for how EarnIn builds, evaluates, monitors, and deploys AI agents in production. Own agent governance, including model selection, evaluation frameworks, safety guidelines, and production observability. Establish infrastructure-as-code best practices for agentic systems, ensuring prompts, tools, and evaluation criteria are versioned, reviewed, and tested like critical components.
- Strategic Leadership: Serve as an architect for agentic cloud infrastructure, establishing best practices for production AI agents. Mentor senior engineers in advanced agentic patterns, LLM integration, and production prompt engineering. Lead cross-functional initiatives with engineering, product, security, and business teams to align agentic AI adoption with company objectives.
- Platform Operations: Oversee large-scale, high-availability distributed systems on AWS, identifying and solving critical performance, scalability, and stability challenges. Use AI-driven observability and anomaly detection to anticipate failures. Lead the evolution of infrastructure-as-code and automation standards, incorporating agentic pattern recognition and automated remediation into operations.
- Developer Platform: Shape the evolution of our developer control plane (Cortex) as an AI-augmented self-service platform where engineers interact with intelligent assistants to scaffold services, debug deployments, and resolve issues through natural language. Drive AI-powered golden paths that encode platform standards, security policies, and best practices.
- Organizational Impact: Act as liaison between cloud operations, AI infrastructure, and business stakeholders. Develop documentation on agentic architecture, best practices, and operational procedures. Participate in and lead on-call rotations, using post-mortems as feedback loops for improving system reliability and agentic automation.
WHAT WE'RE LOOKING FOR
- Bachelor's or Master's degree in Computer Science, Engineering, or related field.
- 7+ years of experience in cloud infrastructure, managing large-scale, high-availability, customer-facing distributed systems.
- Proven experience mentoring senior engineers and leading company-wide platform initiatives across multiple teams and functions.
- Demonstrated experience architecting and scaling AI-driven systems in production, designing multi-step agentic workflows that autonomously perform complex operational tasks.
- Track record of eliminating high-friction operational workflows through agentic AI, with measurable reduction in toil and increased platform leverage (e.g., LLM-powered incident diagnosis, intelligent CI/CD with test selection and deployment risk scoring, self-service assistants).
- Mastery of AWS (EKS, Lambda, Bedrock, etc.) and deep expertise in containerized and serverless architectures.
- Strong expertise in Kubernetes at scale and ability to guide implementation of complex, resilient solutions.
- Deep knowledge of infrastructure-as-code tools (Terraform, Ansible) and ability to lead initiatives incorporating both traditional IaC and agentic automation.
- Mastery of Datadog and advanced observability, using metrics to drive decisions and agentic automation. Experience building AI-driven alerting and root-cause analysis systems is a plus.
- Strong adherence to security, privacy, and compliance best practices, with the ability to lead governance for production AI systems (model safety, prompt injection prevention, data isolation).
- Experience with LLM orchestration frameworks (LangChain, LlamaIndex, CrewAI, or custom agentic architectures) and production prompt engineering at scale.
- Strong coding expertise in Python and/or Go, with the ability to guide teams in treating infrastructure and agentic systems as software.
- Proven ability to drive cross-functional initiatives across engineering, product, security, and business, translating between technical depth and business impact.
- Experience using AI-assisted development tools (e.g., GitHub Copilot, Cursor, ChatGPT, or similar) as part of your software development workflow.
- Experience with service mesh (Linkerd, Istio) and traffic management at scale is a plus.
- Proficiency with GitOps (Argo CD, Flux CD) and CI/CD orchestration (GitHub Actions, Argo Workflows) is a plus.
- Experience with MLOps or LLMOps concepts (model versioning, evaluation frameworks, production monitoring for AI systems) is a plus.
- Familiarity with security frameworks relevant to AI systems (e.g., guardrails, audit logging, and data governance for LLMs) is a plus.
#LI-Remote
At EarnIn, we believe that the best way to build a financial system that works for everyday people is by hiring a team that represents our diverse community. Our team is diverse not only in background and experience but also in perspective. We celebrate our diversity and strive to create a culture of belonging. EarnIn does not unlawfully discriminate based on race, color, religion, sex (including pregnancy, childbirth, breastfeeding, or related medical conditions), gender identity, gender expression, national origin, ancestry, citizenship, age, physical or mental disability, legally protected medical condition, family care status, military or veteran status, marital status, registered domestic partner status, sexual orientation, genetic information, or any other basis protected by local, state, or federal laws. EarnIn is an E-Verify participant.
EarnIn does not accept unsolicited resumes from individual recruiters or third-party recruiting agencies in response to job postings. No fee will be paid to third parties who submit unsolicited candidates directly to our hiring managers or HR team.