Original listing text, shown exactly as published by the company.
In this role, you'll
- Lead the design and evolution of RKE2 Kubernetes clusters across development, acceptance, and production environments — driving upgrades, capacity planning, networking strategy, and operational resilience
- Architect and maintain infrastructure as code (Terraform) to provision and manage AWS-based environments, establishing patterns and modules that the team can scale with confidence
- Own the CI/CD platform (GitHub Actions) end-to-end — designing pipeline architecture, improving build performance, and ensuring releases flow reliably through staged environments
- Drive GitOps-based delivery strategy using Flux CD, defining standards for Helm charts and Kustomize overlays and ensuring consistent reconciliation across clusters
- Define container image lifecycle policies — building, signing, storing, and distributing images across OCI registries for multiple deployment targets
- Identify and eliminate operational toil through automation, improving environment provisioning, configuration management, and deployment processes at a systemic level
- Establish and maintain monitoring, alerting, and incident response practices — owning dashboards, runbooks, and post-incident reviews that raise platform reliability over time
- Serve as a technical point of contact for product service teams integrating into the deployment pipeline — unblocking teams, troubleshooting complex environment issues, and advocating for infrastructure best practices
- Own the security posture of the deployment platform — managing secrets, certificates, RBAC policies, and security configurations to meet compliance and operational security requirements
- Mentor and grow other engineers on the team through code review, pairing, design discussions, and knowledge sharing
We're looking for candidates who have
- Deep experience operating Kubernetes in production at scale — cluster lifecycle management, Helm, networking, persistent storage, performance tuning, and complex workload troubleshooting
- Expert-level proficiency with Terraform for provisioning and managing cloud infrastructure across multiple environments, including module design and state management strategies
- Strong Linux systems administration skills (RHEL or similar — networking, storage, systemd, performance analysis, shell scripting)
- Extensive experience designing and maintaining CI/CD pipelines (GitHub Actions, Jenkins, or similar), with a focus on reliability, speed, and developer experience
- Deep familiarity with AWS services commonly used in infrastructure (EC2, S3, VPC, IAM, EBS, Lambda, and related networking and security services)
- A strong operational mindset — you take ownership of reliability, think about failure modes proactively, and build automation to prevent recurring issues before they impact customers
Nice to have
- Experience with GitOps tools such as Flux CD or ArgoCD in complex, multi-environment setups — including designing promotion strategies and managing drift
- Familiarity with container registry operations, image distribution, OCI artifact management, and image signing or verification workflows
- Experience managing databases in Kubernetes environments (e.g., PostgreSQL operators, stateful workloads, backup and recovery strategies)
- Familiarity with Kafka or other message brokers in a containerized environment, including operational management and scaling
- Experience with configuration management or automation tools (Ansible or similar) at fleet scale
- Advanced scripting proficiency in Python for infrastructure automation, tooling, and integration work
- Experience supporting environments with strict security or compliance requirements (FedRAMP, FIPS, IL environments, or similar)
- Exposure to or passion for the cryptocurrency technology ecosystem
Technologies we use
- Kubernetes: RKE2, Helm, Kustomize, Flux CD
- Cloud provider: AWS (EC2, S3, EBS, VPC, IAM, Lambda)
- Infrastructure as code: Terraform
- CI/CD: GitHub Actions, semantic-release
- Monitoring and alerting: Prometheus, Grafana, Datadog, Humio
- Scripting and automation: Shell, Python
- Container registries: OCI/Docker
- Database management: PostgreSQL
- Event streaming: Kafka
- Operating systems: RHEL, Rocky Linux, Amazon Linux
AI at Chainalysis
AI is not a feature at Chainalysis - it is a new way of working, one that turns instructions into work done and helps us move faster than the threats we're built to counter. We expect our employees to take ownership of the output and ensure quality. As the world's most trusted blockchain analytics platform, Chainalysis sits at a rare intersection of proprietary data, regulatory relationships, and crypto expertise that makes it uniquely placed to shape and lead the next era of AI-driven intelligence - and we expect everyone here, regardless of role, to be an active part of it.
AI fluency is tied directly to how we measure performance and how we plan to win. There is no substitute for your own curiosity. We provide the tools, workflows, and space to experiment - but the expectation is that you develop these capabilities yourself, bring ideas, and collaborate across teams to reinvent the way work gets done. We are not using AI to do less. We are using it to do what was never possible before.