Original listing text, shown exactly as published by the company.
Who You Are
- Seniority: Minimum of 5 years of dedicated experience in DevOps, Infrastructure, or SRE roles.
- Expert tooling: Expert with Docker, Kubernetes (k8s), and Terraform/Pulumi.
- Cloud proficiency: Deep, proven expertise in either AWS or GCP infrastructure, with the ability to quickly grasp and transition to other cloud providers.
- Development skills: Strong ability to write clean, maintainable code for automation in Go, Python, or Node.js.
- Security focus: Demonstrable experience implementing and maintaining modern cloud security controls and meeting key compliance standards (SOC 2, PIPEDA, HIPAA, and/or GDPR).
- Independent, proactive, and cross-functional: Proven ability to quickly onboard, diagnose problems, and propose and implement solutions with minimal oversight. Experienced in a consultant or freelancer capacity, with the ability to understand and communicate effectively with both technical and non-technical stakeholders.
What You'll Do
- Infrastructure as Code (IaC): Quickly implement and adapt infrastructure using Terraform, Pulumi, or other major IaC tools.
- Containers: Docker is critical. Deeply understand how to design, build, and optimize secure, multi-stage Dockerfiles.
- CI/CD: Design, build, and manage robust CI/CD pipelines to automate testing, building, and deployment across environments.
- Core cloud services (AWS or GCP): Provision and manage foundational services. Deep expertise in one major provider is required, transferable to the other.
- Container compute: Expertise in at least one major container platform: EKS, GKE, ECS, Fargate, or Cloud Run. (Kubernetes is highly valued, particularly EKS or GKE.)
- Networking: Know when to use load balancers, VPNs for secure connectivity, and private VPCs for isolation. Apply subnetting, routing, VPC peering, and NAT gateways to build secure systems.
- Storage: S3 (AWS) or Cloud Storage (GCP).
- Databases: RDS (AWS) or CloudSQL (GCP).
- Serverless: Deploy event-driven components using AWS Lambda, GCP Cloud Functions, or equivalents.
- CDNs and message queues.
- Security: Protect PII; apply encryption, secrets management, network firewalls, and web application firewalls (AWS WAF, GCP Cloud Armor) following security best practices.
- Automation and scripting: Write high-quality automation and tooling in Go, Python, Node.js, or Bash for client-specific operational challenges.
- Monitoring and operations: Ensure robust monitoring and high system uptime.
Nice to Have
The following experience would be a bonus, but please highlight any additional experience or skills you may have. We like working with people with varied backgrounds and experiences.
- Production AI/agent experience: Hands-on experience running LLM or agent systems in production, including how they fail differently from deterministic services: nondeterministic outputs that break conventional testing and alerting, runaway token and inference costs, and partial failures in multi-step chains.
- AI observability and cost control: Tracing multi-step agent runs, treating token cost, latency, and output quality as first-class metrics, and keeping inference spend in check with budgets, rate limiting, and caching (Langfuse, LangSmith, Arize, or similar).
- The infrastructure AI systems run on: Model gateways and provider routing with failover (LiteLLM, Bedrock, Vertex), durable execution for long-running multi-step workflows (Temporal, Step Functions, Inngest), eval and regression pipelines for prompt or model changes, and the retrieval, vector-store, and context plumbing these systems depend on (including MCP). Vector databases and GPU/TPU compute where relevant.
- Domain experience in fintech or crypto/web3 environments.
- Crypto/web3 infrastructure: running nodes (Ethereum, Solana, or others), indexing solutions (The Graph, custom indexers), or RPC infrastructure.
- Payment processing, ledger architecture, or financial transaction systems, and meeting compliance requirements in regulated environments.
- High-volume, mission-critical systems: real-time data flows, websocket feeds, payment rails, or distributed architectures handling millions of transactions.
- Certifications: AWS or GCP cloud certifications are a plus, not mandatory.
- Advanced monitoring (Prometheus, Datadog) or logging experience.
AI in our hiring process
We use AI to help us review and shortlist applications based on job-related criteria. A human hiring manager always makes the call on who moves forward. As a company that builds with AI every day, we're all for candidates using it too — just be upfront about how it helped.