Original listing text, shown exactly as published by the company.
About the role
We're hiring a Senior Backend Engineer to own AI features end to end, from rapid prototype to production and the evaluation that keeps them honest. You'll build the APIs, tool-using agents, and RAG pipelines that turn frontier LLMs into the grant discovery, application drafting, and research tools our 5,500+ nonprofits rely on every day. It's a high-ownership seat on a small team, where what you ship reaches customers fast and you help shape how we build AI here.
What you'll do
Ship AI to production
- Build tool-using LLM agents (task planning, function and tool calling, multi-step workflows, guardrails) for grant discovery, application drafting, and research assistance.
- Turn prototypes into resilient, observable services with clear SLAs, rollback and fallback strategies, and cost and latency budgets.
- Stand up evaluation and observability so our AI stays grounded, safe, and cost-effective.
Build trustworthy backends
- Write high-quality, thoroughly tested code across the backend and the data pipelines that power retrieval and evaluation.
- Contribute to reliability practices: alerts, dashboards, and incident response.
Collaborate and raise the bar
- Partner with Product, Design, and GTM on scoping, UX, and measurement.
- Run experiments (A/B, canaries), interpret results, and iterate.
- Raise engineering standards through clear, maintainable code, tests, docs, and thoughtful review.
What we're looking for
Required
- 7+ years building and shipping production backend systems in Python (FastAPI, Celery, or equivalent), taking features from prototype to production with real reliability practices like tests, observability, and rollback.
- Hands-on experience building LLM features in production: tool and function calling, multi-step agent workflows, and the guardrails and evals that keep them grounded, safe, and cost-effective. This is the core of the role.
- Strong data fundamentals: SQL, schema design, and building pipelines that power retrieval and evaluation.
- Thrives in a fast, scrappy startup environment with high ownership and a bias for action, speed, quality, and simplicity.
Nice to have
- TypeScript and Node, plus familiarity with Ruby on Rails (our core platform) or a willingness to learn it.
- Experience with AWS or GCP, Docker, CI/CD, and observability (logs, metrics, traces).
- RAG depth: document ingestion, chunking and windowing, embeddings, hybrid search (keyword plus vector), re-ranking, and grounded citations.
- Experience with re-rankers and cross-encoders, hybrid retrieval tuning, or search and recommendation systems.
- Evaluation mindset: designing eval suites (RAG/QA, extraction, summarization) using automated and human-in-the-loop methods, plus familiarity with frameworks like Ragas, DeepEval, or OpenAI Evals.
- Orchestration frameworks: LangChain or LangGraph, LlamaIndex, Semantic Kernel, or custom orchestration.