Competencies/Requirements
- 5+ years building production software, with strong Python.
- Hands-on experience building LLM-powered applications or agents: tool use/function calling, structured outputs, multi-step orchestration, and the glue that holds it all together.
- A track record of making LLMs reliable in production: you've wrestled with nondeterminism, designed around model limitations, and shipped something that worked when it mattered.
- Real experience with evaluation: you've built or owned the harness that tells you whether a model or agent change is an improvement, not just a vibe.
- Strong instincts for prompt and context engineering, and the judgment to keep the model's job small and well-scoped.
- Solid software fundamentals — testing, observability, and the discipline to keep a complex agent debuggable.
- Ownership mentality: you're comfortable owning a critical, fast-moving subsystem end to end.
Desired/Nice to Have
- Working knowledge of web application security (broken access control, IDOR/BOLA, SQLi, XSS, SSRF, SSTI), enough to collaborate fluently with offensive engineers.
- Experience building eval harnesses or benchmarks specifically for agents (synthetic environments, CVE-based test targets, capture-the-flag-style scoring).
- Experience with agent frameworks, and strong opinions about when not to reach for one.
- Familiarity with graph data models (e.g., Neo4j) for representing application state and attack context.
What makes you stand out
- You've shipped an autonomous agent that did real, valuable work unattended in production, and you have scar tissue from making it trustworthy.
- You've designed evaluation systems that actually drove improvement, closing the loop between "we changed something" and "it measurably got better."
- You pair an offensive-security mindset (CTF, bug bounty, pentesting, or research background) with the engineering chops to turn that intuition into a reliable system.
- You have hands-on experience with agent fine-tuning or RL (SFT, GRPO, reward design for tool-using agents) and a grounded view of when it's worth it versus improving the harness.
- You've published or spoken on agent reliability, evaluation, or autonomous security tooling.
Other Duties
Please note this job description is not designed to cover or contain a comprehensive listing of activities, duties or responsibilities that are required of the employee. Duties, responsibilities, and activities may change at any time with or without notice.
Application Note
In any materials you submit, you may redact or remove age-identifying information such as age, date of birth, or dates of school attendance or graduation. You will not be penalized for redacting or removing this information.