A remote Data & ML role at Protege. PhD or equivalent Master’s Degree + 4+ years industry experience in machine learning, computer science, statistics, engineering, mathematics, economics, or…
Design and build datasets, tasks, and environments
Design and build datasets, tasks, environments, and evaluation assets for benchmarking agentic systems and multi-step model behavior.
Translate real-world workflows into structured tasks, interaction traces, trajectories, stateful environments, and verifiable outcomes that can be used to evaluate advanced AI systems.
Develop frameworks for evaluating real-world data quality
Develop frameworks that assess diversity, realism, coverage, fidelity, informativeness, and downstream usefulness of datasets for agentic systems.
Build quality scorecards and evaluation methods that make dataset strengths, weaknesses, and failure modes legible across teams.
Benchmark model behavior in RL and agentic settings
Evaluate planning, tool use, robustness, recovery from failure, task completion, and generalization behavior in RL-style or agentic environments.
Connect model failures back to concrete dataset, environment, or task-design gaps and recommend improvements grounded in empirical evidence.
Build scalable evaluation and validation tooling
Contribute to tools and systems that automate dataset validation, environment generation, rollout analysis, benchmark construction, and evaluation workflows.
Improve internal infrastructure for reproducible experimentation, benchmark management, and evaluation quality.
Partner across research, engineering, and product
Collaborate closely with research and engineering teams to identify data bottlenecks, improve evaluation methodology, and shape internal best practices around task-grounded AI training data.
Represent DataLab’s perspective in cross-functional discussions around dataset quality, benchmark design, and frontier agentic-system evaluation.
Near-term: establish a strong evaluation baseline
Create clear benchmark frameworks, evaluation assets, and dataset-quality scorecards that help Protege reason about how real-world data impacts advanced agentic systems.
Use rigorous evaluation methods to identify meaningful dataset improvements, improve benchmark fidelity, and sharpen the company’s understanding of what high-impact agentic data actually looks like in practice.
Protege's Values
Pass the Loved Ones' Test
We act with integrity and do the right thing - especially when it's hard and no one is watching.
Always Find a Way
We are resourceful, resilient builders who solve hard problems and push through obstacles.
Go Fast and Grow Fast
Velocity matters. We move with urgency, learn quickly, and continuously improve as individuals and as a company.
Practice Kindness and Candor
We communicate directly and respectfully, building trust through honest feedback and genuine care for one another.
Deliver Together
We win as one team. Collaboration, accountability, and shared ownership drive our success.
Own the Outcome. Hone the Craft.
We take pride in our work, sweat the details, and continuously raise the bar for excellence.