Original listing text, shown exactly as published by the company.
What you will do
- Lead evaluation and validation of AI and GenAI systems, including assessment of hallucination, fairness, robustness, explainability, and other model behavior risks.
- Design and implement repeatable evaluation workflows, benchmark datasets, and structured model behavior tests.
- Serve as the Responsible AI data science partner for assigned AI delivery teams, guiding the integration of RAI metrics, guardrails, and evaluation practices.
- Translate research findings and experimental methods into scalable, platform-ready evaluation capabilities.
- Collaborate with AI/ML engineers and QA analysts to align evaluation logic with testing pipelines and deployment workflows.
- Support the creation of model cards, evaluation documentation, and transparency artifacts required for governance and review.
- Provide Responsible AI consultation to product, engineering, risk, legal, and governance stakeholders.
- Mentor junior and mid-level data scientists on evaluation design, experimental rigor, and Responsible AI practices.
What you will bring:
- Bachelor’s degree in Computer Science, Statistics, Mathematics, or a related field, or equivalent experience.
- 6+ years of professional data science experience.
- Demonstrated experience developing or evaluating AI/ML systems in production or pre-production environments.
- Strong grounding in experimental design, statistical analysis, and model evaluation methodology.
- Experience collaborating with engineering and product teams in an agile, iterative delivery environment.
- Proficiency in Python and SQL for data analysis and evaluation workflows.
- Experience working with modern cloud platforms such as AWS.
- Strong communication skills and ability to operate effectively in a matrixed team model.
Technology Environment
- Cloud platforms such as AWS, including managed data and ML services.
- Data storage and retrieval technologies such as relational and NoSQL databases.
- Data analysis and visualization tools such as Python libraries, QuickSight, Tableau, or Power BI.
- Collaboration and delivery tools supporting agile development and cross-functional workflows.
Security and Risk Awareness
- Contribute to Responsible AI and security-by-design practices by ensuring evaluation methods consider safety, misuse, and model risk.
- Partner with engineering, security, and governance teams to identify, document, and mitigate AI-related risks surfaced through evaluation activities.
- Support evolving security and risk requirements as they relate to AI system behavior, testing, and governance reviews.
What Will Set You Apart
- Experience evaluating Generative AI or LLM-based systems.
- Hands-on work with explainability, fairness analysis, hallucination detection, or robustness testing techniques.
- Familiarity with Responsible AI principles, governance frameworks, or model documentation practices.
- Experience translating research or experimental methods into repeatable, production-aligned workflows.
This job description is not intended to be an exhaustive list of all duties, responsibilities and qualifications of the job. The employer has the right to revise this job description at any time. You will be evaluated in part based on your performance of the responsibilities and/or tasks listed in this job description. You may be required to perform other duties that are not included on this job description. The job description is not a contract for employment, and either you or the employer may terminate employment at any time, for any reason.