Original listing text, shown exactly as published by the company.
What You'll Be Doing
Hands-on Data & AI Solutions for Operations Support
- Lead and contribute to high-impact data and AI initiatives that improve operations support outcomes, including real-time incident enrichment, automated root‑cause analysis, predictive alerting, ticket clustering and auto-triage, change risk scoring, knowledge mining, and intelligent runbooks.
- Design and deliver scalable AI-enabled features embedded into operations support platforms such as ServiceNow, Jira Service Management, monitoring/observability tools, and ITSM systems.
- Ensure all solutions meet strict operational SLAs for reliability, low latency, auditability, explainability, and zero-downtime deployment.
- Stay up to date with emerging AIOps tools, research, and trends, and apply them to enhance operations support.
AIOps Tools & Platform Leadership
- Lead the architecture, development, and continuous improvement of internal AIOps platforms and reusable components supporting operations teams.
- Integrate AIOps tools with ITSM systems, observability platforms (Prometheus, Grafana, ELK, Dynatrace, Splunk), ticketing systems, and automation frameworks.
- Apply best practices in MLOps/AIOps tailored to production environments: model monitoring, drift detection, automated rollback, performance checks, and cost optimization at scale.
AI Technical Leadership for Operations Support Initiatives
- Serve as the principal AI technical authority for operations support transformation programs across service operations, NOC, support desks, infrastructure operations, and reliability engineering.
- Lead technical discussions, architecture reviews, proof of concepts, vendor evaluations, and solution selection involving AI for operations.
- Identify, prioritize, and drive high‑value AI use cases focused on reducing MTTR/MTTD, automating L1 triage, predicting major incidents, generating post‑mortems, optimizing shift handovers, and enabling proactive operations.
Team & People Leadership
- Build, mentor, and lead a high-performing squad of AIOps specialists focused on measurable operations support improvements.
- Foster a culture of experimentation, production‑first thinking, and commitment to operational impact—reduced toil, faster resolution, and higher availability.
- Provide technical coaching, conduct design/code reviews, and guide career development with emphasis on operations and support domain expertise.
Stakeholder & Cross-Functional Collaboration
- Work closely with operations support leaders, incident managers, service owners, reliability engineers, ITSM teams, infrastructure groups, and other stakeholders to align AI solutions with operational needs.
- Collaborate deeply with DS&AI Competency teams to ensure high-quality, scalable, and sustainable AI delivery.
What We’re Looking For
- Strong background in data engineering, AI/ML, or operations support technology, including technical leadership in operations, IT, or service environments.
- Proven track record delivering production AI/ML/data solutions that improve MTTR, MTTD, availability, and ticket deflection.
- Hands-on expertise with Python, Spark, Kafka, Airflow, cloud data platforms, PyTorch/TensorFlow, LLMs, and integrations with tools like ServiceNow, PagerDuty, Splunk, Datadog, Moogsoft, BigPanda, Databricks, and Azure/ADF.
- Deep knowledge of AIOps practices including event correlation, anomaly detection, predictive analytics, automated actions, and GenAI for operations.
- Experience designing, building, or enhancing AIOps and internal tooling platforms.
- Familiarity with ITIL processes (incident, problem, change, service request, knowledge management).
- Experience with GenAI/LLM applications for operations such as copilots, auto-remediation, knowledge search, and alert/incident summarization.
- Proven ability to scale AIOps in large operations or NOC environments while balancing hands-on work with strategy.
- Strong communication skills, with the ability to translate complex AI concepts for operations teams and executives, focusing on action and automation to reduce operational toil.