A hybrid role at Kobie Marketing.

Level

Lead / Exec

Work

Hybrid

Pay

Est. $60k-$78k/yr

Original listing text, shown exactly as published by the company.

In this role, you will

Own and evolve the observability platform (e.g., New Relic) to provide end-to-end visibility across applications and infrastructure
Establish standards for monitoring, alerting, dashboards, and telemetry (logs, metrics, traces)
Leverage AIOps capabilities to improve anomaly detection, reduce noise, and accelerate root cause analysis
Drive automation and self-healing workflows to minimize manual intervention and improve system resilience
Collaborate across teams to ensure systems are observable by design and aligned with reliability goals
Continuously analyze system behavior and incident patterns to improve performance, scalability, and uptime

You will be part of a team focused on building a highly reliable, data-driven, and scalable operational ecosystem, where observability is a core foundation for engineering excellence.

How you will make an impact

Lead the observability strategy and execution, ensuring comprehensive visibility across all production and delivery environments.

Own and govern the enterprise observability platform (New Relic or equivalent tools such as Datadog or Dynatrace) and ensure consistent monitoring standards across systems.
Explore and adopt AI-driven monitoring capabilities (AIOps) to automate anomaly detection, reduce alert fatigue, and enable predictive problem management.
Collaborate closely with Production Support (L1/L2), DevOps, CloudOps, Software Engineering, and Database teams to triage complex production issues and accelerate incident resolution.
Act as the operational coordinator during service-impacting events, organizing workflows, managing cross-team dependencies, and providing structured updates to leadership.
Design and implement automated remediation workflows and self-healing mechanisms for recurring incidents.
Analyze telemetry data (logs, metrics, traces) to identify incident patterns and systemic anomalies, and continuously refine alert thresholds and routing logic.
Develop and maintain dynamic dashboards that reflect real-time system health, application performance, and infrastructure behavior.
Define and track reliability metrics such as SLOs, SLIs, MTTD, and MTTR to improve service reliability.
Ensure clear, timely communication with stakeholders during incidents and operational events.
Drive organization-wide adoption of observability best practices through documentation, training, and knowledge sharing.

What you need to be successful

8–10+ years of experience in observability, site reliability engineering (SRE), DevOps, or advanced production operations in large-scale enterprise environments.

Expert-level hands-on experience implementing and optimizing observability platforms such as New Relic, Datadog, Dynatrace, or Splunk.
Strong understanding of monitoring fundamentals including logs, metrics, traces, and alerting strategies.
Experience working with cloud-native architectures (AWS preferred).
Familiarity with containerized environments and orchestration platforms such as Kubernetes.
Experience integrating observability practices into CI/CD pipelines to ensure applications are observable by design.
Strong understanding of incident management, problem management, and change management practices (ITIL concepts).
Demonstrated ability to analyze telemetry data to identify patterns, detect anomalies, and improve operational reliability.
Strong leadership and collaboration skills with the ability to coordinate across engineering, DevOps, and operations teams.
Excellent communication skills and a strong focus on operational excellence and continuous improvement.

Nice to Have

Experience implementing AI/ML capabilities within observability tools for anomaly detection and predictive monitoring.
Familiarity with AIOps platforms and automated remediation workflows.
Experience with event streaming platforms such as Kafka for telemetry ingestion or real-time data processing.
Basic understanding of application architecture and troubleshooting distributed systems.
Experience with automation frameworks or serverless workflows (e.g., AWS Lambda, scripting, or infrastructure automation).

About Kobie Marketing

Kobie Marketing is a leading customer loyalty marketing company. They specialize in designing, building, and managing loyalty programs for major brands. The company leverages strategy, technology, and analytics to help clients foster stronger customer relationships and drive engagement.

Generated by Sydicom AI

Lead Observability Engineer