The role
You own the pipelines that bring the world's marketing data into Shadow — and keep them fast, accurate, and reliable as we scale to thousands of users. Every brand connects its full stack (ad platforms, ecommerce, analytics, email/SMS), and you make that data land cleanly, normalize into shared schemas, and stay in sync. The agent is only as good as the data underneath it; that layer is yours.
This is a hands-on, build-heavy engineering role for someone who has run large data systems before and wants to do it again in a smaller, faster environment.
What you'll own
- Build and scale the ingestion layer across third-party marketing APIs (Meta, Google, TikTok, GA4, Shopify, Klaviyo, and more) — auth, extraction, rate-limit handling, backfill, and incremental sync.
- Design normalization and transformation pipelines that map messy, platform-specific data into shared, queryable schemas (e.g. a unified creative/campaign/order model).
- Own data reliability at scale — sync accuracy, freshness, coverage, and observability. Build the systems that detect when a connection breaks or a number looks wrong before a user does.
- Engineer for multi-tenant scale and security: pipelines and storage that stay performant and cost-efficient across 1,000+ users and hundreds of connected brands — with strict data isolation, privacy, and compliance built in, not bolted on.
- Partner with the AI and data-science teams to expose clean, well-modeled data the agent can retrieve and reason over.
Must-haves
- Experience building and operating large enterprise data pipelines: systems serving 1,000+ users (or equivalent data volume or tenancy) where reliability, isolation, and cost at scale were real constraints you had to solve.
- Strong SQL and Python, with production experience in a modern data warehouse (BigQuery, Snowflake, Redshift, or similar).
- Deep familiarity with ETL/ELT patterns, incremental sync, schema design, and data modeling for analytics.
- Built and maintained integrations against third-party APIs — OAuth flows, pagination, rate limits, schema drift, and the operational reality of connectors that break.
- A bias toward observability and data quality: you instrument your pipelines and you don't ship data you can't trust.
- Experience building or operating within SOC 2-compliant systems with enterprise-grade security and privacy — you've handled sensitive customer data under real compliance constraints (access controls, encryption, data isolation, auditability) and treat it as a first-class engineering requirement.
Nice-to-haves
- Experience in martech, adtech, or an adjacent data-heavy marketing domain — you've worked with ad platform or ecommerce data before and know where the bodies are buried (attribution windows, currency/timezone messes, deduping across platforms).
- Familiarity with our stack: GCP (BigQuery, Cloud Run), PostgreSQL + pgvector, and orchestration/transformation tooling (dbt, Airflow, Dagster, or similar).
- Experience with pipeline observability and tracing in an AI/LLM context (e.g. Langfuse).
- Comfort supporting data that feeds AI agents and retrieval systems, not just dashboards.
Culture fit
- Obsessive about data organization at scale. We're hiring for someone who lives in the data layer and wants to own it end to end.
- You're a power AI user. You've embedded AI into every workflow you touch, and you think in systems: not one-off prompts, but repeatable structures that compound.
- Entrepreneurial. You don't need much direction to move fast, you pivot when the situation demands it, and what you ship is production-grade, not a prototype you hand off for someone else to finish.