A hybrid Data & ML role at Dun & Bradstreet.
Original listing text, shown exactly as published by the company.
Design, build, and optimize scalable data pipelines and ETL/ELT workflows for large, complex datasets.
Design and implement foundational data architecture supporting identity resolution and ID graph systems.
Develop and enhance systems supporting identity resolution and ID graph construction (data ingestion, normalization, matching, and deduplication).
Process and unify multi-source datasets (cookies, device IDs, behavioral data, third-party and proprietary data).
Write efficient, testable, and maintainable code using Python and SQL for large-scale data processing.
Optimize data models, queries, and storage strategies for performance, scalability, and cost efficiency.
Build and maintain data validation, monitoring, and alerting systems to ensure data quality and reliability.
Troubleshoot, debug, and improve existing data pipelines and infrastructure.
Own and drive complex data problems end-to-end, from initial design through production deployment.
Make and influence key technical decisions related to data architecture, scalability, and system design.
Collaborate with data, platform, DevOps, and product teams to deliver scalable, production-ready solutions.
Document data pipelines, systems, and workflows clearly.
Continuously improve system performance, data quality, and pipeline resilience.
Contribute to building new capabilities that improve how customers understand and leverage data insights.
8-12+ years of hands-on experience in data engineering or large-scale data processing.
Proven experience building and maintaining production-grade data pipelines and distributed systems.
Demonstrated experience architecting and delivering large-scale data platforms or mission-critical data systems.
Strong expertise in SQL and relational databases (Postgres, BigQuery, Redshift, etc.) and in Python for data processing and analysis.
Experience with Google Cloud Platform (BigQuery, Dataflow, Pub/Sub, Cloud Storage, Cloud Functions) and/or AWS (S3, Redshift, EMR, RDS).
Experience working with large-scale datasets (hundreds of millions to billions of records).
Strong understanding of data modeling, partitioning, indexing, and query optimization.
Experience with distributed data processing and parallelization techniques.
Experience moving large volumes of data across systems and architectures.
Familiarity with CI/CD, containerization, and orchestration tools (Docker, Kubernetes, GitHub Actions, etc.).
Strong debugging and troubleshooting skills in complex data environments.
Experience with version control (Git) and Agile tools (Jira, Confluence, etc.).
Highly analytical with strong attention to detail and a data-driven mindset.
Ability to hit the ground running, quickly understand systems, and deliver independently.
Comfortable working in a remote, fast-paced, and collaborative environment.
Proven ability to drive system design and implementation.
Experience with identity graphs, entity resolution, or record linkage systems.
Background in AdTech, digital identity, cookies, or audience data platforms.
Experience with real-time or streaming data systems.
Familiarity with data quality, observability, and monitoring frameworks.
Experience with data visualization tools (Looker, Tableau, Power BI).
Knowledge of data privacy, compliance, and governance considerations.
Experience with modern data platforms such as Snowflake and Databricks.
Exposure to AI/ML technologies, including experience working with or integrating agentic frameworks.
Dun & Bradstreet
Dun & Bradstreet Holdings, Inc. (D&B) is an American company that provides commercial data, analytics, and insights for businesses. Headquartered in Jacksonville, Florida, the company offers a wide range of products and services for risk and financial analysis, operations and supply, and sales and marketing professionals, as well as research and insights on global business issues. It serves customers in government and in industries such as communications, technology, strategic financial services, retail, telecommunications, and manufacturing. The company's database contains over 500 million business records worldwide.
Source: Wikipedia