Original listing text, shown exactly as published by the company.
What You’ll Do
BMS / Controls Architecture & Integration
- Architect and manage BMS integration across colocation and Lambda-owned facilities, covering chillers, CRAHs, CDUs (Coolant Distribution Units), cooling towers, UPS systems, PDUs, and automatic transfer switches.
- Define standards for BMS point lists, naming conventions, control sequences, and integration protocols (BACnet, Modbus, SNMP, OPC-UA, RESTful APIs).
- Oversee commissioning and acceptance testing of new BMS deployments and CDU/TCS loop integrations for next-generation liquid-cooled GPU rack systems.
- Collaborate with colocation partners (Equinix, Digital Realty, and others) to ensure telemetry data flows from provider BMS/EPMS into Lambda's monitoring stack.
DCIM & Telemetry Platform Management
- Own the DCIM platform strategy and roadmap, including evaluating, selecting, and implementing tooling for asset management, capacity planning, environmental monitoring, and power chain visibility.
- Develop and maintain real-time dashboards for PUE, thermal performance, stranded capacity, and cooling system efficiency across all Lambda sites.
- Build and maintain telemetry pipelines ingesting data from BMS, PDUs, in-rack sensors, CDUs, and network devices into centralized monitoring and alerting platforms (e.g., Prometheus, Grafana, InfluxDB, or equivalent).
- Define alarm thresholds and escalation workflows for critical facility events including high coolant temperatures, CDU inlet/outlet anomalies, leak detection, and power exceedances.
Liquid Cooling Controls & High-Density Operations
- Develop control strategies and setpoint frameworks for TCS (Thermal Control System) loops supporting direct liquid cooling at densities of 220–380 kW per rack.
- Evaluate and qualify CDU vendors on controls integration capabilities, telemetry exposure, and remote management interfaces.
- Define and enforce operational procedures for CDU commissioning, setpoint changes, loop pressure management, and fluid quality monitoring.
- Support design and construction coordination for liquid cooling infrastructure in new data center buildouts, ensuring BMS and controls readiness on Day 1.
Operational Reliability & Incident Response
- Establish and maintain facility event management processes, including on-call response protocols for facility telemetry anomalies.
- Lead root cause analysis for facility system failures and implement corrective actions to prevent recurrence.
- Partner with the data center operations team to maintain and refine emergency response runbooks tied to BMS alerts and automated controls.
- Drive continuous improvement in MTTR for facility-related events through better telemetry coverage and automated remediation.
Vendor & Stakeholder Management
- Manage BMS integrators, DCIM vendors, and controls subcontractors from RFP through design, installation, commissioning, and ongoing support.
- Serve as the primary technical interface with colocation providers on all BMS/EPMS integration topics.
- Collaborate with Lambda's infrastructure engineering, construction, and procurement teams to align controls requirements with facility buildout timelines.
- Support due diligence and technical evaluation for new colocation sites and modular data center deployments from a telemetry and controls readiness perspective.
You
Required Experience
- 7+ years of experience in data center infrastructure engineering, with at least 4 years focused on BMS, DCIM, or controls systems in a hyperscale, colocation, or AI/HPC environment.
- Hands-on experience designing and integrating BMS for mission-critical facilities including UPS, PDU, CRAH/CRAC, chiller plant, cooling tower, and liquid cooling (CDU/in-row) systems.
- Strong working knowledge of industrial control protocols: BACnet/IP and BACnet MS/TP, Modbus TCP/RTU, SNMP, DNP3, and modern API-based integrations.
- Demonstrated experience with DCIM platforms (Nlyte, Sunbird, Vertiv TRELLIS, or equivalent) including deployment, configuration, and ongoing administration.
- Experience with real-time telemetry stacks (Prometheus, InfluxDB, Grafana, or similar) applied to infrastructure monitoring use cases.
- Strong understanding of data center power and cooling systems, including PUE optimization, thermal management, and redundancy architectures (2N, N+1).
Preferred Qualifications
- Direct experience with direct liquid cooling (DLC) systems, CDU controls integration, and TCS loop management for high-density AI GPU deployments (100+ kW per rack).
- Familiarity with OCP (Open Compute Project) hardware and telemetry standards.
- Experience working with major colocation providers (Equinix, Digital Realty, CyrusOne, etc.) on BMS/EPMS integration and data sharing agreements.
- Exposure to modular or edge data center deployments and associated controls considerations.
- Background in scripting and automation (Python, Ansible, Terraform) applied to infrastructure management workflows.
- Experience operating data centers at international scale, including Asia-Pacific or Southeast Asian markets.
- Relevant certifications: CDCP, CDCE, ETA Data Center Specialist, or vendor-specific BMS/controls certifications.