Hiring All Shifts

Seeking an experienced Data Center Operations Engineer to ensure environments run with precision, efficiency, and uptime across global sites. This role bridges IT and facilities, maintaining the power, cooling, and compute systems that sustain the company’s world‑class AI platforms.

This role requires a technically strong, detail‑oriented engineer who thrives in high‑availability environments. You must understand the full stack of data center infrastructure (compute, network, power, and cooling) and take pride in systems that run flawlessly because of your work. You must be able to communicate clearly, perform methodically under pressure, and collaborate effectively across IT, facilities, and vendor teams. This role requires a builder, a problem‑solver, and a guardian of uptime: someone who values precision, safety, and accountability in every aspect of operations.

Responsibilities

Own the day‑to‑day reliability and performance of company data centers, supporting both IT and facility infrastructure. This includes installing and configuring servers and compute equipment, managing structured cabling, and performing Layer 1–3 troubleshooting across compute and network layers.

Partner closely with colocation and data center providers to maintain uptime by reviewing maintenance procedures, coordinating planned work, validating redundancy during transitions, and verifying site health after power or cooling events.

Work alongside facilities teams to help operate and maintain critical power and cooling systems, including transformers, PDUs, UPS, switchgear, generators, CRAC and CRAH units, CDUs, chillers, cooling towers, and containment systems.

Assist in capacity planning, preventive maintenance, and load balancing across power and cooling zones to maintain safe, efficient, and redundant operations.

Lead incident response and root‑cause analysis, refine standard operating procedures, and implement automation to improve efficiency and consistency across company data centers worldwide.

Required Skills

10+ years in data center compute operations, facilities, or infrastructure engineering and/or a degree in an Engineering or Computer Science discipline
Hands‑on experience with servers, networking, and structured cabling
Working knowledge of electrical systems including transformers, PDUs, UPS, switchgear, and generators
Understanding of cooling systems including CRAC/CRAH units, CDUs, cooling towers, chillers, and containment environments
Familiarity with Linux and basic scripting and automation (Bash, Python, Ansible)
Proficiency with network CLIs (Cisco, Arista, Juniper)
Experience collaborating with colocation providers and reviewing MOPs/EOPs for electrical and mechanical work
Proficiency with ITSM/DCIM platforms (e.g., Jira, ServiceNow, NetBox, Sunbird)
Ability to manage server, switch, router, storage, and hardware lifecycle processes
Ability to update asset management systems using scanners and inventory tools
Strong documentation, troubleshooting, and communication skills for ticketing, customer communication, and team coordination
Strong multitasking, adaptability, and time‑management skills with a focus on quality and throughput
Must be punctual, reliable, and well‑organized
Must have strong interpersonal and teamwork skills, with the ability to work independently when needed
Willingness to support on‑call rotation and meet a 60‑minute on‑site SLA
Ability to safely lift 50–75 lbs and remain on feet for the majority of the workday
Ability to operate material‑handling equipment (pallet jacks, forklifts, server lifts)
Demonstrated ability to learn new systems, methodologies, software, and hardware platforms
Experience working in high‑tempo, high‑stress environments
Experience leading and/or mentoring more junior staffers
Domain expert in one or more of the following functional areas:
    Datacenter power systems
    Datacenter cooling/HVAC systems
    Server or liquid cooling
    Network routing and switching
    Late‑generation flash storage arrays
    Facility and/or network security
    Network infrastructure monitoring
Ability to project manage key datacenter‑centric initiatives
Able to effectively present data to senior leadership
Familiar with datacenter key performance indicators (KPIs)
Ability to manage outage events
Be a good person and good teammate

Preferred Certifications

CompTIA Server+, Network+, or Linux+
ITIL Foundation certification
Networking: CCNA, JNCIA, ACE-A

Data Center Operations Engineer
COVESTIC INC
Saint-Jérôme, Saint-Jérôme
Published 18 days ago