Client is seeking an experienced Site Reliability Engineer (SRE) to support and enhance the reliability, performance, and operational efficiency of our global ServiceNow SaaS platform. As part of the Application Infrastructure (AI) team, you will be instrumental in advancing SRE practices, ensuring seamless integration and stability across on-premises infrastructure and cloud systems. This role combines software development, automation, systems engineering, and operations in a highly collaborative environment. This is a hybrid role with both development-focused and production operational responsibilities, including periodic on-call participation.

Key Responsibilities
Drive automation and reliability improvements to reduce operational overhead and increase system availability
Troubleshoot ServiceNow issues and occasionally resolve Linux-based infrastructure problems
Develop and maintain observability tools, including metrics, logging, tracing, and alerting, to track and enhance system health and performance
Collaborate with global SRE peers to deliver reliable and resilient ServiceNow capabilities
Identify, document, and prioritize technical debt and propose long-term solutions to reduce recurring issues
Contribute to the design and documentation of the ServiceNow ecosystem, including integrations with SQL databases, APIs, and web platforms
Participate in the on-call rotation and respond effectively to technical incidents or outages
Provide input to policies and procedures with the goal of improving security, efficiency, and operational consistency
Champion a culture of continuous improvement, resilience, and operational excellence

Required Qualifications
7+ years of professional experience in software development, system administration, or site reliability engineering
Experience in at least one of the following areas: ServiceNow administration or development
Strong troubleshooting skills and a proactive approach to problem-solving
Familiarity with Linux systems, shell scripting, and general infrastructure support
Effective verbal and written communication skills
Demonstrated ability to collaborate and build strong working relationships in a team environment
Willingness to work in an on-call rotation and respond to critical incidents when needed

Preferred Qualifications
Direct experience with ServiceNow (administration or development)
Exposure to observability tools (e.g., Prometheus, Grafana, ELK, Splunk)
Familiarity with DevOps/SRE best practices and tools
Experience with infrastructure automation (e.g., Ansible, Terraform)
Knowledge of incident management, capacity planning, and monitoring frameworks

Certifications (if any)
ServiceNow certifications (Administrator, Developer) are a plus but not required
Relevant certifications in Linux, DevOps, or SRE disciplines are desirable

Site Reliability Engineer
COMPUNNEL, INC.
Montreal, Montreal
Published 27 days ago