The SRE will focus on ensuring reliable, resilient systems through task automation, observability, incident response, and problem elimination, while also participating in production-side operations and on-call rotations.

KEY RESPONSIBILITIES
Deliver improvements to maximize system availability and performance through optimized and automated operational tasks.
Collaborate on the development of operational tools, problem management, and architecture reviews.
Troubleshoot ServiceNow issues and occasional on-premise capabilities in a Linux environment.
Explore and implement observability practices including metrics, logging, tracing, and alerting to measure product reliability.
Participate in on-call rotation with global team members, ensuring responsiveness during agreed hours.
Contribute to documentation of ServiceNow instances and related dependencies.
Identify and prioritize technical debt impacting client satisfaction or operational efficiency.
Provide feedback on policies and procedures to enhance SRE and operational practices, improving safety and efficiency.

REQUIRED QUALIFICATIONS
7+ years of experience in software development, infrastructure, or system administration.
Proficiency in at least one programming language (e.g., Python) or ServiceNow administration/development experience.
Strong oral and written communication skills.
Proven ability to establish effective relationships with colleagues and collaborate on successful delivery.
Dependable team player with demonstrated commitment to client service.
Ability to respond appropriately during technical emergencies such as outages.
Willingness to participate in on-call rotation.

DESIRED QUALIFICATIONS
ServiceNow administration or development experience (can be acquired on the job with training).
Experience with SQL databases, APIs, and web infrastructure.
Familiarity with chatbot technology and on-call escalation incident management.
Strong interest in reliability, resilience principles, and SRE practices.

WORKING CONDITIONS
Global team collaboration across multiple time zones.
Production-side operational responsibilities with occasional on-call duties.
Fast-paced environment requiring adaptability, problem-solving, and continuous improvement mindset.

Engineer - Site Reliability
COMPUNNEL, INC.
Montreal (administrative region)
Published 27 days ago