Job Overview

This role is responsible for leading the design, development, implementation and support of Site Reliability Engineering (SRE) solutions for applications supported by the Cloud organization.

Responsibilities

Lead code and non-functional reviews of all production-bound SRE solutions
Drive transformation by automating existing processes and conducting engineering mindset meetups
Manage SRE application assets such as cloud instances and source code repositories and publish technical designs
Publish and review implementation plans for SRE solutions bound to production, explore new capabilities and technologies, and document how-to guides
Track, audit, monitor and implement technical work streams, acting as a portfolio SME and documenting common components and infrastructure
Act as the escalation point in the on-call rotation, supporting maintenance, scheduled work, release deployments, incident and problem management
Own RCA action items, focus on continuous improvement and technical standards, and drive productivity gains in monitoring, tooling and best practices
Maintain technology currency through patching, certificate renewal and compliance, with a focus on automation
Ensure application availability and uptime per SLAs, manage PagerDuty rules and thresholds, Moogsoft situation management, Dynatrace tuning, and coach the SRE team
Assist developers in delivering reliable, high-performing code
Help build a high-performing diverse team that leverages individual strengths

Qualifications

Advanced knowledge of industry practices, with a focus on SRE
Advanced experience in diverse environments (cloud, distributed, mainframe, business workflows, APIs, databases)
Excellent communication skills with a direct style
Effective negotiation and stakeholder management skills, with the ability to influence at the director level
Hands-on experience with SRE tools and languages (Ansible, Dynatrace Managed, Moogsoft, PagerDuty, ServiceNow, GitHub, Slack, Elastic, Logstash, Kibana, Grafana, Catchpoint, Red Hat OCP)

Nice-to-have

Computer Engineering, Computer Science, or related technical degree or experience
Exposure to Azure, AWS, Docker, OCP, GitHub
Experience working in agile environments
Exposure to Java, Go, Terraform, Spring, Temporal

Benefits

A comprehensive Total Rewards Program including bonuses and flexible benefits, competitive compensation, and stock where applicable
Leaders who support your development through coaching and managed opportunities
Opportunity to make a lasting impact in technology transformation
Work in a dynamic, collaborative, high-performing team
Flexible work/life balance options
Opportunities to take on progressively greater accountabilities

Staff Site Reliability Engineer

RBC

