Role: SRE Lead
Duration: 12 months
Location: Toronto
Hybrid: 2 days in office per week

Overview
Deep application and system-level knowledge across complex end-to-end environments, including tightly integrated on-premise and cloud-native services, supporting large-scale, multi-tier transaction flows. Prior hands-on experience with APM and observability platforms, including Dynatrace or comparable enterprise observability tools, with the ability to instrument, analyze, and troubleshoot complex distributed applications. Proven deep troubleshooting experience resolving issues across multilayer, end-to-end (E2E) environments, spanning application, infrastructure, network, and platform layers across on-prem and cloud services. The role will drive and execute the SRE WCCS roadmap for the client, with hands-on responsibilities from day one.

Responsibilities
Assess current capability, identify gaps, and contribute to the SRE WCCS roadmap.
Navigate multi-team SRE and IT Ops organizations to drive results.
Provide creative workarounds and solutions.
Lead on observability with Dynatrace, including DQL, Gen3 dashboards, traces on Grail, ActiveGate plugins, SRG workflow development, and business events.
Leverage observability signals (metrics, events, logs, and traces) to identify root causes and resolve failures across multilayer environments.
Apply observability fundamentals (MELT) and design intuitive dashboards with UI/UX expertise.
Use AWS services and observability tools such as CloudWatch, Application Signals, Lambda, and API Gateway.
Develop with Python, AWS Lambda, ECS, and Azure Functions; design and maintain AI-based system monitoring.
Implement OpenTelemetry (OTel) where applicable.
Ship platform capabilities, including self-service onboarding pipelines, policy-as-code, golden-signals-as-code, and standardized instrumentation libraries.
Build backend integration components in Python and Node.js.

Qualifications
Experience in large-scale, multi-tier transaction environments, especially in financial services.
Deep knowledge of and experience implementing SRE practices across complex systems.
Proficiency in APM and observability platforms such as Dynatrace or comparable tools.
Strong troubleshooting skills across application, infrastructure, network, and platform layers.
Background in platform engineering and shipping platform capabilities.
Proficiency in Python and Node.js programming.
Familiarity with SRE concepts as outlined in the Google SRE book and workbook.
Experience with AWS observability services and automated monitoring.