TechInsights seeks a Senior Site Reliability Engineer to enhance AI operations from anywhere in Canada. Oversee reliability strategies, manage error budgets, and collaborate closely with engineering teams.

You’ll be instrumental in shaping the technical architecture and reliability practices at TechInsights. Your role focuses on end-to-end reliability initiatives, including defining service-level objectives and leading incident management. Through collaboration and mentorship, you will elevate technical standards and advance team capabilities.

Key Responsibilities:
• Develop SLOs and manage production service reliability metrics
• Architect solutions for AI agent failure containment
• Mentor junior engineers and enhance team capabilities
• Drive continuous improvement in operational processes
• Utilize Datadog for service health monitoring and automation

Requirements:
• 6-8 years in site reliability engineering
• Bachelor's degree in Computer Science or applicable field
• Proficiency with AWS services and multi-region patterns
• Strong skills in Terraform and operational tooling
• Experienced in managing CI/CD pipelines

Transform site reliability for AI operations at TechInsights and drive impactful changes.
Experienced Site Reliability Engineer - Remote
TECH INSIGHTS
Ottawa
Published 28 days ago