This high-growth technology company is building AI-driven software for the insurance industry, focused on improving how agencies and brokerages operate and serve customers. Their platform sits at the intersection of automation, data, and workflow optimization, supporting a large and evolving market across North America. The business is scaling quickly, backed by institutional investors, and investing heavily in its core platform, infrastructure, and AI capabilities.

The Role

This is a senior infrastructure role focused on building and scaling the systems that underpin a modern AI platform. You’ll take ownership of core infrastructure, reliability, and developer experience, working closely with engineering peers to ensure systems are performant, secure, and built to handle rapid growth.

What You Will Do

- Design and operate the orchestration layer that manages high-volume AI workloads, including scheduling, scaling, and lifecycle management
- Build and evolve observability across the stack using tools like Prometheus, Grafana, and OpenTelemetry to improve system visibility and incident response
- Establish and mature security practices, including access controls, secrets management, and compliance-aligned processes
- Partner with ML teams to support model deployment, training workflows, and production serving infrastructure
- Develop and scale data infrastructure, including ingestion pipelines and storage layers, to support high-throughput event processing

What You Bring

- 5+ years in infrastructure, DevOps, platform engineering, or SRE within cloud-native environments
- Strong experience with AWS, infrastructure-as-code tools (e.g. Terraform), and container orchestration (Kubernetes, ECS, or EKS)
- Hands-on experience with observability tooling (Grafana stack, OpenTelemetry, or similar) and building production monitoring systems
- Familiarity with CI/CD systems such as GitHub Actions and Argo CD, with a focus on improving deployment workflows
- Exposure to security and compliance concepts (IAM, secrets management, SOC 2) and working knowledge of MLOps or data platforms

Why This Role

You’ll be working on the foundational platform of a fast-scaling AI product, with direct ownership over infrastructure that supports real-world, high-volume workloads. The scope spans infrastructure, data, and machine learning systems, offering a rare opportunity to shape how a modern AI platform is built and operated from the ground up.
Senior Infrastructure Engineer
TEKREK
Vancouver, Vancouver
Published 17 days ago