Are you an experienced Site Reliability Engineer looking for a new challenge? We’re looking for a Staff Site Reliability Engineer (SRE) to join us at Thinkific. As a Staff Site Reliability Engineer, you will help us scale and secure the infrastructure that powers thousands of online course creators around the world.

In this role, you’ll play a critical part in improving the performance, reliability, and security of our platform. You’ll work cross-functionally with engineers, product managers, and stakeholders to drive forward reliability-focused initiatives, build scalable systems, and mentor others. You’ll also help shape our technical strategy, lead major infrastructure projects, and act as a domain expert in modern cloud-native practices, with a specific emphasis on Kubernetes, cloud infrastructure (AWS), observability, and service reliability.

Your goal will be to help guide and execute on projects related to your technical domain.
Here’s how you’ll accomplish this:

- Own one or more technical domains across our infrastructure with accountability for system reliability, performance, scalability, and security
- Lead projects to evolve our Kubernetes-based platform, ensuring alignment with SLOs, security best practices, and long-term maintainability
- Contribute to the design and evolution of our infrastructure using Terraform, Helm, and cloud-native tools, with an emphasis on modularity, reuse, and automation
- Partner with engineering teams to design robust deployment pipelines, ensure operational readiness, and build secure-by-default patterns for new services
- Lead incident response efforts and participate in on-call rotation, driving a culture of blameless postmortems and learning
- Write infrastructure and application code in Ruby, Node.js, Python, or Bash to automate operations and improve developer experience
- Serve as a mentor and multiplier, raising the technical bar through coaching, knowledge sharing, and technical leadership
- Actively promote observability, testing, and continuous improvement in everything you build and advocate for within your team
- Participate in our on-call rotation and incident response processes to help maintain a high level of service reliability

The person we have in mind likely:

- Has 6+ years of experience in software or infrastructure engineering, including 4+ years working with Kubernetes in production environments
- Holds a CKA certification or equivalent hands-on Kubernetes expertise (bonus for experience managing multi-tenant clusters or complex networking in K8s)
- Has deep knowledge of TLS, certificates, ciphers, and encryption protocols, and can explain how they secure communications in a distributed system
- Has production experience with AWS infrastructure and services (EKS, RDS, IAM, ALB, S3, etc.)
- Writes infrastructure-as-code using Terraform, and has built scalable and secure infrastructure following modular and reusable patterns
- Is comfortable with monitoring and observability tooling (e.g., New Relic, Datadog, Prometheus, Grafana, Sentry) and building alerting based on meaningful SLOs
- Has experience supporting distributed systems with relational and non-relational databases (PostgreSQL, AWS Aurora), message queues (Sidekiq, SNS/SQS), and asynchronous architectures
- Enjoys collaborating across teams and helping shape engineering roadmaps and architectural direction
- Brings a strong ownership mentality, cares deeply about developer experience and operational excellence, and thrives in a fast-paced environment
- Loves to learn and grow. They’ve found (and keep looking for) ways to level up their skills in this field, whether that’s through formal education, gaining professional experience, or maybe even building their own business

These things would also be nice, but we think you could learn them on the job:

- Experience with Database Administration (DBA) practices, including performance tuning, replication strategies, backup and recovery planning, and operational support for PostgreSQL or AWS Aurora environments
- Experience working with Ruby on Rails and/or Node.js applications in production
- Familiarity with Cloudflare, load balancing strategies, and CDN configuration
- Experience improving CI/CD pipelines and secure software supply chains

We’re committed to fair and transparent pay that reflects both where you’re at and where you can grow. This role has a salary range of $132,900 – $166,100 – $182,900 in Canada, designed to capture the full journey from developing skills to excelling in the position. Most new hires start between the minimum and midpoint, which aligns with being fully capable in the role.
Salaries above the midpoint are typically reserved for team members who have demonstrated strong, consistent performance, deep expertise, and a significant positive impact within the role. For high-demand or hard-to-fill positions like this one, we may hire above midpoint for candidates who bring exceptional experience, skills, or impact potential.

Diversity, Equity, Inclusion and Belonging & Accessibility

This is just our initial idea of who we’re looking for! At Thinkific, we know that people have unique career journeys. If your experience is close to what we’ve described but you feel that you might be missing a few of the requirements, please still apply! We believe in equal opportunity and are committed to diversity, equity, inclusion, and belonging across every facet of our business.

We’re also committed to providing a comfortable and accessible interview experience for every candidate. If there are any accommodations our team can make throughout our hiring process (big or small), please let us know.
Staff Site Reliability Engineer
THINKIFIC
Winnipeg, Winnipeg
Published 28 days ago