Overview

Senior Site Reliability Engineer (SRE) with Kubernetes and Rancher. Full-time role focused on building and maintaining highly resilient, secure systems, including in air-gapped environments.

Responsibilities

System Architecture & Management: Design, architect, and maintain highly reliable, multi-tenant systems using Kubernetes and related tools (RKE2), including components such as Ingress, Kong, Artifactory, and Sonar.
Observability & Monitoring: Implement and manage observability solutions with Prometheus, Grafana, Splunk, and Elastic to ensure deep visibility into system health and performance, including in air-gapped settings.
Compliance & Optimization: Ensure deployments meet stringent compliance standards and are optimized for performance and security.
Code Quality & Security: Perform regular code quality analysis and security assessments using Sonar to identify and mitigate vulnerabilities.
Incident Response: Collaborate with leads and specialized teams to resolve incidents quickly and improve resilience and recovery procedures.
Documentation: Create and maintain documentation for system configurations, runbooks, and disaster recovery plans for managing systems in sensitive environments.

Required Skills and Qualifications

8+ years of site reliability engineering experience.
Experience with Kubernetes and Rancher.
Technical Expertise: Proficiency with RKE2, Kubernetes, Ingress, Kong, Artifactory, Prometheus, Grafana, Splunk, Elastic, and Sonar.
SRE & Observability: Strong background in Site Reliability Engineering and implementing comprehensive observability strategies.
Secure Environments: Experience in air-gapped or zero-connectivity environments and protecting classified data.
Troubleshooting: Ability to troubleshoot and optimize complex, multi-tenant infrastructures under pressure.

Preferred Qualifications

Relevant SRE or DevOps certifications (e.g., CKAD, CKA).
Experience in government or defense-related SRE roles.
Experience with Rancher and its ecosystem.
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Engineering and Information Technology
Industries: IT Services and IT Consulting
Senior Site Reliability Engineer
ORION INNOVATION
BC, Canada
Published 27 days ago