Overview

Senior Site Reliability Engineer (SRE) with Kubernetes and Rancher. Full-time role focused on building and maintaining highly resilient, secure systems, including in air-gapped environments.

Responsibilities

System Architecture & Management: Design, architect, and maintain highly reliable, multi-tenant systems using Kubernetes and related tools (RKE2). Includes components such as Ingress, Kong, Artifactory, and Sonar.
Observability & Monitoring: Implement and manage observability solutions with Prometheus, Grafana, Splunk, and Elastic to ensure deep visibility into system health and performance, including in air-gapped settings.
Compliance & Optimization: Ensure deployments meet stringent compliance standards and are optimized for performance and security.
Code Quality & Security: Perform regular code quality analysis and security assessments using Sonar to identify and mitigate vulnerabilities.
Incident Response: Collaborate with leads and specialized teams to resolve incidents quickly and improve resilience and recovery procedures.
Documentation: Create and maintain documentation for system configurations, runbooks, and disaster recovery plans for managing systems in sensitive environments.

Required Skills and Qualifications

8+ years of Site Reliability Engineering experience.
Experience with Kubernetes and Rancher.
Technical Expertise: Proficiency with RKE2, Kubernetes, Ingress, Kong, Artifactory, Prometheus, Grafana, Splunk, Elastic, and Sonar.
SRE & Observability: Strong background in Site Reliability Engineering and implementing comprehensive observability strategies.
Secure Environments: Experience in air-gapped or zero-connectivity environments and protecting classified data.
Troubleshooting: Ability to troubleshoot and optimize complex, multi-tenant infrastructures under pressure.

Preferred Qualifications

Relevant SRE or DevOps certifications (e.g., CKAD, CKA).
Experience in government or defense-related SRE roles.
Experience with Rancher and its ecosystem.

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Engineering and Information Technology
Industries: IT Services and IT Consulting
Senior Site Reliability Engineer
ORION INNOVATION
Vancouver, Vancouver
Published 7 days ago