Join a collaborative, remote SRE team dedicated to ensuring service reliability. In this role, leverage your expertise in automation and systems management to support a platform serving millions. You’ll integrate monitoring tools, scale cluster operations, and improve infrastructure with self-service solutions. Your insights will empower engineers as you debug OS behaviors and establish robust automation through coding. Key responsibilities will involve troubleshooting platform issues and designing efficient systems.

Key Responsibilities:
• Automate infrastructure processes using various tools
• Monitor and troubleshoot stability and performance
• Scale AWS infrastructure and Kubernetes environments
• Work collaboratively across engineering teams
• Design systems and implement new tests

Requirements:
• Mastery of Linux and debugging skills
• Strong programming skills in languages like Python or Go
• Experience with cloud services like AWS
• Background in Linux container orchestration
• Self-motivated and adaptable in a dynamic environment

Contribute to system reliability while embracing a culture of creativity and teamwork.
Site Reliability Engineer With Automation Focus
YELP
Montreal (administrative region)
Published 27 days ago