Become a pivotal part of Cohere’s mission as a Senior Software Engineer managing GPU superclusters. This role supports AI model training and fosters innovation in a flexible team environment.

Cohere is seeking a Senior Software Engineer to enhance its GPU infrastructure. You will work on building and scaling robust superclusters, maintaining high performance for AI workloads while collaborating with researchers. This position also emphasizes issue resolution and the creation of intuitive tools that empower research workflows.

Key Responsibilities:
• Build HPC infrastructure for machine learning applications
• Manage Kubernetes deployments for AI workloads
• Identify and troubleshoot infrastructure challenges
• Develop self-service options for AI research teams
• Foster an environment of best practices in observability

Requirements:
• Extensive experience with ML infrastructure and HPC
• Proven skills in managing Kubernetes environments
• Proficient in Python and Go development
• Familiar with RDMA and performance tuning
• History of collaboration with AI researchers

Advance your software engineering career and drive impactful AI solutions with Cohere.
Senior Software Engineer - GPU Superclusters
COHERE
Toronto
Published today