Join an innovative team focused on building GPU infrastructure for high-performance AI and ML workloads. This role requires expertise in systems engineering and cloud-based solutions for optimizing large-scale operations.

As an HPC Specialist, you will be integral to the AI and Multi-Asset Systematic Strategies team. Your primary responsibility will be to deploy and maintain GPU infrastructure for complex workloads. Candidates should possess strong skills in optimizing deep learning models and managing distributed systems to ensure optimal performance.

Key Responsibilities:
• Deploy and manage GPU server fleets for LLM workloads
• Create distributed serving solutions for multi-GPU deployments
• Optimize Kubernetes clusters for machine learning tasks
• Configure and manage networking for GPU clusters
• Troubleshoot performance issues across hardware and software

Requirements:
• Bachelor's/Master's in Computer Science or Engineering
• 5+ years in DevOps or infrastructure roles
• Expertise in GPU infrastructure and optimization
• Strong Linux, Kubernetes, and networking knowledge
• Proficient in Python and Bash scripting

Leverage your advanced skills to elevate AI operational efficiency across innovative GPU solutions.
HPC Specialist for Advanced AI Infrastructure Management
P2P
Montreal (administrative region)
Published 27 days ago