We're a fast-paced, fabless semiconductor startup redefining the boundaries of AI through a cutting-edge, scalable, AI-infused multipurpose compute architecture. Our mission is to deliver scalable, efficient, and intelligent silicon solutions for the next generation of edge AI, robotics, autonomous systems, and mobile devices.

Our leadership team brings together decades of experience in semiconductor innovation, spanning chip architecture, system design, and global business operations. The team includes pioneers behind several generations of groundbreaking compute architectures; experts in software-hardware co-design, SoC, and AI development, with hundreds of patents in our portfolio; and leaders of multi-billion-dollar business units at top-tier technology companies.

Position Overview

This is a great opportunity to join a highly skilled AI/ML Software team working at the intersection of HW/SW co-design. In this role, you will be responsible for designing and executing end-to-end model compression pipelines, including sensitivity analysis, quantization, pruning, and hybrid optimization techniques across large-scale transformer architectures.

Key Responsibilities and Duties

- Build and own the end-to-end compression pipeline
  - Baseline benchmarking and instrumentation
  - Sensitivity analysis
- Implement layerwise sensitivity scoring frameworks
- Design and apply quantization strategies
  - INT8, INT4, FP8, FP4 exploration
  - Per-layer/tensor precision assignment
  - Dynamic range calibration and scaling strategies
- Implement and evaluate pruning techniques
- Apply hybrid compression methods
  - QAT, LoRA-based recovery, distillation
- Latency / throughput
- Memory footprint
- Optimize for iMachine Architecture

Qualifications and Skills

Successful candidates should possess the following qualifications and skills:

Required Qualifications (You must possess these qualifications to be considered for the position)

- Bachelor of Science Degree in Electrical Engineering, Computer Science, Computer Engineering, or related field
- 1+ year of experience with PyTorch / JAX / TensorFlow
- Understanding of:
  - Numerical precision and quantization theory
- Hands-on experience with:
  - TensorRT, ONNX Runtime, or similar inference stacks
- Familiarity with:
  - Sparse representations (CSR, COO, RLC)
  - Low-rank approximation methods (SVD, factorization)
- Ability to analyze:
  - Numerical stability issues

Preferred Qualifications

- MS or PhD in Electrical Engineering, Computer Engineering, Computer Science, or related field
- Experience with:
  - Hardware-aware optimization
- Knowledge of:
- Deliver production-ready compressed models with minimal accuracy loss
- Achieve quantifiable performance gains (latency, memory, throughput)
- Build reusable tooling and automation pipelines

Why Join Us

- Get in early at a breakthrough deep-tech startup reshaping AI compute
- Work closely with industry innovators and experienced leaders where your work will have a direct impact on the success of the company
- Be part of a mission-driven team building foundational technology for the future
- We balance sharp execution with continuous innovation to push the boundaries
- Competitive compensation, equity, and growth opportunities

Benefits and Perks

At I Machines, Inc., we offer competitive salaries and a comprehensive benefits package, including:

- Health, dental, and vision insurance
- Retirement savings plans
- Paid time off and holidays
- Flexible schedule

Equal Opportunity Employer

I Machines, Inc. is an equal opportunity employer and does not discriminate based on race, color, religion, gender, national origin, age, disability, or any other legally protected status. All qualified applicants will be considered for employment.

AI/ML Model Compression & Quantization Engineer
I MACHINES, INC.
Winnipeg, Winnipeg
Published 26 days ago