We’re a fast-paced, fabless semiconductor startup redefining the boundaries of AI through cutting-edge, scalable AI-infused multipurpose compute architecture. Our mission is to deliver scalable, efficient, and intelligent silicon solutions for the next generation of edge AI, robotics, autonomous systems, and mobile devices.

Our leadership team brings together decades of experience in semiconductor innovation, spanning chip architecture, system design, and global business operations. The team includes pioneers behind several generations of groundbreaking compute architectures, experts in software-hardware co-design, SoC and AI development with hundreds of patents in our portfolio, as well as leaders of multi-billion-dollar business units at top-tier technology companies.

Position Overview

This is a great opportunity to join a highly skilled AI/ML Software team working at the intersection of HW/SW co-design. In this role, you will be responsible for designing and executing end-to-end model compression pipelines, including sensitivity analysis, quantization, pruning, and hybrid optimization techniques across large-scale transformer architectures.

Key Responsibilities and Duties

- Build and own the end-to-end compression pipeline
- Baseline benchmarking and instrumentation
- Sensitivity analysis
- Implement layerwise sensitivity scoring frameworks
- Design and apply quantization strategies
  - INT8, INT4, FP8, FP4 exploration
  - Per-layer/tensor precision assignment
  - Dynamic range calibration and scaling strategies
- Implement and evaluate pruning techniques
- Apply hybrid compression methods
  - QAT, LoRA-based recovery, distillation
- Latency / throughput
- Memory footprint
- Optimize for iMachine Architecture

Qualifications and Skills

Successful candidates should possess the following qualifications and skills:

Required Qualifications (You must possess these qualifications to be considered for the position)

- Bachelor of Science degree in Electrical Engineering, Computer Science, Computer Engineering, or related field
- 1+ year of experience with PyTorch / JAX / TensorFlow
- Understanding of:
  - Numerical precision and quantization theory
- Hands-on experience with:
  - TensorRT, ONNX Runtime, or similar inference stacks
- Familiarity with:
  - Sparse representations (CSR, COO, RLC)
  - Low-rank approximation methods (SVD, factorization)
- Ability to analyze:
  - Numerical stability issues

Preferred Qualifications

- MS or PhD in Electrical Engineering, Computer Engineering, Computer Science, or related field
- Experience with:
  - Hardware-aware optimization
- Knowledge of:

- Deliver production-ready compressed models with minimal accuracy loss
- Achieve quantifiable performance gains (latency, memory, throughput)
- Build reusable tooling and automation pipelines

Why Join Us

- Get in early at a breakthrough deep-tech startup reshaping AI compute
- Work closely with industry innovators and experienced leaders where your work will have a direct impact on the success of the company
- Be part of a mission-driven team building foundational technology for the future
- We balance sharp execution with continuous innovation to push the boundaries
- Competitive compensation, equity, and growth opportunities

Benefits and Perks

At I Machines, Inc., we offer competitive salaries and a comprehensive benefits package, including:

- Health, dental, and vision insurance
- Retirement savings plans
- Paid time off and holidays
- Flexible schedule

Equal Opportunity Employer

I Machines, Inc. is an equal opportunity employer and does not discriminate based on race, color, religion, gender, national origin, age, disability, or any other legally protected status. All qualified applicants will be considered for employment.

AI/ML Model Compression & Quantization Engineer

I MACHINES, INC.
