At Taalas we believe that fundamental progress is achieved by those who are willing to understand and assail a problem end-to-end, without regard for commonly accepted abstractions and boundaries. We are building a team of hands-on technologists who dislike overspecialization and seek to excel in both depth and breadth. In this position the successful candidate will build software infrastructure for an inference serving cluster built around Taalas hardcore AI model chips.

Job Responsibilities
- Adapt open-source inference servers like vLLM and Punica to interface with Taalas' hardcore AI models
- Implement a highly efficient LoRA swapping solution for multi-{tenant,LoRA} environments
- Build and test a scalable inference serving cluster using Kubernetes (K8s) and Traefik or similar

Qualifications
- Bachelor's or higher degree in Computer Science, or Electrical/Computer Engineering
- Experience with Kubernetes (K8s), HTTP load balancers, and web servers
- Good knowledge of computer architecture and low-level programming: Linux virtual memory and page table management, direct memory access, CUDA
- Familiarity with ML, Python, and PyTorch

Interested in joining our team? Submit your resume to be considered for this exciting opportunity!

Seniority level: Entry level
Employment type: Full-time
Job function: Engineering and Information Technology
Industries: Semiconductor Manufacturing
Software Engineer – Inference Serving
TAALAS
Toronto