QA Lead: AI Systems & Models Testing
Contract • Montreal, QC • AI / ML Testing • LLM / RAG / LangChain

About the Role

We are seeking an experienced QA Lead with deep expertise in AI systems testing to join our team on a contract basis in Montreal, Québec. This role sits at the intersection of quality engineering and artificial intelligence, requiring hands-on proficiency in LLM behavior analysis, RAG pipeline validation, and modern AI orchestration frameworks. You will own the end-to-end test strategy for complex AI products and help define quality standards in a rapidly evolving space.

MUST-HAVE SKILLS

• Proven QA leadership experience designing and executing test strategies for AI/ML systems or LLM-powered applications.
• Strong understanding of LLM internals (tokenization, embeddings, attention mechanisms, and inference behavior) to anticipate and diagnose failure modes.
• Hands-on experience with prompt engineering: constructing effective prompts, detecting hallucinations, and evaluating outputs across accuracy, tone, coherence, and bias dimensions.
• Experience testing RAG pipelines and knowledge base integrations, including validation of data quality and retrieval accuracy as they impact model outputs.
• Familiarity with vector database mechanics: similarity search thresholds, embedding drift, near-duplicate documents, and sparse vs. dense embeddings.
• Practical experience with LangChain and/or LangGraph: able to read chain/graph construction code, identify failure points, and write test harnesses.
• Ability to validate MCP (Model Context Protocol) integration points, including tool availability and error-handling scenarios.
• Proficiency applying generative AI evaluation metrics and establishing quality thresholds appropriate for production AI systems.
• Excellent written and verbal communication in English; bilingualism (English/French) is a plus for the Montreal market.

NICE-TO-HAVE SKILLS

• Experience with bias detection and safety testing frameworks for AI systems.
• Exposure to performance and scalability testing of vector databases under high load.
• Familiarity with CI/CD pipelines for ML model deployment and automated regression testing.
• Knowledge of responsible AI principles and AI governance frameworks.
• Contributions to or experience with open-source AI testing or evaluation tooling (e.g., DeepEval, Ragas, PromptFlow).
• Background in data engineering or data quality practices relevant to AI pipeline inputs.
• Cloud platform experience (AWS, Azure, or GCP) in the context of deploying or testing AI workloads.

KEY RESPONSIBILITIES

• Lead design and execution of comprehensive test strategies across AI systems, including prompt evaluation, output quality assessment, and bias/safety analysis.
• Develop and maintain test harnesses for LangChain and LangGraph-based applications; review chain and graph construction code to proactively surface integration risks.
• Validate RAG pipeline integrity (data ingestion, chunking, retrieval accuracy, and embedding consistency) and define edge-case coverage for vector database interactions.
• Establish and track generative AI quality metrics and thresholds; report on model output quality across multiple evaluation dimensions.
• Collaborate with ML engineers, data scientists, and product teams to embed quality practices throughout the AI development lifecycle.
• Document test findings clearly for both technical and non-technical stakeholders.

Contract position based in Montreal, Québec, Canada • On-site / Hybrid
JAY ANALYTIX
Montreal (administrative region)
Published 21 days ago