CBCL is seeking a Junior AI Developer – Document Intelligence & Local Model Deployment to help design, build, and maintain a self-hosted AI platform that extracts structured information from complex technical documents and automates quality assurance workflows. The platform runs entirely on local GPU hardware using open-source models. No cloud APIs. No external data dependencies; full data sovereignty.

This is a greenfield build. You won’t be fine-tuning a chatbot or plugging into someone else’s API. You’ll be standing up local inference infrastructure, building document processing pipelines, and delivering a working prototype on real project data within your first term.

You will work directly with experienced engineers and technical staff who will provide domain expertise, validation feedback, and real-world test cases. This position is a 4-month term with the possibility of extension based on project milestones and mutual fit.

We are actively investing in applied artificial intelligence to improve how we deliver engineering services.
Our AI development is done in-house, on our own infrastructure, using open-source tools, an approach driven by our commitment to client data sovereignty and our belief that the firms who build their own capabilities will lead the next generation of engineering practice.

Your Key Responsibilities:
- Assist in the design and development of internal AI tools and models supporting engineering projects
- Deploy and configure open-source AI models on local GPU infrastructure running Linux
- Build a document ingestion and processing pipeline for structured and unstructured technical content
- Build context retrieval pipelines that give the model access to domain-specific reference material at inference time
- Design and iterate on prompt templates that produce consistent, structured outputs from the model
- Work with engineers and technical staff to validate model outputs against real project data; you’ll learn the domain from the people who know it
- Maintain clean, well-documented code in Git from day one
- Document your architecture decisions, model configurations, and pipeline logic so the system is reproducible and maintainable

Your Capabilities and Credentials:
- Enrolled in or recently completed a degree in Computer Science, Software Engineering, or a related field (senior undergraduate, M.Sc., or equivalent practical experience)
- Strong proficiency in Python
- Hands-on experience deploying and running open-source language models or vision-language models locally (e.g., local inference servers, quantized model deployment, or self-hosted model serving)
- Experience processing PDFs or other document formats programmatically
- Comfortable working in Linux (Ubuntu) from the command line
- Working knowledge of Git for version control and collaborative development

Strong Assets:
- Experience building context retrieval or semantic search pipelines (vector search, embedding models, retrieval strategies)
- Familiarity with computer vision or object detection workflows
- Experience with web application development (Python backend and/or modern JS frontend)
- Exposure to GPU configuration and model optimization
- Familiarity with some combination of the following open-source tools and frameworks: Ollama, vLLM, LangChain, LlamaIndex, Hugging Face Transformers, PyMuPDF, Tesseract, OpenCV, Ultralytics, Detectron2, FastAPI, Streamlit, Gradio, ChromaDB, FAISS, Label Studio, Docker
- Any experience with engineering drawings, construction documents, CAD, or technical documentation of any kind

What you’ll work with:
- A dedicated GPU workstation running Linux
- Open-source AI models running locally
- A Python-based development environment with mature libraries for document processing, ML, and web development
- Real project data from an established engineering consulting firm, not toy datasets
- Direct access to experienced engineers who will co-develop the domain logic with you

Why this role is different:
- You’ll own the build. This is a greenfield project, not a ticket queue. You’ll make architecture decisions that shape how the system works.
- It’s real. You’ll work with actual project data, get feedback from the people who use the output, and see whether your pipeline catches real issues.
- It’s self-hosted and open source. Everything runs on local hardware with zero cloud dependencies. If you care about data sovereignty, open-source AI, and building things that don’t phone home, this is your project.
- There’s a long runway. This 4-month term is Phase 1 of a multi-phase initiative. Strong performers have the opportunity to continue into subsequent phases with increasing technical scope and complexity.

Equal Opportunity Statement
CBCL is an Equal Opportunity Employer. If you require accommodations at any stage of the recruitment or interview process, please let us know and we will work with you to meet your needs.
Junior AI Developer / Student
CBCL LIMITED
Halifax, Halifax
Published 27 days ago