Building an Ontario post-secondary perspective small language model - ON-1104
Project type: Innovation
Desired discipline(s): Engineering - computer / electrical, Engineering, Computer science, Mathematical Sciences, Mathematics
Company: OCAS
Project Length: Flexible
Preferred start date: 01/05/2026
Language requirement: English with some French proficiency
Location(s): ON, Canada
No. of positions: 2
Desired education level: College, Undergraduate/Bachelor, Master's, Recent graduate
Open to applicants registered at an institution outside of Canada: No
About the company:
OCAS is a non-profit serving Ontario's 24 Colleges of Applied Arts and Technology. Core products include a centralized application-to-college platform and support services in English and French, data sharing and data warehousing services, and other shared ecosystem services including applied research, financial services, tier 1 customer support, website, and data and reporting services. OCAS’ innovation department is responsible for bringing value through emerging technologies to the college system.
Describe the project:
We propose the development of a domain-specific Small Language Model (SLM) tailored to the unique linguistic and administrative context of Ontario’s post-secondary education system. Unlike general-purpose large language models, this SLM will be fine-tuned on curated datasets from Ontario colleges, including anonymized program descriptions, syllabi, institutional research, learning outcomes, and policy documents. The result will be a lightweight, privacy-conscious AI tool that administrators, educators, and innovators within the system can use for context-aware insights and automation.
The innovation lies in the creation of a localized, ethically aligned AI model that reflects the values, terminology, and workflows of Ontario’s colleges. This includes novel approaches to prompt engineering for academic integrity, multilingual support for diverse student populations, and integration with institutional research and data governance frameworks. The project also introduces an incremental innovation in service delivery: embedding the SLM into an existing OCAS data sharing platform to enhance decision-making, reduce workload, and personalize value delivery for institutional researchers.
The candidates will lead data acquisition and curation, model fine-tuning, and testing. Key tasks will include identifying representative datasets, applying transfer learning and retrieval-augmented generation (RAG) techniques, evaluating model performance on relevant tasks, and ensuring compliance with privacy and equity standards.
Methodologically, the project will use open-source LLM architectures and leverage federated learning where appropriate. Evaluation will be participatory, involving administrators in iterative testing cycles to ensure relevance and trust.
This initiative will catalyze innovation across Ontario’s post-secondary ecosystem, enabling scalable, equitable, and context-sensitive AI adoption.
Required expertise/skills:
- Experience in machine learning & AI, natural language processing, and data engineering in Python (PyTorch, Hugging Face Transformers, LangChain). Solid knowledge of open-source LLM frameworks and vector databases
- Experience in MLOps tools and cloud platforms (MS Azure preferred)
- Understanding of transfer learning, domain adaptation, and federated learning
- Experience with participatory design and human-in-the-loop model evaluation
- Familiarity with Ontario post-secondary data and Canadian privacy laws would be an asset
- Bilingualism (English/French) for inclusive model development would be an asset