Building an Ontario post-secondary perspective small language model - ON-1104
Project type: Innovation
Desired discipline(s): Engineering - computer / electrical, Engineering, Computer science, Mathematical sciences, Mathematics
Company: OCAS
Project length: Flexible
Preferred start date: As soon as possible
Language requirement: English, with some French proficiency
Location(s): ON, Canada
Number of positions: 2
Desired education level: College, Undergraduate/Bachelor's, Master's, Recent graduate
Open to applicants registered at an institution outside of Canada: No
About the company:
OCAS is a non-profit serving Ontario's 24 Colleges of Applied Arts and Technology. Core products include a centralized application-to-college platform and support services in English and French, data sharing and data warehousing services, and other shared ecosystem services including applied research, financial services, tier 1 customer support, website, and data and reporting services. OCAS’ innovation department is responsible for bringing value through emerging technologies to the college system.
Describe the project:
We propose the development of a domain-specific Small Language Model (SLM) tailored to the unique linguistic and administrative context of Ontario’s post-secondary education system. Unlike general-purpose large language models, this SLM will be fine-tuned on curated datasets from Ontario colleges, including anonymized program descriptions, syllabi, institutional research, learning outcomes, and policy documents. The result will be a lightweight, privacy-conscious AI tool that administrators, educators, and innovators within the system can use for context-aware insights and automation.
The innovation lies in the creation of a localized, ethically aligned AI model that reflects the values, terminology, and workflows of Ontario’s colleges. This includes novel approaches to prompt engineering for academic integrity, multilingual support for diverse student populations, and integration with institutional research and data governance frameworks. The project also introduces an incremental innovation in service delivery: embedding the SLM into an existing OCAS data-sharing platform to enhance decision-making, reduce workload, and personalize value delivery for institutional researchers.
The candidates will lead data acquisition and curation, model fine-tuning, and testing. Key tasks will include identifying representative datasets, applying transfer learning and retrieval-augmented generation (RAG) techniques, evaluating model performance on relevant tasks, and ensuring compliance with privacy and equity standards.
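For candidates less familiar with RAG, the retrieval step mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions, not the project's design: the documents in `DOCS` are made-up placeholders, the function names (`retrieve`, `build_prompt`) are hypothetical, and a simple bag-of-words cosine similarity stands in for the embedding model and vector database a real system would likely use.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return up to k documents ranked by similarity to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def build_prompt(query, documents, k=2):
    """Prepend the retrieved passages to the question before calling a model."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Placeholder passages invented for this sketch; not real college data.
DOCS = [
    "The Practical Nursing diploma requires Grade 12 English and biology.",
    "Parking permits are sold at the campus security office.",
    "Learning outcomes for the Business diploma include financial reporting.",
]

if __name__ == "__main__":
    print(build_prompt("What are the admission requirements for Practical Nursing?", DOCS, k=1))
```

The prompt returned by `build_prompt` would then be passed to the fine-tuned SLM, so answers are grounded in retrieved institutional text and do not rely on the model's parameters alone.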
Methodologically, the project will use open-source LLM architectures and leverage federated learning where appropriate. Evaluation will be participatory, involving administrators in iterative testing cycles to ensure relevance and trust.
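As a rough illustration of the federated learning idea, the sketch below shows federated averaging (FedAvg), one common aggregation rule in which each participant trains locally and shares only model weights, never raw records. The posting does not say which federated method the project would use; the function name `federated_average` and the toy numbers are assumptions made for this example.

```python
def federated_average(client_weights, client_sizes):
    """Combine per-client weight vectors into one global vector.

    Each client's contribution is weighted by its number of local
    training examples, as in the FedAvg aggregation step.
    """
    total = sum(client_sizes)
    if total == 0:
        raise ValueError("at least one client must have training examples")
    n_params = len(client_weights[0])
    averaged = [0.0] * n_params
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != n_params:
            raise ValueError("all clients must share the same model shape")
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


if __name__ == "__main__":
    # Two hypothetical colleges with 1 and 3 local examples respectively.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

In practice the weight vectors would be model tensors (for example PyTorch state dictionaries) and the aggregation would run on a coordinating server over many training rounds.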
This initiative will catalyze innovation across Ontario’s post-secondary ecosystem, enabling scalable, equitable, and context-sensitive AI adoption.
Required expertise/skills:
- Experience in machine learning & AI, natural language processing, and data engineering in Python (PyTorch, Hugging Face Transformers, LangChain). Solid knowledge of open-source LLM frameworks and vector databases
- Experience in MLOps tools and cloud platforms (MS Azure preferred)
- Understanding of transfer learning, domain adaptation, and federated learning
- Experience with participatory design and human-in-the-loop model evaluation
- Familiarity with Ontario post-secondary data and Canadian privacy laws would be an asset
- Bilingualism (English/French) for inclusive model development would be an asset