Model Validation Tool and Evaluation Framework - ON-1198
Genre de projet: InnovationDiscipline(s) souhaitée(s): Informatique, Sciences mathématiques
Entreprise: UST Global (Canada) Inc.
Durée du projet: 6 mois à 1 an
Date souhaitée de début: Dès que possible
Langue exigée: Anglais
Emplacement(s): Toronto, ON, Canada
Nombre de postes: 1
Niveau de scolarité désiré: Études de premier cycle/baccalauréatMaîtriseDoctoratRecherche postdoctoraleNouvelle diplômée/nouveau diplômé
Ouvert aux candidatures de personnes inscrites à un établissement à l’extérieur du Canada: No
Au sujet de l’entreprise:
UST Canada is the Canadian arm of UST, a global digital transformation solutions company that partners with leading enterprises to drive business outcomes through technology, data, and innovation. With a strong presence in Toronto and across Canada, UST Canada is deeply focused on the financial services sector, helping banks, insurers, and fintechs modernize legacy platforms, enhance customer experience, and accelerate their digital agendas.
UST Canada combines deep local market expertise with UST’s global delivery scale, offering capabilities across cloud and infrastructure modernization, data and analytics, digital engineering, automation, cybersecurity, and agile transformation. A growing area of emphasis is UST’s AI‑forward approach—investing in applied AI research and development, GenAI experimentation, and the responsible adoption of AI to move from pilots to scalable, production‑ready outcomes.
Driven by UST’s values of humility, humanity, and integrity, UST Canada partners closely with clients to deliver pragmatic, scalable solutions that balance innovation, risk, and regulatory realities in one of the world’s most sophisticated financial markets.
Veuillez décrire le projet.:
The Challenge
As Canadian Financial Institutions (FIs) move Generative AI from pilot to production, they face a critical bottleneck: Model Risk Management (MRM). Traditional validation frameworks are designed for static, predictive models and cannot effectively address the non-deterministic nature of Large Language Models (LLMs). Issues like hallucinations, prompt injection, and biased "black box" reasoning create systemic risks that current manual audit processes are too slow to mitigate. Without a standardized, automated way to generate "evidence packs," Canadian banks risk fragmented deployments and regulatory friction.
The Solution:
UST Responsible Rails Framework This project introduces an automated, bank-ready evaluation harness designed to bridge the gap between rapid AI innovation and OSFI-aligned risk standards. The framework provides a centralized "Governance-as-Code" layer, ensuring that every model update is vetted against a rigorous, repeatable taxonomy.
Key Capabilities:
• Automated Evaluation Taxonomy: Real-time scoring for groundedness, faithfulness, refusal correctness, and tool-use accuracy.
• Adversarial Red-Teaming: A specialized harness to test for jailbreaks, prompt injections, and data leakage before code hits production.
• CI/CD Quality Gates: Integration with Azure-native pipelines (AKS or Azure ML) to automatically block deployments that fail safety or quality thresholds.
• Standardized Evidence Packs: Automated generation of audit-ready documentation, reducing manual validation effort by an estimated 40%.
Strategic Impact By implementing a shared evaluation infrastructure, Canadian FIs can eliminate duplicative efforts across internal teams and ensure consistent controls. This framework transforms risk management from a "gatekeeper" into an "accelerator," providing the repeatable evidence necessary for senior leadership and regulators to approve high-impact GenAI use cases in front-office, risk, and operations.
Expertise ou compétences exigées:
1. Model Risk Management (MRM) & Regulatory SME
This is the most critical non-technical role. They translate banking regulations (like OSFI E-23
in Canada or SR 11-7 in the US) into technical requirements.
• Key Skills: Deep understanding of the "Three Lines of Defense" model, conceptual soundness, and outcome analysis.
• Project Role: Defining the Taxonomy (what counts as a "fail") and ensuring the "Evidence Packs" meet the standard of internal audit and external regulators.
2. AI Evaluation Engineer (Specialized AI Engineer)
Traditional ML engineers focus on training; these specialists focus on probing and breaking.
• Key Skills: Proficiency in RAG evaluation frameworks (e.g., Ragas, TruLens, or DeepEval), understanding LLM-as-a-judge patterns, and "Golden Dataset" curation.
• Project Role: Building the automated scoring for groundedness, faithfulness, and relevance.
3. Adversarial Tester / Red Teamer
A security-first mindset applied to language.
• Key Skills: Knowledge of the OWASP Top 10 for LLMs, experience with prompt injection, jailbreaking, and PII leakage detection.
• Project Role: Developing the "attack library" to stress-test models against malicious or accidental data exposure.
4. Full Stack AI Engineer
Bridges traditional software engineering with modern AI systems; focusing on building, integrating and deploying AI-powered applications end-to-end
• Key Skills: Proficiency in backend and frontend development (eg APIs, microservices, React, Python) experience with LLM integration, RAG pipelines, vector databases.
• Project Role: Designing and building production grade AI applications – integrating models into user facing systems, implementing retrieval pipelines, managing data flow and ensuring performance and reliability of AI driven features.

