Document-grounding evaluation for AI-assisted insurance recovery analysis - ON-1207
Project type: Research
Desired discipline(s): Engineering - computer / electrical, Engineering, Computer Science, Mathematical Sciences, Statistics / actuarial studies
Company: ARBS Innovations Inc.
Project duration: 4 to 6 months
Desired start date: As soon as possible
Required language: English
Location(s): ON, Canada
Number of positions: 1
Desired education level: Master's, Doctorate, Recent graduate
Open to applications from candidates enrolled at an institution outside Canada: No
About the company:
ARBS Innovations Inc. is a Canadian-controlled private corporation based in Ontario, building governed AI for missed-recovery detection in insurance claims and healthcare payment integrity. We help Canadian carriers, third-party administrators, and (in a planned second vertical) regional health plans surface recoverable money hidden at the bottom of the claims funnel — missed subrogation, salvage opportunities, and overpayments — with cited evidence so a human reviewer can decide whether to pursue each opportunity. Our approach uses multi-layered AI reasoning with built-in adversarial review. Every recommendation is grounded to source documents; every decision is audit-logged; every reviewer sees full provenance. That governance discipline is what makes the system deployable in regulated insurance and healthcare workflows. Founded in 2020 by two co-founders — Yuvaraj Balachandar (Founder & CTO) and Vani Mani (Head of Operations) — ARBS is an active member of NVIDIA Inception, Google for Startups Cloud Program, and NVIDIA Capital Connect. We are pre-pilot, MVP code-complete, and engaged with Canadian Tier-2 carriers and TPAs to land first paying pilots in 2026.
Please describe the project:
Insurance carriers in Canada and the US receive heterogeneous claim evidence in their day-to-day operations: adjuster notes, medical records, police reports, photographs taken on phones, scanned faxes, handwritten forms, and structured data from policy and billing systems. When AI systems are applied to this evidence to identify recovery opportunities (for example, missed subrogation against a third party), evaluating whether the AI's output is correct is itself an open methodological challenge.
The project is a four-month research investigation into evaluation methodology for AI-assisted insurance recovery analysis. The intern will work with the company technical team and an academic supervisor to: (1) survey existing methodologies for evaluating grounded AI output against heterogeneous source evidence, including span-level grounding evaluation and citation-based output verification; (2) design a labelled evaluation dataset built from synthetic and de-identified Canadian insurance claim materials; (3) implement and benchmark at least two distinct evaluation methodologies on that dataset; (4) document failure modes — what kinds of claim evidence are hardest to evaluate against, where evaluation methodology breaks down, and the implications for AI deployment in regulated insurance workflows.
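To make the span-level grounding evaluation mentioned in task (3) concrete, here is a minimal sketch of one possible metric: precision and recall over the character spans an AI system cites as evidence, matched against human-labelled gold spans. All function names, the overlap threshold, and the matching rule are illustrative assumptions, not part of the project specification; the intern would evaluate and compare methodologies like this rather than adopt any one as given.

```python
def span_overlap(a, b):
    """Fraction of span `a` covered by span `b`, where spans are
    (start, end) character offsets into a source document."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start) / (a[1] - a[0])

def grounding_precision_recall(predicted, gold, threshold=0.5):
    """Illustrative matching rule (an assumption): a predicted span
    counts as grounded if it overlaps some gold span by at least
    `threshold`; a gold span counts as recovered symmetrically."""
    matched_pred = sum(
        1 for p in predicted
        if any(span_overlap(p, g) >= threshold for g in gold)
    )
    matched_gold = sum(
        1 for g in gold
        if any(span_overlap(g, p) >= threshold for p in predicted)
    )
    precision = matched_pred / len(predicted) if predicted else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    return precision, recall

# Toy example: one predicted citation matches a gold span closely,
# the other cites a region no annotator marked as evidence.
pred = [(10, 50), (200, 240)]
gold = [(12, 48), (400, 440)]
p, r = grounding_precision_recall(pred, gold)  # p == 0.5, r == 0.5
```

A benchmark over heterogeneous claim evidence would vary the threshold and the matching rule, since OCR noise and scanned faxes make exact offsets unreliable; that sensitivity is precisely the kind of failure mode task (4) would document.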
The deliverable is a written research report describing the methodology, the evaluation dataset, the benchmark results, and the failure-mode analysis. Negative results — finding that an evaluation methodology does not generalize — are equally valuable. The work supports the company's longer-term R&D program (subject to Canadian SR&ED preclaim approval review) and aims to produce findings publishable in a methodology-focused venue with the academic supervisor's approval.
Required expertise or skills:
Required: strong programming foundation in Python; familiarity with at least one modern machine-learning framework (e.g., PyTorch); comfort working with both structured data (CSV/JSON) and unstructured documents (PDFs, OCR output, images); willingness to work in a small-team startup environment under an academic supervisor; ability to write a clear research report at master's-or-better quality.
Strongly preferred: prior coursework or research exposure to natural language processing, large language model evaluation, retrieval methods, or document understanding; familiarity with evaluation methodology in machine learning (precision/recall, calibration, span-level grounding metrics); interest in regulated industries (insurance, healthcare, finance) and the methodological discipline they require.
Useful but not required: exposure to Canadian privacy regulations (PIPEDA), exposure to industry-grade audit-trail and lineage tooling, prior internship or co-op experience in a corporate R&D setting.
Software/tools (general level): Python, PyTorch or equivalent, version control (Git), Jupyter notebooks for experiment tracking. Specific cloud or model tooling provided at project start.
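For context on the calibration familiarity listed under "strongly preferred," the computation involved is on the order of the following sketch of expected calibration error (ECE) with equal-width bins. The binning scheme and variable names are assumptions for illustration, not a project requirement.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: the weighted mean of |accuracy - confidence|
    across confidence bins. `correct` holds 0/1 outcome labels."""
    n = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # Bins are (lo, hi]; confidence 0.0 is folded into the first bin.
        in_bin = [i for i, c in enumerate(confidences)
                  if lo < c <= hi or (b == 0 and c == 0.0)]
        if not in_bin:
            continue
        avg_conf = sum(confidences[i] for i in in_bin) / len(in_bin)
        acc = sum(correct[i] for i in in_bin) / len(in_bin)
        ece += (len(in_bin) / n) * abs(acc - avg_conf)
    return ece

# Perfectly calibrated toy case: fully confident predictions, all correct.
ece = expected_calibration_error([1.0, 1.0], [1, 1])  # ece == 0.0
```

In the project setting, calibration would measure whether the AI's stated confidence in a recovery recommendation tracks how often that recommendation survives human review.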

