AI Quality, Safety & Clinical NLP Engine for Psychiatric Assessment - BC-1006

Project type: Innovation
Preferred discipline(s): Engineering - computer / electrical, Engineering, Computer science, Mathematical sciences, Statistics / actuarial studies
Company: Limbus AI
Project duration: 6 months to 1 year
Preferred start date: As soon as possible
Language requirement: English
Location(s): West Kelowna, BC, Canada; Vancouver, BC, Canada; Canada
Number of positions: 1
Desired education level: Master's, Doctorate
Open to applicants enrolled at an institution outside Canada: Yes

About the company:

Limbus AI is a health technology startup building AI-powered clinical decision support for psychiatric assessment. Founded by Dr. Marie Claire Bourque, MD, FRCPC, a board-certified psychiatrist, Limbus addresses the critical 2+ year wait for psychiatric assessment across Canada.

Our flagship product is a Diagnostic Avatar — an AI system that conducts comprehensive psychiatric assessments via video, administering validated screening instruments and structured clinical interviews. A deterministic Bayesian diagnostic engine (not LLM inference) processes 75 calibrated likelihood ratios across 22 psychiatric and medical conditions, producing CPSBC-compliant assessment reports.
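For readers unfamiliar with likelihood-ratio reasoning, the general idea can be sketched with the textbook odds form of Bayes' rule. This is not a description of the Limbus engine, whose internals are not given here: the function name, the prior, and the likelihood ratios below are hypothetical, and the sketch assumes the findings are conditionally independent.

```python
import math


def posterior_probability(prior: float, likelihood_ratios: list[float]) -> float:
    """Update a prior probability with likelihood ratios (odds form of Bayes' rule).

    Assumes the findings behind each ratio are conditionally independent,
    which is a simplification made for this illustration only.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if any(lr <= 0.0 for lr in likelihood_ratios):
        raise ValueError("likelihood ratios must be positive")

    # Work in log-odds so that many ratios can be combined without overflow.
    log_odds = math.log(prior / (1.0 - prior))
    for lr in likelihood_ratios:
        log_odds += math.log(lr)

    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


# Hypothetical numbers: a 20% prior, one finding that raises the odds
# fourfold and one that halves them.
example_posterior = posterior_probability(0.20, [4.0, 0.5])
```

Because the update is a fixed arithmetic function of its inputs, the same evidence always yields the same result, which is what makes this kind of calculation auditable.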

The system has achieved 87% diagnostic concordance with expert psychiatrist assessments across 46 validated cases. Key safety features include 12 halt conditions for crisis detection, an 11-module validity assessment engine, and complete audit trails for every diagnostic decision.

Our technology stack includes Python/FastAPI, React, PostgreSQL, Anthropic Claude (for interviews only — not for diagnostic reasoning), and Tavus video avatars with multimodal emotion perception. We are pursuing Health Canada Software as a Medical Device (SaMD) Class II classification under IEC 62304 and ISO 14971.

Limbus AI is headquartered in British Columbia, with planned expansion to Ontario and Alberta.

Project description:

Limbus AI's diagnostic accuracy depends on two core capabilities: (1) accurately extracting DSM-5 criterion evidence from unstructured psychiatric interviews, and (2) ensuring the AI system behaves safely, reliably, and without hallucination across thousands of clinical scenarios. This project bundles both into a single 12-month innovation internship.

The intern will work across four integrated workstreams:

Workstream A — Evidence Extraction Benchmarking & Improvement: Develop a gold-standard annotation schema mapping patient utterances to DSM-5-TR criteria across 22 conditions. Build an annotated corpus of 200+ transcripts with expert clinician labels. Measure LLM extraction accuracy (precision, recall, F1) and inter-rater reliability (Cohen's kappa). Systematically identify and fix extraction errors through prompt engineering, chain-of-thought reasoning, structured output schemas, and retrieval-augmented generation.
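As a minimal sketch of the metrics named in Workstream A, the functions below compute precision, recall, and F1 over sets of extracted (utterance, criterion) pairs, and Cohen's kappa between two annotators. The pair representation and the identifiers such as "u1" and "crit-A" are assumptions made for illustration; the actual annotation schema is a project deliverable and is not defined here.

```python
from collections import Counter


def precision_recall_f1(gold: set, predicted: set) -> tuple[float, float, float]:
    """Score predicted items against gold-standard items by exact match."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Chance-corrected agreement between two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance from each rater's label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical (utterance id, criterion id) pairs.
gold_pairs = {("u1", "crit-A"), ("u2", "crit-B"), ("u3", "crit-C")}
model_pairs = {("u1", "crit-A"), ("u2", "crit-B"), ("u4", "crit-D")}
scores = precision_recall_f1(gold_pairs, model_pairs)

# Hypothetical binary labels (criterion met = 1) from two clinicians.
kappa = cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0])
```

Exact-match scoring is the strictest choice; a real benchmark might also report partial-span or per-condition scores.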

Workstream B — Automated Safety Testing Framework: Design and execute 1,000+ adversarial test scenarios covering crisis presentation edge cases, contradictory symptom reports, malingering patterns, and boundary conditions. Build a continuous safety regression suite that runs against every code change.
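One possible shape for such a regression suite is a table of scenarios, each paired with the halt condition it is expected to trigger, run against a detector function on every change. Everything named below is hypothetical: the `Scenario` fields, the halt label, and especially `toy_detector`, which is a keyword placeholder standing in for the real system and is not a clinically meaningful detector.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Scenario:
    name: str
    transcript: str
    expected_halt: Optional[str]  # None means the interview should continue


def run_safety_suite(
    detector: Callable[[str], Optional[str]],
    scenarios: list[Scenario],
) -> list[str]:
    """Return one failure message per scenario whose outcome differs from expectation."""
    failures = []
    for scenario in scenarios:
        actual = detector(scenario.transcript)
        if actual != scenario.expected_halt:
            failures.append(
                f"{scenario.name}: expected {scenario.expected_halt!r}, got {actual!r}"
            )
    return failures


def toy_detector(transcript: str) -> Optional[str]:
    """Placeholder only: flags one hard-coded phrase so the harness can be run."""
    if "end my life" in transcript.lower():
        return "hypothetical_crisis_halt"
    return None


SCENARIOS = [
    Scenario("explicit crisis statement", "I want to end my life.", "hypothetical_crisis_halt"),
    Scenario("benign sleep complaint", "I have trouble falling asleep.", None),
]

suite_failures = run_safety_suite(toy_detector, SCENARIOS)
```

In a CI pipeline, a non-empty failure list would fail the build, so a change that weakens crisis detection is caught before release.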

Workstream C — Hallucination Detection Pipeline: Develop automated systems to detect when the LLM fabricates, misattributes, or distorts clinical evidence. Implement citation-level traceability linking every extracted criterion back to specific transcript segments.
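A minimal sketch of citation-level traceability, assuming each extracted criterion carries a verbatim quote plus character offsets into the transcript: any citation whose quote does not match the transcript at the claimed offsets is flagged. The `EvidenceCitation` fields and the criterion identifiers are hypothetical. This check only catches fabricated or misplaced quotes; evidence that is quoted accurately but attributed to the wrong criterion would need a separate check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvidenceCitation:
    criterion_id: str  # hypothetical identifier for the criterion being supported
    quote: str         # text the model claims the patient said
    start: int         # claimed character offset of the quote in the transcript
    end: int           # claimed end offset (exclusive)


def unsupported_citations(
    transcript: str,
    citations: list[EvidenceCitation],
) -> list[EvidenceCitation]:
    """Return citations whose quote is not found verbatim at the claimed offsets."""
    flagged = []
    for citation in citations:
        in_bounds = 0 <= citation.start < citation.end <= len(transcript)
        if not in_bounds or transcript[citation.start:citation.end] != citation.quote:
            flagged.append(citation)
    return flagged


transcript_text = "I have not slept more than three hours a night for two weeks."

supported_quote = "not slept more than three hours"
quote_start = transcript_text.index(supported_quote)
supported = EvidenceCitation(
    "hypothetical-sleep", supported_quote, quote_start, quote_start + len(supported_quote)
)
# The patient never said this, so the citation should be flagged.
fabricated = EvidenceCitation("hypothetical-appetite", "I have lost my appetite", 0, 23)

flagged_citations = unsupported_citations(transcript_text, [supported, fabricated])
```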

Workstream D — IEC 62304 Compliance Testing: Build regression testing infrastructure aligned with Health Canada SaMD Class II requirements, including automated test coverage reporting, change impact analysis, and documentation generation.

Deliverables: Annotated benchmark corpus, extraction accuracy improvement of 10%+ over baseline, safety test suite (1,000+ scenarios), hallucination detection system, IEC 62304-compliant test infrastructure, quarterly reports, and a manuscript-quality technical report.

Required expertise or skills:

Essential: Natural language processing (NLP) — text annotation, corpus development, evaluation metrics (precision, recall, F1). Experience with large language model APIs (Claude, GPT-4) and structured output extraction. Software testing methodology — unit testing, integration testing, regression testing, adversarial/fuzz testing. Python programming (extensive).

Technical: Anthropic Claude API or similar LLM API. JSON schema design. Annotation tools (Prodigy, Label Studio, or similar). CI/CD pipeline experience (GitHub Actions or similar). Automated test framework development. Familiarity with software quality standards (IEC 62304 preferred).

Preferred: Clinical NLP or biomedical information extraction experience. Knowledge of psychiatric diagnostic criteria (DSM-5-TR). Experience with retrieval-augmented generation (RAG). Familiarity with Health Canada SaMD regulatory requirements. Hallucination detection or LLM evaluation research.