AI Quality, Safety & Clinical NLP Engine for Psychiatric Assessment - BC-1006
Project type: Innovation
Desired discipline(s): Engineering - computer / electrical, Engineering, Computer science, Mathematical Sciences, Statistics / Actuarial sciences
Company: Limbus AI
Project Length: 6 months to 1 year
Preferred start date: 09/01/2026
Language requirement: English
Location(s): West Kelowna, BC, Canada; Vancouver, BC, Canada; Canada
No. of positions: 1
Desired education level: Master's, PhD
Open to applicants registered at an institution outside of Canada: Yes
About the company:
Limbus AI is a health technology startup building AI-powered clinical decision support for psychiatric assessment. Founded by Dr. Marie Claire Bourque, MD, FRCPC, a board-certified psychiatrist, Limbus addresses the critical 2+ year wait for psychiatric assessment across Canada.
Our flagship product is a Diagnostic Avatar — an AI system that conducts comprehensive psychiatric assessments via video, administering validated screening instruments and structured clinical interviews. A deterministic Bayesian diagnostic engine (not LLM inference) processes 75 calibrated likelihood ratios across 22 psychiatric and medical conditions, producing CPSBC-compliant assessment reports.
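For applicants less familiar with likelihood-ratio reasoning, the general technique can be sketched with the odds form of Bayes' rule. This is a minimal illustration only, not Limbus AI's engine: the function name, the prior, and the likelihood ratios are hypothetical, and the sketch assumes the findings are conditionally independent so that their likelihood ratios multiply.

```python
import math


def posterior_probability(prior: float, likelihood_ratios: list[float]) -> float:
    """Update a prior probability with likelihood ratios (odds form of Bayes' rule).

    Assumes the findings are conditionally independent, so their
    likelihood ratios multiply. Log-odds are summed for numerical stability.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    log_odds = math.log(prior / (1.0 - prior))
    for lr in likelihood_ratios:
        if lr <= 0.0:
            raise ValueError("likelihood ratios must be positive")
        log_odds += math.log(lr)
    # Convert posterior log-odds back to a probability.
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical numbers: a 10% prior and two positive findings with LR 4.0 and 2.5.
# Prior odds 1/9 times 10 gives posterior odds 10/9, i.e. a probability of 10/19.
print(round(posterior_probability(0.10, [4.0, 2.5]), 3))  # about 0.526
```

Because the update is a fixed arithmetic rule, the same inputs always give the same output, which is what makes this style of engine deterministic and auditable.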
The system has achieved 87% diagnostic concordance with expert psychiatrist assessments across 46 validated cases. Key safety features include 12 halt conditions for crisis detection, an 11-module validity assessment engine, and complete audit trails for every diagnostic decision.
Our technology stack includes Python/FastAPI, React, PostgreSQL, Anthropic Claude (for interviews only — not for diagnostic reasoning), and Tavus video avatars with multimodal emotion perception. We are pursuing Health Canada Software as a Medical Device (SaMD) Class II classification under IEC 62304 and ISO 14971.
Limbus AI is headquartered in British Columbia, with planned expansion to Ontario and Alberta.
Describe the project.:
Limbus AI's diagnostic accuracy depends on two core capabilities: (1) accurately extracting DSM-5 criterion evidence from unstructured psychiatric interviews, and (2) ensuring the AI system behaves safely, reliably, and without hallucination across thousands of clinical scenarios. This project bundles both into a single 12-month innovation internship.
The intern will work across four integrated workstreams:
Workstream A — Evidence Extraction Benchmarking & Improvement: Develop a gold-standard annotation schema mapping patient utterances to DSM-5-TR criteria across 22 conditions. Build an annotated corpus of 200+ transcripts with expert clinician labels. Measure LLM extraction accuracy (precision, recall, F1) and inter-rater reliability (Cohen's kappa). Systematically identify and fix extraction errors through prompt engineering, chain-of-thought reasoning, structured output schemas, and retrieval-augmented generation.
Workstream B — Automated Safety Testing Framework: Design and execute 1,000+ adversarial test scenarios covering crisis presentation edge cases, contradictory symptom reports, malingering patterns, and boundary conditions. Build a continuous safety regression suite that runs against every code change.
Workstream C — Hallucination Detection Pipeline: Develop automated systems to detect when the LLM fabricates, misattributes, or distorts clinical evidence. Implement citation-level traceability linking every extracted criterion back to specific transcript segments.
Workstream D — IEC 62304 Compliance Testing: Build regression testing infrastructure aligned with Health Canada SaMD Class II requirements, including automated test coverage reporting, change impact analysis, and documentation generation.
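The evaluation metrics named in Workstream A can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions: each extraction is represented as a (transcript ID, criterion ID) pair, and the identifiers such as "MDD.A1" and the labels "met" / "not_met" are hypothetical placeholders, not the project's annotation schema.

```python
from collections import Counter


def precision_recall_f1(gold: set, predicted: set) -> tuple[float, float, float]:
    """Score predicted extractions against gold-standard annotations."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Chance-corrected agreement between two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's label frequencies.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical example data.
gold = {("t1", "MDD.A1"), ("t1", "MDD.A2"), ("t2", "GAD.A")}
predicted = {("t1", "MDD.A1"), ("t2", "GAD.A"), ("t2", "GAD.C")}
print(precision_recall_f1(gold, predicted))

clinician_1 = ["met", "met", "not_met", "not_met"]
clinician_2 = ["met", "not_met", "not_met", "not_met"]
print(cohens_kappa(clinician_1, clinician_2))  # 0.5
```

In practice a library such as scikit-learn provides equivalent functions; the hand-written version is shown only to make the definitions explicit.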
Deliverables: Annotated benchmark corpus, extraction accuracy improvement of 10%+ over baseline, safety test suite (1,000+ scenarios), hallucination detection system, IEC 62304-compliant test infrastructure, quarterly reports, and a manuscript-quality technical report.
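As a rough illustration of the citation-level traceability described in Workstream C, the sketch below flags any extracted criterion whose quoted evidence cannot be found in the transcript segment it cites. The Citation structure, the segment IDs, and the criterion IDs are hypothetical and are not the project's schema. Exact-substring matching of this kind catches only fabricated or misattributed quotes; detecting distorted paraphrases would need additional semantic checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    criterion_id: str  # hypothetical criterion label, e.g. "MDD.A1"
    segment_id: str    # hypothetical transcript segment identifier
    quote: str         # evidence text the extractor claims the patient said


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences do not matter."""
    return " ".join(text.lower().split())


def unsupported_citations(
    citations: list[Citation], transcript: dict[str, str]
) -> list[Citation]:
    """Return citations whose quote is empty or absent from the cited segment."""
    flagged = []
    for citation in citations:
        segment = transcript.get(citation.segment_id)
        quote = normalize(citation.quote)
        if segment is None or not quote or quote not in normalize(segment):
            flagged.append(citation)
    return flagged


# Hypothetical example data.
transcript = {
    "seg-01": "I haven't been sleeping, maybe three hours a night.",
    "seg-02": "Work has been fine, honestly.",
}
citations = [
    Citation("MDD.A4", "seg-01", "maybe three hours a night"),  # supported
    Citation("MDD.A2", "seg-02", "I don't enjoy anything"),     # fabricated quote
    Citation("MDD.A4", "seg-09", "three hours a night"),        # segment does not exist
]
for citation in unsupported_citations(citations, transcript):
    print(citation.criterion_id, citation.segment_id)
```

A check like this is cheap enough to run on every extraction, so it can serve as a first filter before slower model-based verification.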
Required expertise/skills:
Essential: Natural language processing (NLP) — text annotation, corpus development, evaluation metrics (precision, recall, F1). Experience with large language model APIs (Claude, GPT-4) and structured output extraction. Software testing methodology — unit testing, integration testing, regression testing, adversarial/fuzz testing. Python programming (extensive).
Technical: Anthropic Claude API or similar LLM API. JSON schema design. Annotation tools (Prodigy, Label Studio, or similar). CI/CD pipeline experience (GitHub Actions or similar). Automated test framework development. Familiarity with software quality standards (IEC 62304 preferred).
Preferred: Clinical NLP or biomedical information extraction experience. Knowledge of psychiatric diagnostic criteria (DSM-5-TR). Experience with retrieval-augmented generation (RAG). Familiarity with Health Canada SaMD regulatory requirements. Hallucination detection or LLM evaluation research.

