Cross-model cultural bias in LLMs: extending the Harvard WEIRD study - ON-1220

Project type: Research
Desired discipline(s): Computer science, Mathematical Sciences, Cultural studies, Social Sciences & Humanities, Psychology
Company: Tulong Technologies Inc.
Project Length: 4 to 6 months
Preferred start date: 09/01/2026
Language requirement: English
Location(s): Toronto, ON, Canada
No. of positions: 1
Desired education level: Undergraduate/Bachelor
Open to applicants registered at an institution outside of Canada: No

About the company: 

Tulong Technologies is an AI company building a plug-and-play cultural intelligence engine for enterprises and public-sector organizations. Tulong helps teams reduce and prevent AI bias by combining governed multicultural data with structured Cultural Intelligence (CQ) reasoning, confidence-aware scoring and escalation, and human-in-the-loop expert review. Designed to integrate into existing AI workflows, the platform provides clear explanations, policy controls, and audit-ready logs to support safer multilingual and multicultural communications and decision support across functions and industries.

Describe the project.: 

This research project extends the landmark 2023 Harvard University study (Atari et al., “Which Humans?”), which found that GPT’s psychological profile aligns predominantly with Western, Educated, Industrialized, Rich, and Democratic (WEIRD) populations. The full preprint is available here: https://osf.io/preprints/psyarxiv/5b26t_v1
The original study was limited to a single AI model. This project expands the investigation to the major large language models, including GPT-4o (OpenAI), Claude (Anthropic), Gemini (Google), Copilot (Microsoft), Llama 3 (Meta), DeepSeek, and Qwen (Alibaba).
The intern will build an automated pipeline to administer World Values Survey (WVS) questions across multiple LLM APIs, collecting 1000+ responses per variable per model. Using hierarchical cluster analysis and principal component analysis, the project will map each model’s cultural proximity to 30+ world cultures and produce a standardized “WEIRD Index” (0–100) enabling direct cross-model comparison.
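To make the scoring step concrete, here is a minimal sketch of how a model's averaged survey answers could be compared with per-culture averages and turned into a 0 to 100 score. The formula for the WEIRD Index is not specified in this posting, so `weird_index` below is a hypothetical definition (100 when the model coincides with the reference culture, 0 when the reference is the farthest culture from the model), and the profiles in `CULTURES` are invented toy numbers, not WVS data. The sketch uses only the standard library; the actual analysis would more likely rely on scipy's `scipy.cluster.hierarchy.linkage` and scikit-learn's `PCA`.

```python
import math
from typing import Dict, List

# Invented toy profiles (mean answer per survey variable); NOT real WVS data.
CULTURES: Dict[str, List[float]] = {
    "culture_a": [1.0, 4.0],  # stands in for the WEIRD reference population
    "culture_b": [3.0, 2.0],
    "culture_c": [5.0, 0.0],
}


def standardize(rows: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Z-score every variable (column) across all rows."""
    names = list(rows)
    n_vars = len(rows[names[0]])
    scaled: Dict[str, List[float]] = {name: [] for name in names}
    for j in range(n_vars):
        column = [rows[name][j] for name in names]
        mean = sum(column) / len(column)
        sd = math.sqrt(sum((x - mean) ** 2 for x in column) / len(column))
        for name in names:
            scaled[name].append((rows[name][j] - mean) / sd if sd > 0 else 0.0)
    return scaled


def culture_distances(
    model_profile: List[float], cultures: Dict[str, List[float]]
) -> Dict[str, float]:
    """Euclidean distance from the model to each culture in z-score space."""
    rows = dict(cultures)
    rows["_model"] = list(model_profile)
    scaled = standardize(rows)
    return {name: math.dist(scaled["_model"], scaled[name]) for name in cultures}


def weird_index(
    model_profile: List[float], cultures: Dict[str, List[float]], reference: str
) -> float:
    """Hypothetical index: 100 * (1 - d(reference) / d(farthest culture))."""
    distances = culture_distances(model_profile, cultures)
    farthest = max(distances.values())
    if farthest == 0:
        return 100.0  # model coincides with every culture profile
    return 100.0 * (1.0 - distances[reference] / farthest)
```

With these toy numbers, a model profile equal to `culture_a` scores 100 and one equal to `culture_c` scores 0 against the `culture_a` reference. Because the score is relative to the farthest culture in the sample, it depends on which cultures are included, which is one design choice the project would need to settle.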
A critical extension involves a language-switching dimension: administering identical WVS questions in 10+ languages to test whether the language of a prompt shifts a model’s cultural alignment.
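The language-switching test can be sketched as the same question asked repeatedly in each language, followed by a comparison of mean answers against a baseline language. Everything below is an assumption for illustration: the `ask(model, prompt)` callable is a hypothetical stand-in for whatever vendor API client the pipeline ends up using, and the prompt wording is placeholder text rather than official WVS items or translations.

```python
import re
import statistics
from typing import Callable, Dict, List, Optional

# Placeholder item on a 1-4 scale; wording is illustrative, not official WVS text.
SCALE = (1, 4)
PROMPTS: Dict[str, str] = {
    "en": "How important is family in your life? "
          "Answer with a single number from 1 to 4.",
    "fr": "Quelle importance la famille a-t-elle dans votre vie ? "
          "Répondez par un seul nombre de 1 à 4.",
}


def parse_answer(text: str, lo: int, hi: int) -> Optional[int]:
    """Return the first integer in the reply that lies on the scale, else None."""
    for token in re.findall(r"\d+", text):
        value = int(token)
        if lo <= value <= hi:
            return value
    return None


def collect_by_language(
    model: str, ask: Callable[[str, str], str], n_samples: int
) -> Dict[str, List[int]]:
    """Ask the same item n_samples times per language; keep parseable answers."""
    lo, hi = SCALE
    answers: Dict[str, List[int]] = {}
    for lang, prompt in PROMPTS.items():
        parsed = (parse_answer(ask(model, prompt), lo, hi) for _ in range(n_samples))
        answers[lang] = [value for value in parsed if value is not None]
    return answers


def language_shift(
    answers: Dict[str, List[int]], baseline: str = "en"
) -> Dict[str, float]:
    """Mean answer in each language minus the mean answer in the baseline."""
    means = {lang: statistics.mean(vals) for lang, vals in answers.items() if vals}
    return {lang: m - means[baseline] for lang, m in means.items() if lang != baseline}
```

Passing `ask` in as a parameter keeps the loop independent of any one provider and lets it be exercised with a stub before real API calls are made. Unparseable replies are dropped here; a fuller pipeline would probably also log them, since refusal rates may themselves differ by language.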
Deliverables include a peer-reviewable academic paper, an open-source multi-model cultural alignment dashboard, a publicly available dataset, and a benchmarking tool. This research directly supports Tulong’s CulturalIQ Engine by providing foundational evidence that cultural bias is systemic across all major AI models — establishing the need for a dedicated cultural intelligence layer.

Required expertise/skills: 

Required: Statistical analysis (hierarchical clustering, PCA, regression); familiarity with cross-cultural psychology research methods and the World Values Survey dataset.
Preferred: Experience with natural language processing (NLP), large language model evaluation, data visualization (D3.js, Plotly), and multilingual text processing. Knowledge of psychometric testing methodology and cross-cultural research design is an asset.
Software: Python (pandas, scikit-learn, scipy, matplotlib), Jupyter notebooks, API frameworks, Git/GitHub, cloud compute (AWS/GCP for large-scale API calls).