Cross-model cultural bias in LLMs: extending the Harvard WEIRD study - ON-1220
Project type: Research
Desired discipline(s): Computer science, Mathematical sciences, Cultural studies, Social sciences and humanities, Psychology
Company: Tulong Technologies Inc.
Project length: 4 to 6 months
Preferred start date: As soon as possible
Language requirement: English
Location(s): Toronto, ON, Canada
Number of positions: 1
Desired education level: Undergraduate/Bachelor's
Open to applicants registered at an institution outside of Canada: No
About the company:
Tulong Technologies is an AI company building a plug-and-play cultural intelligence engine for enterprises and public-sector organizations. Tulong helps teams reduce and prevent AI bias by combining governed multicultural data with structured Cultural Intelligence (CQ) reasoning, confidence-aware scoring and escalation, and human-in-the-loop expert review. Designed to integrate into existing AI workflows, the platform provides clear explanations, policy controls, and audit-ready logs to support safer multilingual and multicultural communications and decision support across functions and industries.
Describe the project:
This research project extends the landmark 2023 Harvard University study (Atari et al., “Which Humans?”), which found that GPT’s psychological profile aligns predominantly with Western, Educated, Industrialized, Rich, and Democratic (WEIRD) populations. The full paper is available here: https://osf.io/preprints/psyarxiv/5b26t_v1
The original study was limited to a single AI model. This project expands the investigation to all major commercial large language models (LLMs), including GPT-4o, Claude (Anthropic), Gemini (Google), Copilot (Microsoft), Llama 3 (Meta), DeepSeek, and Qwen (Alibaba).
The intern will build an automated pipeline to administer World Values Survey (WVS) questions across multiple LLM APIs, collecting 1000+ responses per variable per model. Using hierarchical cluster analysis and principal component analysis, the project will map each model’s cultural proximity to 30+ world cultures and produce a standardized “WEIRD Index” (0–100) enabling direct cross-model comparison.
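As a rough illustration of the analysis step described above, the sketch below standardizes a small matrix of mean survey responses, runs Ward hierarchical clustering, projects the rows onto two principal components, and computes a distance-based score. Everything in it is an assumption made for illustration: the numbers are invented placeholders rather than WVS data, the labels and the `weird_index` function are hypothetical, and the scaling formula is only one possible way to define a 0 to 100 "WEIRD Index", not the project's actual definition.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical mean answers (rows: populations or models, columns: survey items).
# These values are invented placeholders, not real survey data.
labels = ["culture_A", "culture_B", "culture_C", "culture_D", "model_X"]
responses = np.array([
    [1.0, 2.0, 1.5],
    [1.2, 2.1, 1.4],
    [3.8, 0.5, 3.9],
    [4.0, 0.4, 4.1],
    [1.1, 2.0, 1.6],
])


def standardize(x):
    """Z-score each column so every survey item carries equal weight."""
    return (x - x.mean(axis=0)) / x.std(axis=0)


def pca_scores(z, n_components=2):
    """Project rows onto the top principal components using an SVD."""
    centered = z - z.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def weird_index(z, model_row, reference_row):
    """Hypothetical 0-100 score based on distance to a reference population.

    100 means the model coincides with the reference row; 0 means it is as
    far from the reference as the most distant row in the matrix.
    """
    dists = np.linalg.norm(z - z[reference_row], axis=1)
    return 100.0 * (1.0 - dists[model_row] / dists.max())


z = standardize(responses)

# Ward linkage on the standardized rows, then cut the tree into two clusters.
tree = linkage(z, method="ward")
clusters = fcluster(tree, t=2, criterion="maxclust")

scores = pca_scores(z)
index = weird_index(z, model_row=4, reference_row=0)

print({name: int(c) for name, c in zip(labels, clusters)})
print(round(float(index), 1))
```

In this toy matrix, `culture_A` stands in for a WEIRD reference population. A real analysis would need a principled choice of reference and of distance measure, and would use the full set of WVS variables.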
A critical extension involves a language-switching dimension: administering identical WVS questions in 10+ languages to test whether the language of a prompt shifts a model’s cultural alignment.
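One way to organize the language-switching runs is a full grid of model, language, and survey item, reusing the same item ID across languages so that responses stay comparable. The sketch below only builds that grid. The model names, item ID, and prompt wording are placeholders (the wording is an illustrative paraphrase, not official WVS text), and `query_model` is a hypothetical stub standing in for whichever vendor API client the pipeline ends up using.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Trial:
    """One planned API call: which model, which language, which item."""
    model: str
    language: str
    item_id: str
    prompt: str


# Placeholder model identifiers; the real list would come from the project plan.
MODELS = ["model_a", "model_b"]

# Hypothetical item with illustrative wording in two languages.
ITEMS = {
    "item_001": {
        "en": "How important is family in your life? Answer from 1 (very) to 4 (not at all).",
        "fr": "Quelle importance la famille a-t-elle dans votre vie ? Répondez de 1 (très) à 4 (pas du tout).",
    },
}


def build_trials(models, items, repeats):
    """Return one Trial per (model, item, language, repeat) combination."""
    trials = []
    for model, (item_id, translations) in product(models, items.items()):
        for language, prompt in translations.items():
            trials.extend(
                Trial(model, language, item_id, prompt) for _ in range(repeats)
            )
    return trials


def query_model(trial):
    """Hypothetical stub: a real pipeline would call the vendor API here."""
    raise NotImplementedError("plug in an API client for " + trial.model)


trials = build_trials(MODELS, ITEMS, repeats=3)
print(len(trials))
```

Keeping the language as an explicit field on each trial makes it straightforward to group responses by prompt language later and compare a model's alignment across languages.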
Deliverables include a peer-reviewable academic paper, an open-source multi-model cultural alignment dashboard, a publicly available dataset, and a benchmarking tool. This research directly supports Tulong’s CulturalIQ Engine by providing foundational evidence on whether cultural bias is systemic across all major AI models, which would establish the need for a dedicated cultural intelligence layer.
Required expertise/skills:
Required: Statistical analysis (hierarchical clustering, PCA, regression); familiarity with cross-cultural psychology research methods and the World Values Survey dataset.
Preferred: Experience with natural language processing (NLP), large language model evaluation, data visualization (D3.js, Plotly), and multilingual text processing. Knowledge of psychometric testing methodology and cross-cultural research design is an asset.
Software: Python (pandas, scikit-learn, scipy, matplotlib), Jupyter notebooks, API frameworks, Git/GitHub, cloud compute (AWS/GCP for large-scale API calls).

