Lecture 01.3 – Synthetic Respondents: Simulation and Supplementation
Introduction
Data collection is often the most resource-intensive phase of psychological research. Recruiting participants, especially from specialized populations, can be challenging, time-consuming, and expensive. What if researchers could generate synthetic data that mimics human responses? This possibility has emerged with the development of large language models (LLMs) that can simulate human-like survey responses (Argyle et al., 2023).
This lecture explores the concept of “synthetic respondents” – artificial entities created through LLMs that answer survey questions in place of human participants. We examine the potential of this approach to address data scarcity issues in psychological research, while critically evaluating its limitations and risks (Bisbee et al., 2024). The goal is to develop a nuanced understanding of when and how synthetic data can appropriately supplement traditional data collection methods.
The Promise of Synthetic Respondents
Synthetic respondents offer several potential advantages for psychological research:
1. Rapid Large-Scale Data Generation
Traditional data collection often takes weeks or months, with significant costs per participant. In contrast, an LLM can generate hundreds or thousands of simulated responses in minutes. This capability enables:
Quick preliminary analyses to refine research questions
Exploration of hypothetical scenarios or conditions
Generation of practice datasets for methodological training
Augmentation of small human samples to increase statistical power
2. Access to Hard-to-Reach Populations
Some populations are inherently difficult to recruit in sufficient numbers:
Rare clinical conditions
Highly specialized professional groups
Geographically remote communities
Historically marginalized groups
LLMs, having been trained on text from diverse sources, might simulate responses from these populations based on available information, potentially providing insight into groups that would otherwise be underrepresented in research (Shrestha et al., 2025).
3. Privacy and Ethical Advantages
Working with synthetic data can offer certain ethical benefits:
No need to collect sensitive personal information from real individuals
Reduced burden on vulnerable populations
Lower risk of confidentiality breaches
Ability to share “data” that would otherwise be restricted by privacy concerns
4. Methodological Applications
Synthetic respondents can serve specific methodological purposes:
Pilot testing survey instruments before deployment
Exploring the robustness of analytical techniques
Testing hypotheses in silico before investing in full-scale studies
Simulating longitudinal data where real long-term follow-up is impractical
The Scientific Reality: Empirical Findings on Synthetic Respondent Fidelity
How well do LLMs actually mimic human survey responses? Recent research provides mixed evidence:
The “Silicon Samples” Studies: Promising Correlations
Argyle et al. (2023) pioneered the concept of “silicon samples” – LLM-simulated respondents conditioned with specific demographic profiles. They found that when GPT-3 was prompted with rich backstories (e.g., age, gender, ethnicity, political ideology) drawn from real survey participants, the model’s aggregated responses matched the actual response distributions of those human subgroups with surprising accuracy.
The researchers termed this “algorithmic fidelity” – the ability of the AI to be steered in a fine-grained way to reflect specific demographic patterns rather than a single, generic, one-size-fits-all bias. For example, GPT-3 could be prompted to respond as a young liberal individual or an older conservative individual, and its answers to political attitude questions would shift in line with how those groups responded in real surveys.
These findings suggest that LLMs encode substantial information about sociocultural context and attitudes, allowing them to simulate responses that preserve many real-world statistical relationships when properly conditioned.
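To make the conditioning step concrete, the sketch below builds a short demographic backstory into the prompt and asks the model for a closed-ended answer. It assumes access to the OpenAI Python client; the survey item, persona fields, and model name are illustrative choices, not Argyle et al.’s exact procedure.

```python
# Minimal sketch of persona-conditioned ("silicon sample") prompting.
# Assumes the OpenAI Python client is installed and an API key is configured;
# the survey item, persona fields, and model name are illustrative.
from openai import OpenAI

client = OpenAI()

def simulate_response(persona: dict, item: str, options: list[str]) -> str:
    backstory = (
        f"You are a {persona['age']}-year-old {persona['gender']} from the "
        f"{persona['region']} who describes their political views as "
        f"{persona['ideology']}."
    )
    prompt = (
        f"{backstory}\n\nSurvey question: {item}\n"
        "Answer with exactly one of the following options: " + ", ".join(options)
    )
    reply = client.chat.completions.create(
        model="gpt-4o-mini",   # illustrative model name
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,       # sampling preserves response variability
    )
    return reply.choices[0].message.content.strip()

persona = {"age": 24, "gender": "woman", "region": "urban Northeast",
           "ideology": "liberal"}
answer = simulate_response(
    persona,
    "How concerned are you about climate change?",
    ["Not at all", "Slightly", "Moderately", "Very", "Extremely"],
)
print(answer)
```

Repeating this loop over many personas drawn from a real sample’s demographic profile yields the aggregated “silicon” response distributions that can then be compared against the human subgroups.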
The Cross-Cultural Challenge: Diminishing Fidelity
While initial results with American samples were promising, subsequent research has shown that LLMs’ ability to mimic responses varies considerably across cultural contexts. Shrestha et al. (2025) tested whether GPT-4 could accurately simulate survey responses from participants in the United States, Saudi Arabia, and the United Arab Emirates.
Their findings indicated that while the model produced reasonably similar responses for American participants, the correlation between AI-generated and human responses was significantly weaker for Saudi and Emirati samples. The researchers concluded that LLMs, predominantly trained on Western data sources, struggle to accurately represent non-Western perspectives and cultural nuances.
This cultural gap has important implications for the use of synthetic respondents in cross-cultural or global research, suggesting that current models may inadvertently reinforce Western-centric biases rather than overcome them (Abdurahman et al., 2024). The phenomenon reflects broader concerns about the representativeness of training data in large language models (Bender et al., 2021).
Technical Artifacts vs. True Understanding
Perhaps most concerning are findings regarding the mechanisms underlying LLM responses to survey questions. Dominguez-Olmedo et al. (2024) discovered that many LLMs exhibit systematic biases when answering multiple-choice questions, such as a preference for options labeled “A” regardless of content.
When these technical artifacts were controlled for, the models often produced nearly random responses—suggesting that their apparent alignment with human answer patterns in some studies might be coincidental rather than reflecting genuine psychological understanding.
This raises fundamental questions about what LLMs are actually doing when they “take” a survey. Are they modeling human psychological processes, or simply reproducing statistical patterns in their training data without the underlying cognitive mechanisms? This concern aligns with critiques about treating LLMs as genuine sources of human-like beliefs or opinions (Bender et al., 2021).
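One practical way to probe for such artifacts is to present the same item repeatedly with the answer options shuffled and check whether the model’s choices track content or label position. A minimal sketch, assuming a hypothetical ask_model function that returns the letter of the option the model selects:

```python
# Sketch of a label/position-bias check for multiple-choice items.
# `ask_model` is a hypothetical function that sends a formatted question to an
# LLM and returns the letter of the option it selects (e.g., "A").
import random
from collections import Counter

def position_bias_check(ask_model, question, options, n_trials=50):
    """Shuffle answer options across trials and tally which content and which
    label position the model picks. Heavy concentration on one label
    regardless of content suggests a position artifact."""
    content_counts, label_counts = Counter(), Counter()
    for _ in range(n_trials):
        shuffled = random.sample(options, k=len(options))
        labeled = {chr(ord("A") + i): opt for i, opt in enumerate(shuffled)}
        chosen_label = ask_model(question, labeled)
        label_counts[chosen_label] += 1
        content_counts[labeled.get(chosen_label, "<unparsed>")] += 1
    return content_counts, label_counts
```

If the label tally is heavily skewed while the content tally is near-uniform, the model is responding to the option’s position rather than its meaning.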
Proper Use Cases and Limitations
Given these empirical findings, when and how should researchers consider using synthetic respondents?
Appropriate Applications
Preliminary Exploration: Using synthetic data to generate initial hypotheses or refine research questions before investing in human data collection.
Methodological Development: Testing new analytical techniques or survey instruments with simulated responses to identify potential issues.
Augmenting Limited Samples: Supplementing small human datasets with synthetic responses to increase statistical power, particularly for rare populations (with appropriate validation).
Privacy-Preserving Simulation: Generating synthetic data for teaching, demonstration, or sharing when privacy concerns would otherwise prevent the use of real participant data.
Critical Limitations
Not a Replacement for Human Data: Synthetic responses should supplement, not replace, real human data for primary analyses and conclusions.
Cultural and Demographic Boundaries: LLMs show decreased fidelity when simulating responses from populations underrepresented in training data, limiting generalizability (Shrestha et al., 2025).
Surface-Level Mimicry: LLMs may reproduce statistical patterns without capturing underlying psychological processes, potentially leading to misleading inferences about human cognition (Bisbee et al., 2024).
Reproducibility Challenges: LLM outputs can vary based on prompting techniques, model versions, and implementation details, creating challenges for scientific reproducibility.
Case Study: Depression Prediction from Synthetic Clinical Interviews
To illustrate a productive application of synthetic respondents, consider the work of Kang et al. (2025) in developing a depression severity prediction model.
The researchers faced a common challenge in clinical machine learning: limited access to real patient data due to privacy concerns and the scarcity of properly labeled examples across the full spectrum of depression severity. They particularly lacked examples of more severe cases, creating a skewed dataset.
Their solution was to use an LLM to generate synthetic clinical interview transcripts that represented different levels of depression. These synthetic interviews were:
Based on patterns learned from a smaller set of real de-identified interviews
Validated by clinical experts for realism and appropriate symptomatic presentation
Used to augment (not replace) the real data for training purposes
The results were impressive: the model trained on the combined real and synthetic data outperformed the model trained on real data alone, particularly in correctly identifying severe cases. This suggests that carefully implemented synthetic data can address specific gaps in psychological datasets and improve model performance.
Crucially, the researchers were transparent about their methods, validated the synthetic data against expert judgment, and used it as a supplement rather than a replacement for real clinical data.
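As a rough illustration of this kind of augmentation (not Kang et al.’s actual pipeline), the sketch below pools real and expert-validated synthetic labeled transcripts before fitting a simple text classifier; the placeholder texts, labels, and scikit-learn model are assumptions for demonstration only.

```python
# Illustrative augmentation sketch (not Kang et al.'s implementation):
# combine real and expert-validated synthetic transcripts, then fit a
# simple text classifier on the pooled data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder data; real transcripts and severity labels would come from a
# clinical dataset, synthetic ones from a validated generation step.
real_texts = ["I sleep fine and still enjoy my hobbies.",
              "Some days feel heavy, but I manage."]
real_labels = ["minimal", "moderate"]
synthetic_texts = ["Nothing feels worth doing and I can barely get out of bed."]
synthetic_labels = ["severe"]  # fills the under-represented end of the scale

texts = real_texts + synthetic_texts
labels = real_labels + synthetic_labels

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, labels)
print(model.predict(["Lately everything takes enormous effort."]))
```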
Best Practices for Working With Synthetic Respondents
For researchers considering the use of synthetic respondents, we recommend the following guidelines:
1. Validation Against Real Human Data
Whenever possible, benchmark synthetic responses against at least a subset of real human data to assess fidelity. Calculate agreement metrics between the distributions of human and AI-generated responses to quantify representativeness (Bisbee et al., 2024).
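For instance, the human and synthetic answer distributions for each item can be compared with a symmetric divergence measure. A minimal sketch using SciPy’s Jensen–Shannon distance; the example counts and any acceptance threshold are illustrative:

```python
# Sketch: quantify how closely synthetic answer distributions match human ones.
# Jensen-Shannon distance ranges from 0 (identical) to 1 (maximally different);
# the example counts below are illustrative.
import numpy as np
from scipy.spatial.distance import jensenshannon

def response_distribution(counts):
    counts = np.asarray(counts, dtype=float)
    return counts / counts.sum()

# Counts over a 5-point Likert item (Strongly disagree ... Strongly agree).
human = response_distribution([12, 30, 55, 40, 13])
synthetic = response_distribution([8, 25, 70, 35, 12])

jsd = jensenshannon(human, synthetic, base=2)
print(f"Jensen-Shannon distance: {jsd:.3f}")
```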
2. Transparent Reporting
When publishing research that uses synthetic respondents, clearly document:
The specific LLM used (including version)
Prompting methods employed
Validation procedures
Any limitations or biases observed
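A simple machine-readable record can make this documentation concrete and shareable alongside the synthetic dataset; the field names and values below are suggestions, not an established reporting standard.

```python
# Illustrative metadata record to accompany a synthetic dataset release.
# Field names and values are suggestions, not an established standard.
import json

synthetic_data_report = {
    "model": "gpt-4o-mini",              # exact model identifier
    "model_version_date": "2024-07-18",  # provider snapshot, if available
    "prompting_method": "persona-conditioned, zero-shot, temperature=1.0",
    "n_synthetic_respondents": 1000,
    "validation": "Jensen-Shannon distance vs. N=200 human benchmark sample",
    "known_limitations": [
        "Western-centric training data; weaker fidelity for non-US personas",
        "possible answer-option position bias",
    ],
}

with open("synthetic_data_report.json", "w") as f:
    json.dump(synthetic_data_report, f, indent=2)
```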
3. Consider Fine-tuning for Specialized Domains
For research involving specialized psychological constructs or populations, consider fine-tuning LLMs on domain-specific data before generating synthetic responses. Suh et al. (2024) demonstrated that fine-tuning GPT-style models on actual survey data significantly improved their ability to predict human response distributions.
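In practice, this usually means assembling prompt/response pairs from real survey records in whatever training format the chosen provider expects. The sketch below writes chat-style examples to a JSONL file; the exact schema depends on the fine-tuning service, and the persona, item, and answer shown are toy values.

```python
# Sketch: turn real survey records into fine-tuning examples.
# The chat-style JSONL layout is common for hosted fine-tuning services,
# but the exact schema depends on the provider and model.
import json

survey_records = [  # toy example; real records come from collected data
    {"persona": "45-year-old rural conservative man",
     "item": "How much do you trust national news media?",
     "human_answer": "Not very much"},
]

with open("finetune_data.jsonl", "w") as f:
    for rec in survey_records:
        example = {"messages": [
            {"role": "user",
             "content": f"You are a {rec['persona']}. {rec['item']}"},
            {"role": "assistant", "content": rec["human_answer"]},
        ]}
        f.write(json.dumps(example) + "\n")
```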
4. Human Oversight and Interpretation
Always maintain human expertise in the loop when analyzing synthetic data. LLMs lack the experiential understanding and theoretical grounding that human researchers bring. As Abdurahman et al. (2024) warn, avoid “GPTology” – the uncritical acceptance of AI outputs as ground truth about human psychology.
5. Privacy and Ethical Considerations
While synthetic data can enhance privacy, it’s not automatically anonymous. Ensure that generated responses don’t inadvertently reproduce identifiable information from training data. Consider the ethical implications of simulating responses from marginalized groups, particularly if doing so without appropriate cultural context or consultation.
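One lightweight safeguard is to scan synthetic text for long verbatim overlaps with the real source material before release. A minimal sketch using word n-gram matching; the 8-word window is an arbitrary illustrative threshold, not a validated privacy criterion.

```python
# Sketch: flag synthetic texts that reproduce long verbatim spans from the
# real source corpus. The 8-word window is an arbitrary illustrative threshold.
def word_ngrams(text, n=8):
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def has_verbatim_overlap(synthetic, real_corpus, n=8):
    synthetic_grams = word_ngrams(synthetic, n)
    return any(synthetic_grams & word_ngrams(doc, n) for doc in real_corpus)
```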
Conclusion
Synthetic respondents represent a fascinating new frontier in psychological research methodology. They offer promising solutions to data scarcity challenges and open up new possibilities for hypothesis exploration. However, current LLMs have significant limitations in their ability to authentically represent human psychological processes, particularly across diverse cultural contexts.
As the technology evolves, the most productive approach is neither uncritical adoption nor categorical rejection, but rather careful integration guided by empirical validation and ethical consideration. By understanding both the capabilities and constraints of LLM-generated responses, researchers can harness this technology where appropriate while maintaining the methodological rigor essential to psychological science.
In the next lecture, we will explore how generative AI can transform not just the data we analyze, but the very way we collect it – through interactive, adaptive survey methods that blur the line between quantitative surveys and qualitative interviews.