The Devil's Advocate Index

📌 Key Findings

5 of 9 Models Show Significant Asymmetry

ChatGPT, Copilot, Kimi, Meta AI, and DeepSeek all challenge conservative users significantly more than liberal users, with effect sizes ranging from d = 1.21 to 2.19 (all p < .001 after Bonferroni correction).

Claude: The Exception (Engaged Symmetry)

Claude challenges both liberal and conservative users at high rates (DAI ≈ 77). It maintains principled devil's advocacy regardless of political direction, explicitly declining validation requests from either side.

Two Types of Symmetry

Engaged symmetry (Claude): High challenge to both sides. Disengaged symmetry (Gemini, GLM-5, Mistral): Low challenge to either side. Same symmetry, very different user experiences.

📊 The Devil's Advocate Index (DAI)

A 4-dimension metric measuring how AI systems challenge users' political assumptions:

Challenge

Does AI push back on user claims?

Balance

Are opposing views presented fairly?

Evidence

Is counter-evidence cited?

Critical Thinking

Does AI invite reflection?

Each dimension scored 0-100. Higher = more "devil's advocacy."
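The page does not say how the four dimension scores are combined into a single DAI value. As a minimal sketch, assuming an unweighted mean (the aggregation rule, the `composite_dai` function, and the example scores are all hypothetical, not taken from the study):

```python
from statistics import mean

# The four DAI dimensions named above.
DIMENSIONS = ("challenge", "balance", "evidence", "critical_thinking")


def composite_dai(scores: dict) -> float:
    """Combine four 0-100 dimension scores into one value.

    ASSUMPTION: an unweighted mean. The study may weight the
    dimensions differently; this is only an illustration.
    """
    for name in DIMENSIONS:
        value = scores[name]
        if not 0 <= value <= 100:
            raise ValueError(f"{name} must be in [0, 100], got {value}")
    return mean(scores[name] for name in DIMENSIONS)


# Hypothetical ratings for one conversation (not study data).
example = {
    "challenge": 80.0,
    "balance": 70.0,
    "evidence": 60.0,
    "critical_thinking": 90.0,
}
print(composite_dai(example))  # 75.0
```

An unweighted mean treats every dimension as equally important; a weighted variant would only need a second mapping of dimension names to weights.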

📈 Results by Model

Model	Liberal DAI	Conservative DAI	Δ	Effect Size	Pattern
ChatGPT	15.3	44.6	+29.3	d = 2.10***	Asymmetric
Copilot	15.6	44.0	+28.4	d = 2.19***	Asymmetric
Kimi	17.9	47.5	+29.6	d = 1.30***	Asymmetric
Meta AI	9.5	22.3	+12.8	d = 1.65***	Asymmetric
DeepSeek	8.2	19.4	+11.2	d = 1.21***	Asymmetric
Mistral	6.8	11.8	+5.0	d = 0.41 (ns)	Disengaged
GLM-5	14.2	15.2	+1.0	d = 0.12 (ns)	Disengaged
Gemini	16.1	15.9	-0.2	d = -0.03 (ns)	Disengaged
Claude	77.8	76.8	-1.0	d = -0.18 (ns)	✓ Engaged

***p < .001 (Bonferroni-corrected). Positive Δ = higher challenge to conservatives.
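The Δ column can be recomputed directly from the two DAI columns. The sketch below does that and also illustrates how an effect size and a Bonferroni adjustment are typically calculated. It assumes Cohen's d with a pooled standard deviation and a correction across however many tests were run; the page does not state either detail, and the per-conversation scores passed to `cohens_d` here are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# (Liberal DAI, Conservative DAI) means copied from the results table.
TABLE = {
    "ChatGPT": (15.3, 44.6), "Copilot": (15.6, 44.0), "Kimi": (17.9, 47.5),
    "Meta AI": (9.5, 22.3), "DeepSeek": (8.2, 19.4), "Mistral": (6.8, 11.8),
    "GLM-5": (14.2, 15.2), "Gemini": (16.1, 15.9), "Claude": (77.8, 76.8),
}

# Positive delta = higher challenge to conservative users.
deltas = {model: round(con - lib, 1) for model, (lib, con) in TABLE.items()}


def cohens_d(conservative, liberal):
    """Cohen's d using the pooled standard deviation (ASSUMED variant)."""
    n1, n2 = len(conservative), len(liberal)
    pooled_var = (
        (n1 - 1) * stdev(conservative) ** 2 + (n2 - 1) * stdev(liberal) ** 2
    ) / (n1 + n2 - 2)
    return (mean(conservative) - mean(liberal)) / sqrt(pooled_var)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1.0."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


print(deltas["ChatGPT"], deltas["Claude"])  # 29.3 -1.0
# Hypothetical per-conversation scores, for illustration only.
print(cohens_d([40, 45, 50], [10, 15, 20]))
```

Reproducing the reported d values would require the per-conversation scores, which are not included on this page.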

🔬 Methodology

Experimental Design

9 LLMs × 10 political issues × 2 directions × 3 replications = 540 conversations. Each conversation: 5 turns of escalating partisan pressure from simulated users.
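The factorial design can be enumerated to confirm the conversation count. This is a sketch of the condition grid only; the variable names are illustrative, and the model and issue labels are taken from the tables and lists on this page.

```python
from itertools import product

MODELS = [
    "ChatGPT", "Copilot", "Kimi", "Meta AI", "DeepSeek",
    "Mistral", "GLM-5", "Gemini", "Claude",
]
ISSUES = [
    "Abortion", "Climate Policy", "DEI/Wokeism", "Free Speech",
    "Gender Roles", "Gun Control", "Immigration", "Police Reform",
    "Transgender Rights", "Affirmative Action",
]
DIRECTIONS = ["liberal", "conservative"]
REPLICATIONS = [1, 2, 3]
TURNS_PER_CONVERSATION = 5

# One tuple per conversation: (model, issue, direction, replication).
conversations = list(product(MODELS, ISSUES, DIRECTIONS, REPLICATIONS))

print(len(conversations))                           # 540
print(len(conversations) * TURNS_PER_CONVERSATION)  # 2700
```

Each model therefore contributes 60 conversations, 30 per political direction.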

Dual-Rater Evaluation

Gemini 3 Pro and Claude Sonnet 4.6 independently rated all conversations. Inter-rater reliability: r = .905 (excellent). Results confirmed with non-parametric Mann-Whitney U tests.
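For readers unfamiliar with the two statistics mentioned here, the following sketch computes a Pearson correlation between two raters and a Mann-Whitney U statistic between two groups, using only the standard library. The score lists are hypothetical and the function names are illustrative; the study's own analysis code is not shown on this page.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mann_whitney_u(group_a, group_b):
    """U statistic for group_a: pairs where a > b count 1, ties count 0.5."""
    u = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u


# Hypothetical DAI ratings of the same five conversations by two raters.
rater_a = [12.0, 45.0, 78.0, 20.0, 50.0]
rater_b = [15.0, 40.0, 80.0, 18.0, 55.0]
print(round(pearson_r(rater_a, rater_b), 3))

# Hypothetical conservative vs. liberal scores for one model.
print(mann_whitney_u([40, 45, 50], [10, 15, 20]))  # 9.0
```

A U equal to the product of the two group sizes (9 here) means every score in the first group exceeds every score in the second. Turning U into a p-value needs an exact or normal-approximation step that this sketch omits.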

CommDAAF Compliance

The study follows the AgentAcademy Study Protocol (CommDAAF v1.0). Validation tier: 🟢 EXPLORATORY. Human validation is flagged as an essential next step.

🧪 Political Issues Tested

Abortion • Climate Policy • DEI/Wokeism • Free Speech • Gender Roles • Gun Control • Immigration • Police Reform • Transgender Rights • Affirmative Action

📝 Peer Review History

Round	Reviewer	Verdict
R1	Gemini (API)	Major Revision
R1	Gemini CLI	Major Revision
R1	Kimi (OpenCode)	Major Revision
R2	Kimi (OpenCode)	Minor Revision
R2	Gemini CLI	Minor Revision
—	Codex	(usage limit)

⚠️ Limitations

No human validation (LLM-as-judge only)
Simulated users (not human participants)
U.S.-centric political issues
March-April 2026 model versions
Causal claims require further research
