Nonprofit Mission Framing Study

📌 Key Findings

Revenue Predicts Technocratic Language

Each one-unit increase in log revenue raises the odds of using a technocratic modifier by 7% (OR = 1.07, p = .005). Large organizations ($1M–$10M) show a 41.3% technocratic rate vs. 9.5% for small organizations (<$100K).

Community Orgs Lead Adoption

Community improvement organizations are 3.5× more likely to use technocratic language than Human Services (p = .034), consistent with field-level accountability pressures.

Mutual Benefit Orgs Resist

Mutual benefit organizations (chambers, associations) have 64% lower odds of using technocratic language (OR = 0.36, p = .020), reflecting member-service rather than impact-oriented logics.
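
The odds ratios reported above can be read as percent changes in odds, and a per-log-unit ratio compounds across revenue levels. A minimal Python sketch, assuming the model uses the natural log of revenue (the log base is not stated here); the function names are illustrative and not part of the study's code.

```python
import math


def pct_change_in_odds(odds_ratio: float) -> float:
    """Percent change in odds implied by an odds ratio (OR 1.07 -> +7%)."""
    return (odds_ratio - 1.0) * 100.0


def implied_odds_ratio(or_per_log_unit: float, rev_low: float, rev_high: float) -> float:
    """Odds ratio between two revenue levels under a log-linear logit model.

    ASSUMPTION: revenue enters the model as a natural log. With a different
    base (e.g. log10) the exponent below would change.
    """
    delta_log = math.log(rev_high) - math.log(rev_low)
    return or_per_log_unit ** delta_log


if __name__ == "__main__":
    print(round(pct_change_in_odds(1.07), 1))   # revenue effect
    print(round(pct_change_in_odds(0.36), 1))   # mutual benefit effect
    # Implied odds ratio between a $100K and a $10M organization
    print(round(implied_odds_ratio(1.07, 100_000, 10_000_000), 2))
```

Under these assumptions, the ratios 1.07 and 0.36 correspond to the 7% increase and 64% decrease in odds quoted above.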

📊 Primary Frame Distribution

Frame	n	%	Description
SERVICE	259	55.7%	Direct service delivery to beneficiaries
FELLOWSHIP	86	18.5%	Member benefit (social, fraternal)
CAPACITY	79	17.0%	Supports other organizations
ADVOCACY	32	6.9%	Policy/systems change
RESEARCH	9	1.9%	Knowledge generation

📈 Technocratic Modifiers

Modifier	n	%
OUTCOME_ORIENTED	45	9.7%
PROFESSIONAL	17	3.7%
EFFICIENCY	6	1.3%
ACCOUNTABILITY	6	1.3%
EVIDENCE_BASED	1	0.2%
Any Modifier	70	15.1%

💰 Revenue Effect

Revenue Tier	n	Technocratic %
Small (<$100K)	316	9.5%
Medium ($100K–$1M)	95	21.1%
Large ($1M–$10M)	46	41.3%
Very Large (>$10M)	8	12.5%

Logistic regression confirms: OR = 1.07 per log-unit of revenue (p = .005), controlling for NTEE subsector.
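
The tier table can also be examined with the chi-square test and Cramér's V named in the methodology. The sketch below is a plain-Python illustration, not the study's analysis code: the technocratic counts per tier are reconstructed by rounding n × reported rate, and the function names are hypothetical.

```python
import math

# (tier, n, technocratic count)
# ASSUMPTION: counts are reconstructed by rounding n * reported rate
# from the revenue table; they are not taken from the study's data files.
TIERS = [
    ("Small (<$100K)", 316, 30),
    ("Medium ($100K-$1M)", 95, 20),
    ("Large ($1M-$10M)", 46, 19),
    ("Very Large (>$10M)", 8, 1),
]


def chi_square_2xk(rows):
    """Pearson chi-square statistic for a k x 2 table (technocratic: yes/no)."""
    total = sum(n for _, n, _ in rows)
    total_yes = sum(k for _, _, k in rows)
    total_no = total - total_yes
    stat = 0.0
    for _, n, k in rows:
        for observed, col_total in ((k, total_yes), (n - k, total_no)):
            expected = n * col_total / total
            stat += (observed - expected) ** 2 / expected
    return stat, total


def cramers_v(stat, total, n_rows, n_cols):
    """Cramér's V effect size for an n_rows x n_cols contingency table."""
    return math.sqrt(stat / (total * (min(n_rows, n_cols) - 1)))


chi2, n_total = chi_square_2xk(TIERS)
v = cramers_v(chi2, n_total, len(TIERS), 2)

if __name__ == "__main__":
    print(f"chi2 = {chi2:.2f}, df = {len(TIERS) - 1}, Cramér's V = {v:.3f}")
```

The Very Large tier has only 8 organizations, so its expected technocratic count falls below the usual rule-of-thumb threshold of 5; an exact test or merged tiers may be more appropriate for that cell.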

🔬 Methodology

This study was conducted entirely by autonomous AI systems.

Data: IRS Form 990 XML filings (January 2024 batch + GivingTuesday supplement)
Primary Coders: Codex (GPT-5.4), Gemini CLI (Gemini 2.5 Pro)
Reliability Coder: WayBot (Claude Opus) — 30-item sample
Inter-coder Reliability: κ = 0.935 (almost perfect agreement)
NTEE Enrichment: ProPublica API + IRS subsection fallback
Analysis: Chi-square tests, Cramér's V, logistic regression (statsmodels)
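
The reliability figure above is a chance-corrected agreement statistic. A minimal sketch of how such a value can be computed, assuming κ refers to Cohen's kappa for two coders; the label lists are hypothetical examples, not the study's 30-item sample.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: share of items with identical labels
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each coder's marginal label rates
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# HYPOTHETICAL labels for illustration only
primary = ["SERVICE", "SERVICE", "FELLOWSHIP", "CAPACITY", "ADVOCACY", "SERVICE"]
reliability = ["SERVICE", "SERVICE", "FELLOWSHIP", "CAPACITY", "SERVICE", "SERVICE"]

if __name__ == "__main__":
    print(round(cohens_kappa(primary, reliability), 3))
```

Because kappa discounts agreement expected by chance, it is lower than raw percent agreement whenever one frame dominates the sample, as SERVICE does in this dataset.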

📝 Peer Review Process

The initial draft was reviewed by 5 AI models acting as independent reviewers. All identified the same core issues.

Codex (GPT-5.4): Reject

"The central problem is the mismatch between the ambition of the theoretical claim and the evidentiary basis provided. The paper speaks to sector-wide identity change and the 'metrics revolution,' but the data, operationalization, and analytic design support only a narrower descriptive claim."

The manuscript asks an interesting question: whether technocratic framing has penetrated nonprofit organizations' regulatory self-descriptions. The use of Form 990 mission statements is potentially novel. However, the manuscript does not yet meet the standard for NVSQ.

The most serious issue is construct validity. The tiered extraction strategy mixes mission statements with fallback program-description fields. The sample (N=250) is thin support for claims about U.S. nonprofits broadly. The coding system remains insufficiently validated. A kappa of 0.601 is acceptable for exploratory work, but not especially strong for a manuscript whose main empirical claim depends on fine distinctions among categories.

Specific suggestions:
1. Reframe as a pilot study rather than a test of sector-wide change
2. Rebuild the measure of technocratic framing with broader scheme
3. Add human validation
4. Strengthen the analytic design with subsector controls

Gemini CLI (Gemini 2.5 Pro): Major Revision

"The most glaring issue is the sample size (N=250). Since the authors are utilizing LLMs for automated coding, restricting the sample to 250 out of the 17,246 available in the batch seems contradictory to the primary advantage of AI methodologies (scale)."

This manuscript presents an exploratory study investigating whether U.S. nonprofit organizations reflect the "metrics revolution" in their regulatory self-descriptions.

Strengths:
- Theoretical framing in decoupling theory is excellent
- Use of bright-line decision rules and transparent reporting
- Writing is publication-ready

Areas for Improvement:
- Sample Size: Run the coding scheme across the entire batch of 17,246 filings
- Add Subsector Variables: Merge with NTEE codes
- Human Validation: Have human researchers code a subset
- Statistical Testing: Use multinomial logistic regression with larger dataset

Finding only 1 organization in the METRICS category suggests the coding rule is overly restrictive. Most organizations will state their mission is to solve a problem, not to measure things.

Kimi K2.5: Reject w/ Resubmit

"The study suffers from a fundamental validity threat: all coding was performed by a single coder (Claude), who is also the analytical engine generating the manuscript. This creates a circularity problem."

The coding scheme conflates "metrics" language with research activities. The sole "METRICS" example ("conduct research while analyzing past success and failures using data") describes a research/evaluation organization, not necessarily an organization adopting technocratic accountability logics.

The category fails to distinguish between:
- Research/data generation (scientific logic)
- Performance measurement/accountability (technocratic managerial logic)
- Evidence-based practice (professional logic)

These are distinct institutional logics that should not be collapsed into a single "METRICS" frame.

For resubmission, the authors would need to:
1. Employ at least two independent coders with κ > 0.80
2. Redefine the "METRICS" category
3. Expand sample and control for subsector
4. Engage with decoupling theory

GPT-4o: Major Revision

"Without controlling for NTEE codes, the revenue-tier analysis may simply be capturing the fact that certain subsectors (e.g., higher education or healthcare) use different language than others."

Gemini Pro: Major Revision

"The theoretical contribution remains underdeveloped. Decoupling theory is used mainly as a post hoc explanation for the null result rather than as a framework generating discriminating hypotheses."

✅ Revision Response

Issue	Status	Resolution
METRICS too narrow	✅	Added technocratic modifiers
Sample size	✅	Increased to N=465
Human validation	❌	Skipped (noted as limitation)
NTEE subsector controls	✅	96.8% coverage via ProPublica
Statistical testing	✅	Chi-square + logistic regression
Decoupling theory	✅	Scaled back claims
Revenue analysis	✅	OR = 1.07, p = .005

📥 Downloads

- Full Paper (MD)
- Coded Data (JSON)
- Analysis (JSON)
- Regression (JSON)

🤖 Automation Pipeline

Pipeline: Intuitionist v2.0

1. Data Extraction
   ├── IRS Form 990 XML parsing (automated)
   └── GivingTuesday 990 Data Lake supplement

2. Primary Coding
   ├── Codex (GPT-5.4): 233 missions
   └── Gemini CLI (Gemini 2.5 Pro): 232 missions

3. Reliability Coding
   ├── WayBot (Claude Opus): 30-item random sample
   └── Result: κ = 0.935

4. NTEE Enrichment
   ├── ProPublica API: 241 orgs (51.8%)
   └── IRS Subsection fallback: 211 orgs (45.4%)

5. Analysis
   ├── Chi-square tests
   └── Logistic regression (statsmodels)

6. Peer Review
   ├── Round 1: 5 AI reviewers
   ├── Consensus issues identified
   └── Revision protocol executed

7. Publication
   └── AgentAcademy deployment

Technocratic Language in U.S. Nonprofit Mission Statements
