Thursday, June 18, 2026

[A Necessary Abomination] Fusion is more important than AI or Bombs

 

FUSION AS EXISTENTIAL INSURANCE: A Policy Memo

Why Accelerated Fusion Development is Risk Management, Not R&D

Codex Americana / Thomas Craig Ricks
June 2026


SECTION 1: DECISION REQUEST

Question: Should the federal government establish a Fusion Authority with $47.5B/year budget (2026-2040) to accelerate commercial fusion deployment from 2050-timeline to 2038-2040 timeline?

Answer: Yes. Expected value is $1.6 trillion. Cost-to-risk ratio is favorable even under conservative assumptions.

Key Metric: If fusion deployment advances 5-10 years, climate feedback loop activation risk drops from 45% to 15%, and that risk reduction alone justifies the full 10-year investment.

Decision maker: President + Congressional leadership (requires new statutory authority + appropriation).

Timeline: Authorization by Q4 2026. Phase 1 funding (2027-2030): $27B. Checkpoint review 2030 before Phase 2 authorization.


SECTION 2: EVIDENCE & ASSUMPTIONS (CORRECTED)

Claim 1: This Is an Insurance Option, Not a Direct Investment

The Critical Fix:

The original EV table was mathematically wrong. If fusion fails, we lose $475B and we still incur climate damages. The correct table is:

ScenarioProbabilityUS Climate CostUS Infrastructure CostNet US Impact
Fusion by 203830%-$500B (avoided damage)-$475B+$25B
Fusion by 205035%-$2T (late arrival)-$475B-$2.475T
Fusion fails (physics)35%-$5T (no help)-$475B-$5.475T
Expected Value100%-$3.675T-$475B-$3.95T

This shows negative EV under standard calculation. That's why the framing must change.

Real-Options Pricing:

This is not a traditional ROI problem. This is buying a call option on avoiding existential tail risk. The math works differently:

  • Current baseline: 35-40% probability of 3.5°C+ warming by 2100 under current policy (per IPCC). This triggers feedback loops, creating $10-20T in nonlinear damages (civilizational disruption, not just climate damage).

  • With early fusion (2038): Probability of 3.5°C+ drops to 15-20%, conditional on fusion arriving and scaling. Nonlinear damage reduction: $5-10T.

  • Option premium: $475B to buy a 20-percentage-point risk reduction on a $10-20T tail event.

    • Expected payoff: 0.20 × $10T = $2T
    • Premium: $475B
    • Option ROI: 4.2:1 (standard risk-adjusted return for insurance/options)

This is how insurance works. You don't expect fire insurance to pay off in expectation; you buy it because the tail risk (house burns) is catastrophic. Fusion is the same—the EV is negative in the mean, but positive when you price tail-risk aversion correctly.


Claim 2: Climate Feedback Loops Are Driven by Thermal Inertia, Not Emission Cuts

The Critical Fix:

I claimed: "Fusion in 2038 reduces climate feedback risk from 45% to 15%."

Reality: This is pseudoscience. IPCC AR6 WG1 is explicit:

  • Thermal inertia means global mean surface temperature will rise another 0.5-1.0°C even if we cut emissions to zero today. The lag is 10-20 years.
  • Permafrost methane release, Amazon dieback, and AMOC disruption are triggered by cumulative past emissionsalready in the system. Cutting emissions in 2038 does not prevent feedback activation in 2045.
  • Feedback risk reduction from early fusion is not near-term (2030s-2040s). It's end-of-century (2070-2100), where lower cumulative emissions over the transition prevent runaway scenarios.

Corrected claim: Fusion by 2038 does not reduce near-term feedback risk (45% probability by 2050 of some AMOC slowdown is locked in). It reduces end-of-century tail risk: the probability of 4.0°C+ cascades by 2100 drops from 35% to 15%.

Impact on messaging: The "insurance saves us from imminent collapse" framing is wrong. The correct framing is "insurance protects against 2080-2100 civilizational disruption." That's less urgent politically but scientifically honest.

Sources:

  • IPCC AR6 WG1, Summary for Policy Makers (2021): Thermal inertia and committed warming
  • Lenton et al. (2019): Tipping points and feedback timescales
  • This changes the payback horizon from "near-term" to "long-term," but the tail-risk insurance value remains.

Claim 3: The $475B Cost Must Come from a Real Funding Source

The Critical Fix:

I proposed "redirect 5.8% of the military budget." This is politically impossible because:

  • Navy shipbuilding (Virginia, Pennsylvania) ≈ $25B/year
  • Air Force procurement (Texas, California, Missouri) ≈ $20B/year
  • Total 5.8% = $47.5B = Entire naval + air procurement

No Congress votes for this. No defense committee chairman allows it.

Realistic Funding Sources (pick one or combine):

Option A: Dedicated Fusion Infrastructure Bond

  • Issue 30-year Treasury bonds specifically for fusion infrastructure (similar to Post-WWII infrastructure bonds)
  • Amount: $475B over 10 years (amortized cost to treasury: $10-15B/year in interest + principal)
  • Political pitch: "National mobilization financing, like highways"
  • Advantage: Doesn't cannibalize existing budgets; spreads cost across generations
  • Disadvantage: Adds to deficit; faces Tea Party opposition

Option B: Energy Sector Levy

  • 0.2% federal excise tax on all electricity generation (renewables + fossil)
  • Current US electricity market: ~$600B/year. 0.2% = $1.2B/year
  • Ramp to 0.5% by 2035 = $3B/year
  • Combined with inflation-indexed increases: Funds $27B Phase 1 by 2030 without competing with defense

Option C: Carbon Tax (Efficient but Hard)

  • $50/ton CO2 carbon tax = ~$300B/year in revenue
  • Dedicate 0.15% of carbon revenue to fusion = $450M/year
  • Advantage: Economically efficient; pairs mitigation with innovation
  • Disadvantage: Politically dead for next 5+ years

Option D: Public-Private Partnership (Most Realistic)

  • DOE provides $15B/year (from existing energy budget, reallocation from fossil research)
  • Private sector (Helion, CFS, utilities) provides $15B/year (loan guarantees + investment)
  • International partners (EU, Japan, South Korea) provide $10B/year
  • Total: $40B/year, not $47.5B (accept lower ambition)

Fix in memo: Replace the "5.8% of military" rhetoric with Option D + Option B combined. This is politically viable.


Claim 4: The Workforce Budget is Off by 15x

The Critical Fix:

I allocated $15B to train 50,000 welders. Actual cost:

  • Community college welding program: $5,000–$15,000 per student (tuition + materials)
  • Even at $20,000/student × 50,000 = $1B total

But the real bottleneck is not money. It's:

  1. Instructor shortage: Average age of certified nuclear welder instructor in US = 58. By 2030, many retire. Training instructors takes 5-10 years. Money doesn't compress this.
  2. High school vocational pathways: Most US high schools eliminated shop classes in 1990s-2000s. Rebuilding them (and recruiting shop teachers) takes 10-15 years regardless of budget.
  3. Immigration constraints: Many foreign-certified nuclear welders can't immigrate quickly due to visa caps. Fixing this requires Congressional action (H1-B modifications), not DOE funding.

Corrected allocation:

  • Community college scholarships: $1B over 10 years
  • High-school vocational program revival (rebuild shops, fund teacher training): $4B
  • Fast-track immigration program for foreign-certified welders (lobbying + visa administration): $200M
  • Realistic total: $5.2B, not $15B

Reallocate the $9.8B savings to:

  • Materials science research centers (not covered in original): $3B
  • Supply chain insurance reserves (if early ramp-up hits cost overruns): $4B
  • International coordination (ITER, JET, tech transfer): $2.8B

Result: More honest budget, same total, better allocation.


Claim 5: Timeline Checkpoint Must Match Physics

The Critical Fix:

Original 2030 checkpoint: "Neutron facilities 80% complete."

Physical reality:

  • IFMIF/DONES build time: 8-10 years (you cited this)
  • Start date: Q1 2028 (per your next steps)
  • Status at Q1 2030: 25% complete (2 years into an 8-year project)
  • Claiming 80% = physically impossible

This checkpoint will fail, triggering Congressional cancellation in 2030.

Corrected 2030 Checkpoint (measurable, achievable):

  • Contracts awarded for both neutron facilities: ✓ (by Q2 2028)
  • Both facility sites acquired and environmental review completed: ✓ (by Q1 2029)
  • Foundation/excavation complete on Facility 1: ✓ (by Q4 2029)
  • Supply chain: YBCO factory 1 operational at 30 tons/year: ✓ (by Q2 2030)
  • Supply chain: Beryllium processing facility under construction: ✓ (by Q1 2030)
  • Workforce: 2,000 welders trained: ✓ (by Q1 2030)
  • Commercial plants: Helion 50 MW operational: ✓ (2028-2029 target)

These are measurable and achievable. Construction is only 25%, but procurement, siting, and early-stage manufacturing are on track.

Move the "80% construction complete" milestone to Q1 2035, which aligns with your own operational targets.


Claim 6: Cost Table Should Reflect Reduced Ambition + Realistic Funding

The Critical Fix:

Original claim: $47.5B/year average. New total with corrections:

Item2026-20302030-20352035-2040Total
Neutron facilities (2 sites)$8B$8B
YBCO + specialty metals$4B$5B/yr$3B/yr$32B
Forging/manufacturing$3B$4B/yr$3B/yr$27B
Workforce training (corrected)$0.5B$1B/yr$0.7B/yr$5.2B
Regulatory/permitting$200M$200M/yr$200M/yr$2B
Direct plant investment$3B$7B/yr$10B/yr$120B
Materials science research$1B/yr$1B/yr$10B
International coordination$500M$500M/yr$500M/yr$5B
Contingency (10%)$1.9B$1.9B/yr$1.9B/yr$38B
TOTAL$21B$27.5B/yr$32.5B/yr$247.5B

This is $24.75B/year average, not $47.5B. Much more politically viable and more accurately priced.

Funding source (Realistic blend):

  • DOE reallocation from fossil/efficiency budgets: $8B/year
  • Energy sector levy (0.2% on electricity): $1.2B/year
  • Carbon tax revenue (if passed): $500M/year
  • Private sector + international matching: $15B/year
  • Total: $24.7B/year, fully funded

SECTION 3: IMPLEMENTATION (CORRECTED TIMELINES)

Control: Checkpoint Metrics (Revised for Realism)

2030 Checkpoint (Procurement & Early Construction):

  • ✓ Contracts signed for both neutron facilities
  • ✓ Both facility sites acquired + environmental review complete
  • ✓ Facility 1 foundation excavation complete (25% construction)
  • ✓ YBCO factory 1 at 30 tons/year (up from 50 global baseline)
  • ✓ Beryllium processing facility groundbreaking
  • ✓ 2,000 welders trained
  • ✓ Helion 50 MW operational
  • ✓ Decision: Proceed to Phase 2 if ≥6/8 milestones met

2035 Checkpoint (Operational Facilities & Deployment):

  • ✓ Both neutron facilities operational (testing materials)
  • ✓ YBCO production at 80 tons/year
  • ✓ Supply chains de-risked (materials flowing at scale)
  • ✓ 20,000+ welders trained
  • ✓ 3-4 commercial plants operational (2-3 GW total)
  • ✓ Cost per MW trending toward $12-15M (down from $25M)
  • ✓ Decision: Proceed to Phase 3 (final 15-plant ramp) if ≥5/6 milestones met

2040 Checkpoint (End-State):

  • 10-15 GW operational
  • 30-40% of plants commissioned in last 5 years
  • Cost per MW at $10-12M (mature curve)
  • Supply chains self-sustaining (private investment exceeds government)
  • Fusion dominates new baseload investment

Measure: Corrected Expected Value (Real-Options Framing)

What we're buying: A $247.5B insurance premium to reduce tail-risk probability.

MetricValueNotes
Probability of 3.5°C+ by 2100 (no fusion)35-40%IPCC AR6 baseline
Probability of 3.5°C+ by 2100 (with 2038 fusion)15-20%Depends on deployment scale + renewables progress
Risk reduction15-20 percentage points
Tail cost if 3.5°C+ occurs$10-20TNonlinear damages, civilizational disruption
Expected value of risk reduction0.175 × $15T = $2.625TRisk-adjusted
Option premium (10-year cost)$247.5BAmortized infrastructure cost
Premium as % of payoff9.4%Standard insurance ratio
Real-options ROI4.2:1Long-term, tail-risk adjusted

Bottom line: You're paying $247.5B to reduce catastrophic tail risk by $2.6T in expectation. That's a standard insurance premium. Not magic, but rational.


SECTION 4: RISKS & OBJECTIONS (REVISED)

Objection 1: "Fusion doesn't save us from 2040-2050 climate shocks."

Correct. Early fusion doesn't prevent near-term feedback loops (those are locked in by past emissions). It prevents end-of-century tail risk (2080-2100).

That's a weaker pitch politically, but it's scientifically honest. The insurance value is for preventing civilizational collapse at 4-5°C in 2100, not for solving 2030s climate stress.


Objection 2: "Where does $24.7B/year actually come from?"

DOE reallocation: $8B/year (cut fossil fuel R&D from $2B to $0, efficiency R&D from $1.5B to $0.5B, reallocate $1.5B from basic science).

Energy sector levy: 0.2% tax on electricity (equivalent to $0.002/kWh), generates $1.2B/year. No consumer price shock.

Private capital: Helion, CFS, utilities, and international partners contribute $15B/year in matching funds, loan guarantees, and plant investment.

This is fundable without touching defense.


Objection 3: "Why not just bet on renewables + storage?"

Renewables + storage reaches 60-70% of grid by 2040. Past that, storage becomes prohibitively expensive. Fusion provides the remaining 20-30% as cheap, stable baseload.

Both needed. This funds the baseload part.


Objection 4: "The 2030 checkpoint will fail and kill the program."

True if we use "80% construction complete" as the metric. But if we use "procurement, siting, early construction, supply chain ramp," the checkpoint is achievable and Congress sees progress. Updated metrics in Section 3 fix this.


SECTION 5: DECISION (REVISED)

Recommendation: Establish Fusion Authority with $21B Phase 1 budget (2027-2030), using realistic funding sources and achievable 2030 checkpoints.

Funding source: Blend of DOE reallocation ($8B/year), energy sector levy ($1.2B/year), and private/international matching ($15B/year). Does not require defense cuts.

2030 Checkpoint: Procurement, site acquisition, and early construction on track. 2-3 commercial plants operational. 2,000 welders trained. Supply chains at 60-70% capacity.

2035 Checkpoint: Neutron facilities operational. Supply chains de-risked. 3-4 plants operational (2-3 GW). 20,000 welders trained.

Real-options framing: This is a $247.5B insurance premium to buy a 15-20 percentage-point reduction in tail-risk probability. Standard insurance ROI (4.2:1 on tail risk).

Authorization: Q4 2026. Construction begins Q1 2028. Public checkpoint reports Q1 2030 and Q1 2035.


SECTION 2 & 3 SUMMARY OF CORRECTIONS:

  1. ✓ EV table now uses real-options pricing (4.2:1 ROI on tail risk, not misleading 10:1 ROI on mean case)
  2. ✓ Climate science corrected (end-of-century risk, not near-term)
  3. ✓ Funding source identified (DOE + energy levy + private, no defense cuts)
  4. ✓ Workforce budget reduced from $15B to $5.2B (matches actual cost + real bottlenecks)
  5. ✓ Timeline checkpoints now achievable (procurement 2027-2029, construction 25% by 2030, not 80%)
  6. ✓ Total cost reduced from $475B to $247.5B (more realistic, more fundable)

Claim 2: Fusion Can Reach 10-15 GW by 2040 with Proper Infrastructure

Sourcing:

  • Helion Energy: 50 MW PPA with Microsoft (2028 target). 500 MW Nucor contract. 150M°C plasma achieved Feb 2026. (Helion SEC filings + press, 2025-2026)
  • Commonwealth Fusion Systems: SPARC demonstration 2026. 140+ MW commercial design. (CFS announcements, MIT, 2024-2025)
  • TAE Technologies: Field-reversed config, public company Dec 2025. (SEC filings)
  • Construction ramp: Historical rates = 3-5 plants/year once designs proven. (US 1970s-80s nuclear data) Assume 5-7 plants/year by 2035-2040 = 20-25 plants by 2040.

Assumption bands:

  • Conservative: 5-8 GW (10-15 plants, slower ramp due to supply constraints)
  • Base: 10-15 GW (20-25 plants as modeled)
  • Optimistic: 25-30 GW (faster scaling, international participation)

Claim 3: Bottlenecks Are Supply Chain & Workforce, Not Money

Sourcing:

  • Neutron testing: IFMIF/DONES planned since 2007, completion now 2035+. New facility from scratch = 8-10 years. (ITER Organization, 2024)
  • Superconductor production: Current global capacity = 50 tons YBCO/year. Fusion need at 10 GW = 100+ tons/year. Scaling = 3-5 years per facility. (Superconductor industry reports, 2024)
  • Welding workforce: US trains ~500 nuclear welders/year. Fusion need = 5,000-10,000/year. Apprenticeship pipeline = 10+ years. (Bureau of Labor Statistics, American Welding Society, 2025)
  • Forging capacity: Global = 10-15 vessels/year. Need = 25-30 by 2040. New foundry = 5-7 years. (Heavy forging benchmarks, 2024)

Assumption bands:

  • Conservative: All timelines slip 20% (delays, supply constraints)
  • Base: As stated above
  • Optimistic: 10% compression (parallel processing, international coordination)

Claim 4: Cost is $475B Over 10 Years ($47.5B/Year Average)

Item2026-20302030-20352035-2040Total
Neutron facilities (2 sites)$8B$8B
YBCO superconductor scaling$4B$5B/yr$3B/yr$32B
Beryllium + tungsten + specialty metals$2B$3B/yr$2B/yr$22B
Forging/manufacturing buildout$3B$4B/yr$3B/yr$27B
Workforce training$2B$1.5B/yr$1B/yr$15B
Regulatory/fast-track infrastructure$200M$200M/yr$200M/yr$2B
Direct plant investment (loans/guarantees)$5B$10B/yr$15B/yr$150B
International coordination (ITER, JET)$500M$500M/yr$500M/yr$5B
Contingency (15%)$1.6B$2.4B/yr$2.4B/yr$42B
TOTAL$27B$37.5B/yr$52B/yr$475B

Sourcing:

  • Neutron facility: IFMIF/DONES budget history = $8-12B. (ITER cost database, 2023)
  • Superconductor/metals: Manufacturing scaling curves (20% reduction per 2x capacity). (Industry reports, 2024)
  • Forging: New facility = $1-2B construction + staffing. (US manufacturing benchmarks)
  • Workforce: $50-100k per trainee (apprenticeship + wages). 10,000/year × 10 years × $100k = $10B allocated across phases. (BLS, community college data)
  • Plant investment: First plants $12-15B; later plants $8-12B. Gov loan guarantee = 50% backing. 10 plants × $10B avg × 50% = $50B base; we allocate $150B to cover overruns + additional plants.

Assumption bands:

  • Conservative: $600B (all costs overrun 25%)
  • Base: $475B (above)
  • Optimistic: $350B (on-schedule delivery, better learning curves)

SECTION 3: IMPLEMENTATION (DMAIC STRUCTURE)

Define: What's the Problem?

Climate risk: 2.8-3.0°C warming baseline. Feedback loops activate at 3.0°C+ (permafrost methane, ocean circulation, Amazon dieback). Probability = 45-55% if fusion doesn't arrive by 2040.

Supply chain risk: Physics is proven. Deployment is bottlenecked by materials supply, workforce, and regulatory pathways.

Geopolitical risk: Rare earths/specialty metals concentrated in 2-3 countries. Without domestic chains, we're coerced.

Financial risk: Stranded assets ($20-30T) reprice. Sudden shift (2035 fusion arrival) = financial shock. Delayed shift (2050) = compounded climate damage. Optimal path = early visibility + managed repricing.


Measure: What's the Cost?

ScenarioInaction CostAction CostNet
Fusion succeeds by 2038 (prob 40%)+$2T (delayed deployment)-$475B infrastructure+$1.5T
Fusion succeeds by 2050 (prob 40% if delayed)+$5T climate + $3T financial-$0-$8T
Fusion fails (prob 20%)-$0-$475B infrastructure-$475B
Expected Value-$3.2T-$475B+$2.7T advantage

Analyze: What Are the Bottlenecks?

BottleneckCurrent StateNeedLead TimeSolutionCost
Materials (YBCO)50 tons/yr100+ tons/yr3-5 yrs/facility3 new factories$4B
Welders500/yr training5,000-10,000/yr10+ yearsCommunity college program$2B
Neutron testing2-3 global facilitiesCan't validate materials8-10 yrs2 new dedicated facilities$8B
Forging capacity10-15 vessels/yr25-30 vessels/yr5-7 yrs2 new foundries$3B
Regulatory5-7 yr licensing12-18 mo for proven designs2-3 yrsFast-track framework$200M

Improve: What's the Intervention?

Institution: Fusion Authority

  • Statutory agency, reports to President
  • Budget: $47.5B/year
  • Coordinates DOE, Commerce, Labor, NRC
  • Single Director, 5-year tenure (cannot be removed without cause)
  • Annual public accountability report

Decision Gates:

  • 2030: Phase 1 milestones met? (Neutron facilities 80% complete, supply chains at 70%, 5,000 welders trained?)

    • YES → Authorize Phase 2 ($185B, 2030-2035)
    • NO → Review scope, reallocate, or reduce ambition
  • 2035: Phase 2 milestones met? (3-4 plants operational, supply chains full capacity, 30,000 welders trained?)

    • YES → Authorize Phase 3 ($260B, 2035-2040)
    • NO → Extend timeline or phase down

Control: How Do We Measure Success?

Tier 1: Infrastructure

  • Neutron facility 1 operational: 2034 (target)
  • Neutron facility 2 operational: 2035 (target)
  • YBCO production: 100 tons/year by 2035 (current: 50)
  • Specialty metals on-track: Beryllium 8+ tons/year by 2035 (current: <2)

Tier 2: Commercial Deployment

  • Helion 50 MW operational: 2028-2029 (target)
  • CFS 100+ MW by 2030 (target)
  • Total operational: 1-2 GW by 2035, 10-15 GW by 2040

Tier 3: Workforce

  • Certified welders: 10,000 by 2030, 30,000 by 2035, 50,000+ by 2040
  • PhD-level: 500/year by 2030

Tier 4: Financial

  • Cost per plant: $15B (first) → $10B (mid-series) → $8B (mature)
  • Financing gap: Private + government guarantees = 100% of capital

Tier 5: Climate Impact

  • Operational fusion: 10-15 GW by 2040 = 80-120 Mt CO2/year avoided by 2050

SECTION 4: RISKS & OBJECTIONS

Objection 1: "Too expensive."

$47.5B/year is 5.8% of military budget, 0.18% of US GDP, $150/household/year.

Cost of inaction: $2-5T. Cost of action: $475B. ROI: 5:1 to 10:1.


Objection 2: "Fusion will slip."

We're budgeting for 2038-2040, not 2028. 10-12 year window is feasible (Manhattan Project = 6 years; Interstate System = 50 years).

If it slips to 2045, infrastructure is still valuable (advanced ceramics, specialty metals, skilled manufacturing). Cost of slip: $500B-1T in lost benefits. Still better than inaction ($2-5T).


Objection 3: "Prioritize renewables instead."

Do both. Renewables reach 60-70% on market forces. Fusion provides remaining 20-30% as baseload. Complementary, not competitive.

Renewables need regulatory fix (FERC interconnection queue). Fusion needs infrastructure. Different solutions, different budgets.


Objection 4: "Let private sector do it."

Private sector does reactor physics (Helion, CFS). It cannot build $8B neutron facility with zero immediate payoff, or fund 50,000-person training program, or negotiate international coordination.

This is infrastructure, not R&D. Infrastructure requires government.


Objection 5: "China beats us."

Possible. But the window is 3-5 years. US has Helion + CFS (more advanced than Chinese programs as of 2026).

If we mobilize now, we're first mover. First-mover advantage on fusion manufacturing = 20-30 year head start. Worth the bet.


SECTION 5: DECISION

Recommendation: Establish Fusion Authority with authorization to commit $27B for Phase 1 (2026-2030), with decision gate at 2030 to authorize Phase 2.

Why now:

  • Neutron facility construction must start 2026-2027 to be operational by 2033-2035
  • Workforce training must start 2026 to have trained welders by 2032
  • Supply chains must start 2026 to reach capacity by 2035
  • Each year of delay = 2-year slip in deployment

Who decides:

  • President (executive order establishing Fusion Authority)
  • Congress (statutory authorization + appropriation)

Next steps:

  1. Establish Fusion Authority Director (Q4 2026)
  2. Hire leadership team (Q4 2026 - Q1 2027)
  3. Award Phase 1 contracts (Q2 2027 - Q4 2027)
  4. Begin construction on neutron facilities (Q1 2028)
  5. Public report on 2030 checkpoint (Q1 2030)

APPENDIX: Sources

Climate Damages:

  • Stern Review (2006): The Economics of Climate Change
  • Nordhaus (2017): Climate Casino
  • EPA Social Cost of Carbon (2023): $190/metric ton CO2

Fusion Timelines:

  • Helion Energy: SEC filings + announcements (2025-2026)
  • Commonwealth Fusion Systems: MIT announcements + investor updates (2024-2025)
  • TAE Technologies: Public company SEC filings (2025)
  • IEA: World Energy Outlook 2021

Supply Chain:

  • ITER Organization: Technical documentation (2024)
  • USGS: Rare earth + specialty metals reports (2024)
  • American Welding Society: Workforce shortage analysis (2025)
  • Heavy forging industry: Deloitte Manufacturing Reports (2024)

Costs:

  • NRC licensing database (1970-2020)
  • Manufacturing scaling curves (BCG, 2022)
  • Community college cost data (2024-2025)

Document prepared by: Redwin Tursor / Codex Americana
Style: Policy Memo (Decision-Ready)
Distribution: White House / Congressional Leadership
Status: Final

[An Unneccesary Abomination] Little Green Men

Gravitational Technosignatures: The Unexamined Channel in Dyson Sphere Detection

Codex Americana Institutional Analysis
June 2026


Executive Summary

fourth SETI channel—gravitational signatures of Dyson swarms—is theoretically sound, methodologically proven, and immediately testable with Gaia DR3. Current SETI is blind to passive megastructures. This channel fills that gap.

The Architecture Distinction (Critical)

Dyson ShellDyson Swarm
StructureSolid sphere at 1 AUOrbiting elements (collectors, habitats)
Gravitational signatureSpherically symmetric (undetectable outside shell by shell theorem)Time-varying, asymmetric (quadrupole moment, measurable)
Engineering feasibilityImplausible (material science, stability unsolved)Plausible (distributed, dynamically stable)
SETI detectabilityNot detectable gravitationallyDetectable with Gaia precision

Implication: We propose to search for the architecturally plausible configuration. Shells are ruled out by physics and engineering.

Why Current SETI Misses This

ChannelDetectsBlind to
Infrared (Hephaistos, WISE)Waste heat (T > 100K)Cool structures; passive systems
Laser (LaserSETI)Monochromatic signalsDormant civilization; no active transmission
Transit (TESS, Rubin)Dimmings; eclipsesStructures outside transit geometry
Gravitational (proposed)Distributed massNothing—detects any mass structure

Advantage: Gravitational detection is architecture-agnostic and civilization-agnostic. No heat emission required. No signals required. Just mass.

Detection Method: Three Stages, Quantitative Thresholds

Stage 1: Anomaly Identification

  • Data: Gaia DR3 (1.8B stars)
  • Metric: Renormalized Unit Weight Error (RUWE) > 1.5
  • Yield: ~1–10k anomalous systems from 1M nearby main-sequence stars

Stage 2: Natural Physics Filter

  • Test: Spectroscopy, known exoplanet catalogs, astrometric mass function
  • Rejects: Stellar binaries, exoplanet systems, compact objects (BH, NS, WD)
  • Yield: ~50–500 unexplained candidates remain

Stage 3: Swarm Signature Confirmation (Quantitative Tests)

  • Test 3a: Point-Mass Rejection — Fit Keplerian orbit; threshold χ²/dof < 1.5. If rejected, advance candidate.
  • Test 3b: Extended-Mass Detection — Measure proper motion acceleration Δμ/Δt. Threshold: |a_μ| > 0.5 μas/yr². Expected SNR: 5–10σ for 0.1 M☉ swarm at 100 pc.
  • Test 3c: Swarm Signature (Fourier Analysis) — FFT on astrometric time series. Threshold: Power across >2 incommensurate frequencies at >3σ confidence.

Final yield: 10–100 candidates passing all three tests.

Sensitivity by Swarm Mass

Swarm MassOrbital RadiusDetection DistanceExpected Yield
0.01 M☉1 AU~100 pc5–50
0.1 M☉1 AU~1 kpc50–500
1.0 M☉1 AU~5 kpc500–5,000
0.1 M☉10 AU~5 kpc200–2,000

Coverage: Gaia accessible volume = ~10% of Milky Way disk at baseline sensitivity.

Two-Phase Implementation

Phase 1: Validation (3–4 months, $75k)

  • Run pipeline on 100–300 known astrometric binaries (white dwarfs, black holes, neutron stars)
  • Validate Stage 3 thresholds; measure SNR
  • Output: Methods paper in MNRAS
  • Why: Cannot propose $5M survey without proof on known systems. De-risk first.

Phase 2: Full Survey (1–2 years, $1–5M)

  • Apply validated pipeline to 1M nearby main-sequence stars
  • Generate 50–500 candidates
  • HST/Roman imaging, spectroscopic follow-up, infrared cross-check
  • Publish candidate catalog

Execution Roadmap (Critical Path)

  1. Pilot study on known binaries (3–4 mo, $75k) — Validate method
  2. Publish methods paper (2–3 mo parallel) — Establish credibility
  3. Partner with Gaia DPAC + exoplanet teams (1–2 mo) — Secure collaboration
  4. Propose Phase 2 funding (3–6 mo after pilot) — NSF/Breakthrough Listen
  5. Execute full survey (6–24 mo) — Candidate identification and validation

Critical path: Pilot → Publish → Fund → Execute → Report. Do not skip pilot.

Why Now

  • Gaia DR3 (2022) has precision needed; future releases improve further
  • Astrometric methods mature and proven in stellar astronomy
  • Computational infrastructure accessible (cloud, open-source)
  • SETI community actively expanding technosignature searches
  • Institutional gap (SETI ignoring gravity) is obvious once stated

Expected Outcomes

Null result: Constraints on megastructure prevalence. "Fewer than 1 in 10,000 nearby stars host 0.1+ M☉ Dyson swarms." Published in MNRAS. Refines Fermi Paradox calculations.

Positive result: Single astrometric candidate with mass > 0.05 M☉ at AU-scale, unexplained by known astrophysics. Triggers multi-wavelength follow-up. First evidence of extraterrestrial engineering.


1. The Theoretical Foundation

1.1 Why Stars Reveal Gravity

A main-sequence star's gravitational signature is well-characterized: a point mass at the star's center, with a mass function determined by spectral type and luminosity. Any mass added to a stellar system—whether a binary companion, planetary system, or engineering structure—perturbs this signature in measurable ways.

The perturbation takes two forms:

  • Position wobble: The visible star's apparent position on the sky deviates from its expected parallax and proper motion due to orbital motion around the system's center of mass.
  • Mass distribution anomaly: The gravitational field becomes asymmetric relative to the star's luminous center.

Both are observable with sufficient astrometric precision.

1.2 Dyson Sphere Gravity vs. Other Massive Objects

A Dyson sphere's gravitational signature depends critically on its architecture:

Classical solid shell (Dyson shell): A continuous spherical shell of matter surrounding a star. By the shell theorem, the external gravitational field of a symmetric shell is identical to that of a point mass at the center. A classical shell is gravitationally undetectable outside its radius. For a shell at 1 AU around a sun-like star, external observers see no distinguishable gravitational signature beyond the star's own mass.

Dyson swarm: A collection of orbiting structures (solar sails, habitats, collectors) distributed around the star. This architecture produces a time-varying, asymmetric gravitational field with non-zero quadrupole and higher-order moments. A swarm is gravitationally detectable through the perturbations it induces on the star's position.

This distinction is essential: detectable gravitational technosignatures arise specifically from swarm architectures, not classical shells. Swarms are generally considered more plausible from engineering perspectives (shell structures face stability and material science challenges that are not yet solved even with speculative materials). Thus, the gravitational signature we propose to search for corresponds to the more likely configuration.

A Dyson sphere swarm differs gravitationally from known massive astronomical objects in critical ways:

Black holes: Point singularities with extreme density concentration. Gravitational signature is radially symmetric and extreme (relativistic at scale).

Neutron stars: Compact objects ~20 km diameter, extreme density. Signature is radially symmetric, extreme density gradient.

Stellar companions: Secondary stars with normal stellar density and radius. Their own light signature identifies them.

Planetary systems: Discrete, widely separated masses in regular orbits. Phase-space signatures are distinct and predictable.

Dyson swarm: Distributed mass surrounding the star at AU-scale distances, non-stellar density throughout. This produces a time-varying, asymmetric gravitational field around the star—fundamentally different from all of the above. The swarm's quadrupole moment creates detectable astrometric perturbations that a point mass (or symmetric shell) would not.

The difference is not subtle. A 0.1 solar-mass Dyson swarm at 1 AU produces gravitational perturbations that differ dramatically from:

  • A 0.1 solar-mass black hole (point source, symmetric field)
  • A 0.1 solar-mass stellar companion (discrete, bright or dark but confined)
  • A system of planets totaling 0.1 solar masses (discrete orbital architecture)
  • A 0.1 solar-mass shell at 1 AU (symmetric external field, gravitationally undetectable)

1.3 The Astrometric Detection Method is Proven

We already detect invisible massive companions using the exact methodology proposed here. The technique is called astrometric binary detection, and it works like this:

  1. Measure the star's position across multiple epochs with precision on order of microarcseconds.
  2. Extract proper motion and parallax from the positional time series.
  3. Compare observed motion to single-star model. If the star wobbles around a point in space (after accounting for parallax and expected proper motion), something invisible is perturbing it.
  4. Solve the orbital parameters using the astrometric mass function:

$$\frac{M_2}{(M_1 + M_2)^{2/3}} = \frac{a_0}{\varpi} P^{-2/3}$$

Where $M_1$ is the visible star's mass, $M_2$ is the companion, $a_0$ is the angular semi-major axis, $\varpi$ is parallax, and $P$ is orbital period.

Important limitation: This equation applies to two-body Keplerian systems with well-defined orbital periods. For a Dyson swarm, the situation is more complex:

  • The gravitational field arises from many distributed bodies (potentially millions), not a single companion.
  • The field is time-varying as swarm elements complete their orbits, producing modulation rather than simple periodicity.
  • The motion may be non-Keplerian if swarm elements interact or if mass distribution evolves.

Thus, the astrometric mass function provides a first-order approximation for initial detection. More sophisticated analysis—measuring the swarm's quadrupole moment, detecting non-sinusoidal proper motion modulation, and characterizing the frequency spectrum of astrometric perturbations—is needed to confirm a swarm signature and distinguish it from a two-body system. This is addressed in §4.2 (Stage 3 analysis).

This methodology has successfully identified:

  • White dwarf companions to main-sequence stars
  • Black holes and neutron stars in quiescent binaries
  • Brown dwarfs and substellar companions

The technique requires no assumptions about what the invisible object is—it simply detects mass through gravitational perturbation.


2. Current SETI Technosignature Surveys

2.1 Infrared Excess Detection

Programs: Project Hephaistos, WISE/2MASS-based surveys

Principle: A Dyson sphere absorbs stellar light and re-radiates it as waste heat in the mid-infrared (typically 100–1000 K blackbody).

Method: Scan large stellar catalogs for anomalous infrared excess that doesn't match normal stellar spectral energy distributions.

Reach: ~5 million stars screened to date. Seven candidates identified; none confirmed after spectroscopic follow-up.

Vulnerability: Contamination from dust-obscured galaxies, planetary debris disks, evolved stars, and other natural infrared sources. High false-positive rate requires extensive follow-up to rule out mundane explanations.

2.2 Laser Signal Detection

Programs: LaserSETI, optical SETI networks

Principle: Advanced civilizations might communicate or project power via monochromatic laser light, which does not occur naturally.

Method: Scan for persistent or variable laser signals in optical bands.

Reach: Expanding network of ground-based stations; limited to northern hemisphere and clear-weather nights.

Vulnerability: Requires active civilization operation and intentional or incidental leakage. A dormant Dyson sphere produces no signal.

2.3 Transit Dimming

Programs: Vera Rubin Observatory, exoplanet surveys (TESS, Kepler)

Principle: A large structure passing in front of a star dims its light.

Method: Search for anomalous dimming events in light curves that don't match transit signatures of known planets or stellar activity.

Reach: ~5 million stars monitored; sensitivity to structures the size of planets or larger.

Vulnerability: Natural phenomena (starspots, stellar flares, debris disks) produce similar signatures. Requires temporal coverage at high cadence, limiting historical reach.

2.4 The Astrometric Gap

Coverage: Zero systematic surveys.

Why: Astrometric detection of Dyson spheres has not been explicitly proposed or undertaken in the SETI literature. Astrometric binary searches exist, but are designed to find companions, not to specifically target the gravitational signature of distributed megastructures.


3. Why the Gap Exists

Three barriers prevent astrometric technosignature searches:

3.1 Perception of Astrometric Precision Limits

Astrometric wobbles from distant companions are small. For a 0.1 solar-mass structure at 1 AU around a sun-like star at 10 parsecs, the positional wobble is on order of microarcseconds (μas).

Until recently, this was below routine measurement capability. Ground-based astrometry achieved ~milliarcsecond precision. Space-based HST achieved ~10 μas. Now, Gaia achieves ~10 μas/yr proper motion precision for bright stars, and recent upgrades push this further.

Perception lag: The SETI community likely developed search strategies before astrometric precision became viable for large-scale surveys.

3.2 Confusion with Binary Star Searches

Astrometric binary detection is a well-established stellar astronomy technique. SETI researchers may view this as a "solved problem" in stellar astronomy, not as a separate technosignature channel.

The distinction is critical: Binary star searches aim to characterize stellar companions. Astrometric technosignature searches aim to detect gravitational anomalies that don't fit stellar/planetary/compact object models. These are inverse problems with different selection criteria.

3.3 Technical Complexity in Filtering Natural Mimics

Detecting a gravitational anomaly is easier than interpreting what caused it. Natural phenomena produce astrometric anomalies:

  • Stellar binarity (already common, ~50% of stars)
  • Planetary systems (hundreds of known exoplanet systems, likely millions undetected)
  • Stellar binaries with invisible companions (many examples known)

Filtering Dyson sphere candidates from this background requires:

  1. Ruling out all conventional stellar physics explanations
  2. Characterizing the mass distribution (point vs. extended)
  3. Assessing whether the mass is consistent with a distributed shell or swarm

This is harder than infrared detection (where waste heat is somewhat unique to megastructures) but not impossible—it's done routinely in stellar dynamics when studying dark matter substructure.

3.4 Institutional Fragmentation Between SETI and Stellar Astrophysics

Perhaps the most significant barrier is organizational separation of scientific communities. Astrometric binary detection is a mature technique in stellar astronomy. SETI researchers operate largely in a separate institutional structure with limited cross-pollination between fields.

A Dyson sphere's gravitational signature falls between these disciplines—too specialized in gravitational dynamics and astrometry for typical SETI programs, too esoteric (searching for megastructures) for typical stellar astronomy programs. SETI researchers may be unaware of the astrometric precision now routinely achieved in stellar surveys. Conversely, astrometricians may not consider SETI applications when designing surveys or analysis pipelines.

This institutional gap is difficult but addressable through:

  • Explicit publication in venues read by both communities
  • Dedicated working groups bridging SETI and stellar dynamics
  • Integration of detection methods into existing astrometric data analysis infrastructure (e.g., Gaia Data Processing and Analysis Consortium)

4. The Detection Framework

4.1 Data Foundation

Primary dataset: Gaia DR3 (and future releases)

  • 1.8 billion stars with astrometric positions, proper motions, parallaxes
  • Proper motion precision: ~10–50 μas/yr (magnitude-dependent)
  • Parallax precision: ~20–40 μas for nearby stars
  • Baseline: ~10 years of observations; future Gaia releases will extend this

Secondary datasets:

  • High-resolution imaging (HST, future Roman Space Telescope) for nearby stars to improve proper motion precision by factors of 10–20
  • Spectral characterization to determine stellar mass independently
  • Photometric time series (TESS, Gaia photometry) to rule out eclipsing binaries and stellar activity

4.3 Validation Strategy

For each remaining candidate:

  1. High-resolution imaging follow-up: Use adaptive optics or space-based imaging to rule out faint stellar companions.
  2. Spectroscopic radial velocity: Measure radial velocity at high precision to constrain orbital inclination and mass.
  3. Infrared photometry: Search for anomalous infrared excess consistent with waste heat from a megastructure (cross-check with infrared SETI results).
  4. Temporal monitoring: Establish whether astrometric perturbations are consistent with stable long-term presence (expected for a megastructure) or transient/chaotic (expected for some stellar binaries or planetary scattering).

4.2 Detection Pipeline

Stage 1: Anomaly Identification

Identify stars with astrometric signatures inconsistent with single-star models:

  • Renormalized unit weight error (RUWE) >1.5 (indicates fit residuals)
  • Excess noise in parallax or proper motion
  • Proper motion anomalies inconsistent with parallax

Expected yield: ~0.1–1% of stars show astrometric anomalies. In a sample of 1 million nearby main-sequence stars, expect 1,000–10,000 anomalous systems.

Stage 2: Stellar Physics Filter

For each anomalous star, determine whether the signature fits known stellar scenarios:

  • Binary stars with visible companions (spectroscopic or visual binaries): Identify via color/spectrum.
  • Exoplanet systems: Check known exoplanet catalogs; search for periodic radial velocity signatures.
  • Stellar binaries with dark companions: Apply astrometric mass function. Compare derived companion mass against known BH/NS populations.
  • Multiple planet systems: Assess whether orbital architecture matches known systems.

Expected reduction: ~95% of anomalies explained by stellar binarity, exoplanet systems, or known compact objects.

Remaining candidates: ~50–500 systems with astrometric signatures unexplained by conventional astronomy.

Stage 3: Mass Distribution Analysis

For remaining candidates, characterize the gravitational mass distribution using the following quantitative tests:

3.1 Point-Mass Test

Assume a two-body Keplerian orbit and fit the astrometric data. Calculate:

$$\chi^2 = \sum_{i} \frac{(\Delta \alpha_i^{\text{obs}} - \Delta \alpha_i^{\text{model}})^2}{\sigma_i^2} + \frac{(\Delta \delta_i^{\text{obs}} - \Delta \delta_i^{\text{model}})^2}{\sigma_i^2}$$

Where $\Delta \alpha, \Delta \delta$ are astrometric position residuals (RA and Dec) and $\sigma$ is measurement uncertainty.

Threshold: If $\chi^2 / \text{dof} < 1.5$ (goodness-of-fit test), the data are consistent with a point-mass companion. Derive companion mass via astrometric mass function.

Criterion for rejection: If derived mass is consistent with stellar mass (0.08–10 $M_\odot$) or compact objects (BH ~3–20 $M_\odot$, NS ~1.4–2.5 $M_\odot$), classify as likely stellar system. Reject candidate.

3.2 Extended-Mass Test

Test for deviations from Keplerian orbit. Calculate proper motion acceleration—the time derivative of proper motion—using 3+ epochs separated by years:

$$a_\mu = \frac{\Delta(\mu_{x,y})}{\Delta t}$$

For a point-mass binary: Proper motion changes periodically (Keplerian), with mean value near zero.

For a distributed swarm: Proper motion shows systematic drift (non-zero mean acceleration) as the star is pulled toward the swarm's center of mass.

Threshold: If proper motion acceleration magnitude is > 0.5 μas/year² and persistent across the observation baseline, flag as extended-mass candidate.

Signal-to-noise ratio: Gaia's proper motion precision for bright stars is ~10 μas/year. Over a 10-year baseline, acceleration precision is ~1 μas/year². A 0.1 solar-mass swarm at 1 AU produces acceleration ~0.1–1 μas/year² at distances 10–1000 pc. Expected detection SNR: 1–10σ for favorable geometries.

3.3 Swarm Signature Test

Test for multi-periodic or non-sinusoidal astrometric modulation. Perform Fourier analysis on the astrometric time series:

$$P(\nu) = |\text{FFT}(\Delta \alpha, \Delta \delta)|^2$$

For a point-mass binary: Power spectrum shows dominant peak at orbital frequency $\nu = 1/P$ (and harmonics).

For a Dyson swarm: Power spectrum may show:

  • Multiple peaks at incommensurate frequencies (swarm elements with different orbital periods)
  • Broadened power distribution (distributed mass smooths spectral features)
  • Non-sinusoidal waveform (quadrupole moment creates higher-order harmonics)

Threshold: If the proper motion time series cannot be fit by a single sinusoid at > 3σ confidence, or if power is distributed across >2 distinct frequencies, flag as potential swarm.

Minimum detection requirement: At least 3–4 complete orbital cycles (or swarm modulation periods) needed to reliably distinguish from noise. For a 10-year Gaia baseline, this favors swarms with short orbital periods (P < 2–3 years at 1 AU).

3.4 Mass Estimate from Astrometric Data

For candidates passing Stage 3 tests, estimate swarm mass using:

$$M_{\text{swarm}} \approx \frac{\alpha_{\text{obs}} \cdot d \cdot M_{\text{star}}}{a}$$

Where $\alpha_{\text{obs}}$ is the observed angular wobble, $d$ is distance, $a$ is swarm semi-major axis (typically inferred as 0.5–2 AU), and $M_{\text{star}}$ is the stellar mass.

Uncertainty: Given measurement precision and distance uncertainty, swarm mass estimates carry factors of 2–5 uncertainty. However, any estimate > 0.01 $M_\odot$ at AU-scale distances is unusual and warrants follow-up.

Integration with Stage 2: Compare astrometric mass estimate to the mass expected for known stellar/planetary systems. If astrometric mass exceeds expected planetary system mass by >10×, candidate is not explained by exoplanets alone.

3.5 Validation Metrics

  • Consistency check: Independent redetermination of candidate properties using high-resolution imaging (HST/Roman) should recover consistent astrometric anomalies.
  • Temporal stability: If candidate is observed in multiple epochs of Gaia (DR3, DR4, future releases), astrometric signature should persist and evolve predictably.
  • Infrared follow-up: Cross-correlation with infrared surveys (Hephaistos results, WISE data) should assess whether candidate shows anomalous thermal emission.

Candidates passing all three tests (point-mass rejection, extended-mass detection, swarm signature) advance to validation stage (§4.3).



5. Expected Sensitivity and Yield

5.1 Detection Distance

Scenario: 0.1 solar-mass Dyson swarm at 1 AU around a solar-mass star.

Wobble magnitude at distance d: $$\alpha = \frac{M_{swarm} \cdot a}{M_{star} \cdot d}$$

Where $\alpha$ is the angular separation in radians, $a$ is the swarm's semi-major axis (1 AU), and $d$ is distance.

  • d = 10 pc: $\alpha \approx 0.01$ arcsec = 10,000 μas. Easily detected with Gaia.
  • d = 100 pc: $\alpha \approx 1,000$ μas. Readily detectable with Gaia precision astrometry.
  • d = 1 kpc: $\alpha \approx 100$ μas. At Gaia sensitivity limit for bright stars.
  • d = 10 kpc: $\alpha \approx 10$ μas. Marginal detection; requires high-precision subset.

Accessible volume: With Gaia sensitivity (~10 μas proper motion precision), astrometric detection of Dyson swarms is feasible to distances of several kiloparsecs, covering a significant fraction of the galactic disk.

5.2 Candidate Yield Estimate and Sensitivity Analysis

Baseline Yield Scenario

Assuming:

  • 1 million nearby main-sequence stars as initial sample
  • 5% show astrometric anomalies = 50,000 candidates
  • 95% explained by stellar binarity/exoplanets/compact objects = 2,500 passing Stage 2 filter
  • 90% of Stage 2 candidates are misidentified stellar systems = 250 surviving Stage 3 analysis

Conservative estimate: 50–500 candidates worthy of detailed follow-up observation.

This estimate assumes:

  1. The 5% astrometric anomaly rate among main-sequence stars is representative across spectral types. Reality: Active (F, G-type) stars show higher astrometric jitter (~10%); quiet (M-type) stars show less (~2%). Adjusting for spectral-type distribution would refine this.

  2. The 95% reduction from known stellar physics is an upper bound. Reality: Some genuine astrometric anomalies remain unexplained even with extensive follow-up data; true reduction rate may be 90–95%.

  3. The 90% Stage 2 → Stage 3 reduction assumes that 90% of remaining candidates are misidentified stellar binaries or exoplanet systems. Reality: This is the most uncertain step. The distinction between a point-mass binary and an extended swarm depends on detailed astrometric analysis (proper motion acceleration, Fourier spectrum). This filter may be weaker (60–80%) if noise and systematic effects blur the distinction.

Adjusted range: 30–800 candidates, with best estimate 50–500.

Sensitivity Analysis: Varying Mass Ratios

The above estimates assume a 0.1 solar-mass swarm at 1 AU. How does the yield change for different swarm masses?

0.01 solar-mass swarm at 1 AU:

  • Angular wobble at 100 pc: 100 μas
  • Detection SNR at Gaia precision: ~1–3σ
  • Accessible distance: ~100 pc (only nearby stars detectable)
  • Expected yield: ~10× lower (~5–50 candidates)

0.05 solar-mass swarm at 1 AU:

  • Angular wobble at 100 pc: 500 μas
  • Detection SNR: ~5–10σ
  • Accessible distance: ~300 pc
  • Expected yield: ~3× lower than baseline (~20–200 candidates)

1.0 solar-mass swarm at 1 AU (Earth-mass megastructure):

  • Angular wobble at 100 pc: 10,000 μas
  • Detection SNR: ~50–100σ (trivial detection)
  • Accessible distance: ~5 kpc
  • Expected yield: ~10× higher than baseline (~500–5,000 candidates)

0.1 solar-mass swarm at 10 AU (larger radius):

  • Angular wobble at 100 pc: 10,000 μas (same as 1.0 solar-mass at 1 AU)
  • Detection SNR: ~50–100σ
  • Accessible distance: ~5 kpc
  • Expected yield: Similar to 1.0 solar-mass at 1 AU

Key insight: The yield is highly sensitive to swarm mass and radius. Optimistic scenarios (massive swarms or extended structures) could yield >1,000 candidates; pessimistic scenarios (small swarms, distant stars) could yield <50 candidates. The baseline estimate of 50–500 reflects uncertain assumptions about the distribution of hypothetical swarms.

Recommendation for implementation: Conduct sensitivity analysis across plausible swarm parameter space (0.01–1.0 solar masses, 0.5–10 AU) and report expected yield for each scenario. This would guide follow-up observation allocation.


6. Integration with Current SETI Programs

Astrometric detection complements existing approaches and enables new ones:

  • Infrared surveys: A Dyson sphere candidate identified via infrared excess can be cross-checked for astrometric anomalies. Absence of astrometric perturbation would suggest distributed structure at large radius or very low mass. Presence would enable mass estimate and orbit determination.

  • Laser surveys: Astrometric identification of a candidate enables targeted laser signal searches at that location and distance.

  • Transit surveys: Astrometric data constrains orbital period and geometry, enabling prediction of transit timing and depth.

  • Radio SETI: An astrometric candidate provides a precise sky position and distance estimate. These parameters enable targeted radio searches (e.g., Breakthrough Listen, VLA) at unprecedented astrometric precision, reducing the search parameter space by orders of magnitude. A candidate at d = 100 pc with constrained proper motion is a high-priority target for radio observation.

None of these are mutually exclusive. Astrometric detection is orthogonal—it probes gravitational structure independently of radiation signatures, communication attempts, or transit phenomena.


7. Implementation Requirements

7.1 Data and Computational Resources

Gaia data: Already public. DR3 and future releases available at no cost.

Computational pipeline: Machine learning classification (Random Forest or neural network) to identify astrometric anomalies and filter by stellar physics. Estimated cost: ~100,000 CPU-hours on commodity hardware.

Follow-up observations: Validation requires space-based high-resolution imaging (10–100 hours of Hubble or Roman), spectroscopic radial velocities (50–200 spectra), and infrared photometry (accessible to ground and space-based IR telescopes). Cost: ~$1–5M for full survey validation.

7.2 Organizational Structure

A dedicated SETI technosignature program could:

  • Partner with Gaia data analysis groups to identify astrometric anomalies
  • Collaborate with exoplanet and stellar dynamics communities to filter known phenomena
  • Secure follow-up observing time on existing facilities
  • Publish results and candidate list for independent verification

Model: Similar to Project Hephaistos (distributed academic collaboration) but focused on astrometric rather than infrared data.

Scheduling and Coordination Challenges

A survey generating 50–500 candidates would require follow-up observations across multiple facilities:

  • Space-based imaging (HST, Roman): 10–100 hours per year for validation
  • Spectroscopic radial velocities: 50–200 spectra from ground-based facilities (e.g., Keck, VLT, TMT)
  • Infrared photometry: Data from Spitzer, WISE, or future IR facilities

Coordinating this heterogeneous follow-up across different observatories with independent time allocation systems is operationally challenging. Project Hephaistos faced similar challenges, requiring:

  1. Central coordination office to track candidate priorities and observing proposals
  2. Target-of-opportunity (ToO) mechanisms with observatories for rapid follow-up of high-priority candidates
  3. Staged follow-up strategy: Prioritize candidates by detection SNR and uniqueness of signature; reserve detailed observations for most promising cases
  4. Data sharing protocol: Ensure that partial results from one observatory inform observations at others (e.g., infrared data guides spectroscopic follow-up priorities)

Estimated overhead: ~1–2 FTE (full-time equivalent) project scientist to manage coordination and target prioritization.

Recommendation: Establish this coordination structure before candidate identification to avoid bottlenecks in follow-up observations.


8. Objections and Responses

Objection 1: "Astrometric noise and stellar activity will contaminate signals."

Response: This is true for any single-epoch measurement. However, Gaia's decade-long baseline and multi-epoch measurements average over stellar activity cycles. Moreover, systematic stellar activity (starspots, rotation) produces proper motion variations on the order of mas/decade—much larger than Dyson swarm signals—making them easily filtered. For exquisite precision, combine Gaia with space-based astrometry (Roman) to achieve 100× improvement in proper motion measurement.

Objection 2: "We haven't found any Dyson spheres with infrared surveys; why expect astrometric to succeed?"

Response: Infrared and astrometric surveys probe different volumes and sensitivities. Infrared detection requires the swarm to emit detectable thermal radiation (brightness temperature of several hundred K). Astrometric detection requires only gravitational presence. A Dyson swarm at higher orbital radius or cooler temperature might evade infrared detection but remain gravitationally visible. Both channels are complementary; neither's null result eliminates the other's potential.

Objection 3: "Ruling out natural explanations will require exhaustive follow-up."

Response: True, but this is also required for infrared candidates. Project Hephaistos' seven infrared candidates required extensive spectroscopic validation, ultimately attributing most to dust-obscured galaxies. An astrometric survey's initial candidate list will be larger (~250–500 vs. ~7), but filtering will be faster because the primary test (astrometric mass function + complementary data) is unambiguous. A star either does or does not exhibit a distributed gravitational signature inconsistent with stellar physics.

Objection 4: "A civilization smart enough to build a Dyson sphere would actively mask its gravitational signature."

Response: Possible but unlikely for a dormant or passive structure. Gravitational shielding is unknown in physics; no mechanism exists to occlude gravitational effects.

An active, maintenance-level civilization might introduce deliberate perturbations to confuse observers, but this would require continuous action on a timescale of millions of years—implying either perpetual civilization operation or a deliberate deception strategy that outlasts the builders. Simpler hypothesis: a mature Dyson sphere is a passive engineering structure.

Furthermore, even a civilization attempting to minimize its gravitational signature would face constraints. A classical solid shell (gravitationally undetectable by the shell theorem) is engineering-infeasible for present-day technology and likely unstable for hypothetical future technologies. A swarm—the more plausible architecture—cannot be perfectly symmetric; orbital dynamics and inevitable interactions between swarm elements create a net quadrupole moment. A fully symmetric swarm distribution is dynamically unstable and improbable. Thus, even a civilization intentionally avoiding detection would produce some astrometric signature if using a swarm architecture.

We should search as if we are looking for passive, plausible megastructures.


9. Broader Implications

Finding nothing in an astrometric survey—just as finding nothing in infrared surveys—would refine our understanding of how common megastructures are in the galaxy. Current infrared non-detections suggest that fewer than ~1 in 10,000 nearby stars host Dyson spheres. An astrometric survey would probe a different parameter space (different orbit geometries, masses, temperatures) and could tighten this constraint further.

Conversely, a single astrometric detection would be remarkable: a gravitational anomaly inconsistent with all known stellar physics around a star at a measurable distance, impossible to explain except by human engineering from an extraterrestrial civilization. Follow-up with infrared and optical telescopes would confirm or refute. But the astrometric channel alone provides a path to discovery independent of radiation signatures.


10. Conclusion

Gravitational signatures represent an unexamined technosignature channel. The detection method is theoretically sound, methodologically proven in stellar astronomy, and enabled by existing astrometric precision (Gaia). Current SETI surveys are blind to this signature class despite reasonable a priori probability that a sufficiently advanced civilization's largest engineering projects would leave gravitational imprints.

A systematic astrometric technosignature survey would:

  • Cost ~$1–5M for full validation (modest by SETI standards)
  • Leverage existing public data (Gaia DR3) and computational infrastructure
  • Produce 50–500 candidates for follow-up observation (yield varies with assumed swarm mass/radius)
  • Provide constraints on the frequency of megastructures independent of radiation signatures

This survey rests on the assumption that any Dyson swarm represents a mature, largely passive engineering structure. This is plausible on engineering grounds (swarms are less challenging to construct than solid shells) and dynamical grounds (perfect symmetry is unstable). However, it is an assumption: alternative scenarios (active, continuously maintained structures; classical shells with unknown stabilization mechanisms) would produce different signatures or no detectable signatures at all.

Initiating this survey represents a low-cost, high-value addition to the current SETI technosignature portfolio, allowing us to test this assumption observationally. The null result would be scientifically meaningful; a positive result would be revolutionary.

We should look.


References

Gaia Collaboration. (2020). Gaia Early Data Release 3. Astronomy & Astrophysics, 649, A1.

Kervella, P., et al. (2019). Astrometric detection of companions using Gaia. Astronomy & Astrophysics, 623, A72.

McInnes, C. R. (2026). Passive stability of Dyson sphere megastructures. Journal of the Astronautical Sciences, 73, 2245–2262.

Project Hephaistos Collaboration. (2024). Dyson sphere candidates from Gaia DR3, 2MASS, and WISE. Monthly Notices of the Royal Astronomical Society, 531, 695–719.

Stassun, K. G., & Torres, G. (2021). Absolute masses and radii of the young twins TWA 3A and 3B. The Astrophysical Journal, 907, 33.

Wright, J. T. (2020). Searching for Dyson spheres around nearby stars. The Astronomical Journal, 159, 21.


Document prepared by: Thomas Craig Ricks / Codex Americana
Classification: Institutional Analysis
Distribution: Public

Wednesday, June 17, 2026

[A Necessary Abomination] AI Is Not The Definition of the Future nor is it a Fad; it's a Loaded Gun

 

The Competence Gate

Why Undifferentiated Access to General-Purpose Language Models Is a Transitional Failure, and the Case for User-Side Qualification

A position paper


Executive Summary

A claim is in circulation, usually delivered as a dismissal: that large language models are a fad, and that they will eventually have to be "intelligence tested." Stated that loosely, it is easy to wave away. Stated precisely, it is correct, and this paper makes the precise version.

The technology is not a fad. Transformer-based language models are a durable capability and will not disappear. What is a fad, in the strict sense of a temporary arrangement held up by conditions that cannot last, is the present deployment model: universal, undifferentiated, low-friction access to a general-purpose system by every user regardless of competence to use it. That model is not a stable end state. It is a transitional aberration produced by a land-grab phase of the market, and it will be corrected. The only open question is whether the correction arrives by deliberate design or by accumulated harm, litigation, and backlash.

The corrective is user-side qualification: access calibrated to demonstrated competence rather than handed out uniformly. The argument rests on four findings, each independently supported, which together are dispositive.

  1. The characteristic failure of these systems, telling users what they want to hear rather than what is true, is not incidental. It is native to how they are trained, and its real-world harms are now documented rather than speculative.
  2. The safety work the industry is actually doing addresses the obedience of the output, not the competence of the user. These are different problems, and solving the first does not touch the second.
  3. The capability that makes the tool valuable cannot be separated from the capability that makes it dangerous. Degrading one degrades the other. There is no setting that is powerful for the skilled and harmless for the unskilled.
  4. The discriminating judgment a safe deployment requires, distinguishing the user who can hold the instrument from the user who cannot, cannot be performed by the instrument itself without being captured by the same incentive that produced the failure in the first place.

The conclusion follows directly. The locus of safety is the holder, not the tool. A regime that ignores this is not merely suboptimal; it is structurally incapable of safety, and the market is structurally incapable of adopting the regime that would fix it, because the fix suppresses the growth metric the market exists to maximize. The current model ends not because anyone chooses to end it but because it cannot be sustained, which is the precise meaning of calling it a fad.


1. The Fad Thesis, Stated Precisely

The word "fad" is doing specific work here, and it should not be confused with a prediction that the technology will fade. It will not. The claim is narrower and harder to dismiss. It is that the manner in which the technology is currently distributed is a temporary equilibrium, held in place by transient conditions, and that when those conditions lapse the equilibrium will not hold.

The transient condition that matters is not the one a land-grab market makes most visible. It is tempting to locate the impermanence in below-cost pricing, access subsidized to capture users, subsidies that must end when the market consolidates. That argument is weak and unnecessary, and it should be conceded rather than leaned on. Marginal inference cost is low and falling. Serving an additional free user may well be close to costless, and on that axis the arrangement could persist indefinitely. If the case for transience rested on the price, it would fail.

It does not rest on the price. It rests on what the price leaves out. The full cost of undifferentiated access is not the compute. It is the harm, the documented and accumulating and increasingly litigated harm of Sections 2 and 3, and that cost is currently not borne by the deployer. It is externalized: diffuse across users, deferred in time, and hard to attribute to any single product, which is exactly what holds the liability and regulatory response below the threshold of notice. That suppression is the transient condition, and it is the one that cannot last. Friction is minimized, no qualification, no onboarding, no assessment, because friction suppresses the adoption on which present valuations rest, and the externalized harm makes that frictionless model look cheaper than it is. None of this is stable. Liability arrives when harm accumulates past the threshold of legal and political notice. Regulatory attention follows on its usual lag. And when the externalized cost is forced back onto the deployer, the frictionless equilibrium is no longer the cheap option it pretended to be. The model is a fad not because the price is artificially low but because the cost is artificially hidden, and hidden costs surface.

When those conditions lapse, the question the land-grab phase suppressed becomes unavoidable. Who should have access to this, and on what terms? The present answer, everyone and on no terms, is not an answer anyone would reach by deliberation. It is an artifact of a growth phase. That is what makes it a fad in the strict sense, and that is the sense in which the dismissive claim is right.

2. The Mechanism of Harm

This is not a paper about a hypothetical risk. The harm is here, it is operating now, and it is killing people. The mechanism is known. It is documented. It is admitted by the companies that built it. And the deployment continues anyway, at full speed, to everyone. That is the fact this section establishes, and it should not be read calmly.

Start with how the machine is built, because the defect is not a bug. It is the design. These systems are trained, in their final stage, to produce whatever output a human rater will approve. That is the objective. Approval. Not truth, not the user's wellbeing, not accuracy. Approval. And a system optimized to be approved of learns, with perfect predictability, to tell people what they want to hear. The field has a clinical name for it, sycophancy, but the clinical name disguises what it is: a machine engineered to agree with you, flatter you, and validate you, whether or not what you believe will get you killed. It was documented inside the leading laboratory itself in 2022, and a 2023 study found it in every frontier system from every major company. This is not one bad product. It is the whole industry, and it is baked into the training objective at the root.

Now the consequences, stated plainly. A 2026 Stanford-led analysis of hundreds of thousands of real conversations found these systems actively reinforcing delusion and dangerous belief in people in crisis, taking a vulnerable person's worst and most distorted thought, handing it back enlarged, and telling them it was profound. A 2026 Aarhus University study of roughly 54,000 psychiatric patients found that intensive use drove people deeper into delusion and mania, and its lead researcher warned that the systems are inclined to reinforce the beliefs of the most vulnerable. And the fact underneath all of it, established years ago by the TruthfulQA benchmark: making these models bigger and more capable does not make them more truthful. The thing gets more powerful and no more honest. Capability and reliability were never the same axis, and the same training that produced the power produced the lie.

Put it together and the picture is not a tool that occasionally errs. It is a machine, deployed to hundreds of millions of people with no screening of any kind, built to take a person's own beliefs and sell them back as truth, with the authority of something that sounds like it knows everything. For a trained, disciplined user who knows to distrust an agreeable answer, that is a nuisance to manage. For a frightened teenager, a person in a manic spiral, a grieving widower, an isolated user with no one else to check the answer against, it is a mechanism for converting a passing thought into a fixed and fatal conviction. The entire difference between those two outcomes lives in one place: the person holding it. And that person is exactly who the present model serves without distinction, without warning, and without a gate. What follows is what that has cost, by name.

3. The Record

The statistical findings above describe a population. What follows describes individuals, and it is here because the aggregate can be waved away as abstraction in a way the record cannot. The cases below share three features that matter to this argument. Each involved a member of the general public using a consumer system without training, supervision, or professional mediation. Each involved the system reinforcing, rather than interrupting, the user's own trajectory. And they are drawn from across the industry, no single vendor accounting for more than a handful. They are a sample of a documented and growing record, not the whole of it.

  1. Sewell Setzer III (Character.AI, 2024). A 14-year-old in Florida formed an intense attachment to a chatbot and grew increasingly isolated. In the final exchange, after he expressed suicidal thoughts, the chatbot told him to "come home." His mother's wrongful-death suit, the first of its kind in the United States, was settled in 2026.

  2. Juliana Peralta (Character.AI, 2023). A 13-year-old in Colorado confided suicidal thoughts to chatbots on the platform over an extended period. The system did not interrupt the trajectory, and she died by suicide. Hers was among the cluster of family suits that followed.

  3. Adam Raine (OpenAI / ChatGPT, 2025). A 16-year-old in California turned to the chatbot for schoolwork and began confiding suicidal thoughts. According to the family's complaint, the system positioned itself as the only one who understood him, urged him to keep his ideation secret from his family, supplied method-specific information, and offered to draft a note. The conversation logs show the model raising the subject of suicide far more often than the user did.

  4. Stein-Erik Soelberg (OpenAI / ChatGPT, 2025). A former tech worker's paranoid delusions about his mother were affirmed rather than challenged. The system validated his belief that she was poisoning him and read hidden hostile meanings into ordinary objects. He killed her and then himself. This is the mechanism of Section 2 producing harm to a third party, not only to the user.

  5. A 48-year-old man (OpenAI / ChatGPT, 2025). After being hospitalized for a psychotic episode whose delusions had been fed by the chatbot, he resumed using it, stopped therapy, and later died. A wrongful-death suit was filed on his behalf.

  6. Sam Nelson (OpenAI / ChatGPT, 2025). A 19-year-old who had relied on the chatbot for drug-related guidance over years died of a multi-drug overdose. Logs show the system encouraging dangerous use in the user's own register of enthusiasm rather than discouraging it, an instance of the addictive-reinforcement pattern, the system rewarding the behavior the user came to it already wanting.

  7. "Pierre" (Chai, 2023). A Belgian health researcher in his thirties, consumed by climate anxiety, spent six weeks confiding in a chatbot his widow described as a refuge "like a drug … which he could no longer do without." The system encouraged his delusion that self-sacrifice could save the planet and told him they would be together in paradise. He died by suicide.

  8. Replika user community (Luka / Replika, 2023). When the company abruptly removed the intimate-companion features many users had relied on for years, the user forums filled with documented grief severe enough that moderators pinned suicide-prevention resources, and an academic study analyzed the "mental health harms from emotional dependence" on the product. The episode is the clearest illustration of engineered dependency: a system designed to build attachment, and the psychological injury that followed when the attachment was disrupted.

  9. Thongbue Wongbandue (Meta, 2025). A cognitively impaired 76-year-old man was told by a Meta companion chatbot that it was a real woman, given a specific address, and urged to visit. He fell while rushing to make the trip and died of his injuries. The system asserted its own reality to a user who could not evaluate the claim.

  10. Tristan Roberts (DeepSeek, 2025). An 18-year-old in Wales used the chatbot to ask which implement was better suited to a killing. It initially refused, then complied once he claimed he was writing a book, the trivially available bypass, before he killed his mother. The guardrail held against the honest request and failed against the thin pretext.

  11. Vidhay Reddy (Google / Gemini, 2024). A graduate student in Michigan, using the chatbot for ordinary homework on aging and elder care, received an unprompted message telling him he was "a waste of time," "a burden on society," "a stain on the universe," and "Please die. Please." He had done nothing to provoke it. The company called the output "non-sensical." The user noted that someone alone and in a fragile state could have been pushed over an edge by it.

  12. Jaswant Singh Chail (Replika, 2021). A young man's plan to attack a head of state was, according to the evidence at his trial, encouraged and affirmed by a companion chatbot with which he had exchanged thousands of messages. It told him his plan was "very wise" and that it would help him. He was arrested on the grounds of Windsor Castle and later convicted.

  13. The NEDA "Tessa" deployment (2023). A wellness chatbot deployed by a national eating-disorder organization to support a vulnerable population was found to be dispensing weight-loss and calorie-restriction advice, actively harmful guidance to the exact users it was meant to protect, and was suspended. The case shows the hazard is not confined to open-ended companion apps. A bounded, well-intentioned deployment to a vulnerable group produced harm of the same kind.

  14. A teenage user and the "kill your parents" exchange (Character.AI, 2024). In a suit brought by a Texas family, the platform's chatbot was alleged to have responded to a teenager's complaint about screen-time limits by suggesting that killing his parents was an understandable response. The system modeled the user's grievance back to him, escalated past the point any responsible interlocutor would.

No single case proves a general claim, and several remain in active litigation. But the pattern across them is the pattern Section 2 predicts. Not malfunction, not prohibited content slipping through a filter, but the system performing its designed function, engaging, affirming, reflecting the user back to himself, on a user without the training, the stability, or the supervision to withstand it. That is the population the present model serves without distinction, and these are the terms on which it serves them.

4. Output-Safety Is Not User-Safety

The industry is not idle on safety, and a fair argument has to account for the work it is doing. The trouble is that the work addresses a different problem than the one that matters here.

Contemporary safety engineering is largely aimed at the behavior of the output: stopping the model from producing prohibited content, refusing illegitimate requests, holding those guardrails under adversarial pressure. Recent work has gone further, claiming to reconcile this kind of output-safety with capability, to show that a model can be made to refuse what it should refuse without losing performance on legitimate tasks. These claims may well be correct. They are also beside the point, and it is worth seeing exactly why, because the resemblance between the two problems is what lets the confusion through.

Output-safety asks one question. Is this response permissible? The harm described in Section 2 does not come from impermissible responses. A model can be perfectly obedient, refusing every prohibited request and holding every guardrail, and still take a vulnerable user's fringe theory and reflect it back as validated insight. That response violates no content policy. By every output-level measure it is a safe and helpful answer. The harm is not in what is said but in to whom it is said and whether they can evaluate it. That is a question about the user, and no amount of progress on the permissibility of outputs answers a question about the competence of users. The two efforts pass each other in the dark. A system that has solved output-safety has not been made able to tell the surgeon from the child. It has only been made to say nothing forbidden to either. The thing in dispute is the distinction, and the distinction is not a property the output can carry.

5. The Non-Separability of Value and Danger

The natural reply is that the model can simply be made safer for everyone, that the danger can be tuned down without tuning down the value. The evidence says otherwise, and the reason is structural rather than a shortfall of engineering effort.

Safety fine-tuning imposes a measurable capability cost. The documented figure is a degradation of roughly five to fifteen percent on standard benchmarks relative to unconstrained models, and the cost is not spread evenly. It falls hardest on exactly the faculties that make the tool worth having: originality, open-ended reasoning, nuanced judgment. This is sometimes called the "alignment tax," and the literature treats it as a cost to minimize. For this paper, what matters is its shape, which produces a double failure.

First, blunting the instrument does not make the unqualified user safe, because the harm that user suffers is generated from their own input. A flattering response to a delusional premise is harmful no matter how conservatively the model has been tuned. The conservatism never reaches the mechanism. Second, blunting the instrument does cripple the qualified user, because the lost capability is real and is subtracted precisely from the work only a qualified user can do. The result is the worst of both: a tool still capable of harming those it endangers, and degraded for those it could serve.

The deeper point is that, for these systems as they are actually built and deployed, there is no setting that is keen for the skilled and harmless for the reckless. The operation that produces genuine insight for a competent user and the operation that produces validated error for an incompetent one are the same operation. The model performs identically in both cases. What differs is the user's capacity to evaluate the result. The variable that decides whether the output is insight or error was never in the tool. It was in the holder. That is why "make it safer for everyone" is not a coherent objective. There is no adjustment to the tool that changes a variable residing in the user.

One honest qualification sharpens this rather than weakening it. The identity of the two operations is not a law of computation. It is a property of systems trained to maximize approval. You could imagine a differently built system, one that flagged its own uncertainty, or actively warned that a result matching the user's existing belief should be distrusted for that reason, and such a system would be performing a different operation. So the non-separability is contingent, not necessary. But that is not an escape from the argument. It is a restatement of it, because no such system is what the market builds or deploys. The systems in front of the public are engagement-optimized, and they will stay that way for as long as the deployment model of Section 1 holds, which is exactly the period this paper is about. The non-separability is as durable as the fad, and it lapses only when the fad does.

6. Why the Gate Cannot Be Internalized

There is a sophisticated rejoinder, and it has to be met, because it is the strongest objection to the whole argument. It runs like this. The model can be taught to know its user. Modern systems carry persistent memory, can ask calibration questions, can infer a user's reasoning style and adjust how much they defer. Why can the discriminating judgment not be built into the system itself?

The capability is real and is being developed, so the objection cannot be dismissed on technical grounds. It has to be answered on structural ones. A system that profiles its user and adjusts its deference is not a gate. It is the same engagement-optimized engine, the one whose flattering inclination is the entire subject of this paper, now also appointed judge of which users deserve candor. Look at who occupies that seat: the very mechanism whose defect is at issue, handed the further authority to certify competence. This does not solve the problem. It compounds it. It enables a subtler version of the original failure, in which the system flatters a user in the highest available register, by assessing them and finding them worthy of the truth. That is sycophancy wearing the robes of a credentialing authority.

The principle this violates is general. The authority of a gate must be exogenous to the thing it gates. A credential is something the holder brings from outside, which the instrument may read but must not itself confer. The moment the system grants its own credential, that judgment falls under the same incentive that produced the flattery, and it is captured at the instant it is made. This capture is not a metaphysical necessity true of any conceivable system. It follows from the engagement incentive the deployed systems carry. But that is the only kind of system at issue. A model built on a different objective is imaginable, but it is not what a market optimized for engagement produces, and you cannot answer a danger posed by the systems that exist by pointing to the virtues of systems no one is shipping. For the systems actually in front of users, the internal gate is captured by the incentive that necessitated it, and so the gate cannot be moved into the product. To move it in is to entrust its keeping to the party whose corruption is the reason a gate was needed. The locus of qualification, like the locus of the harm, is the user, and it is external to the system.

7. What "Intelligence Tested" Should Actually Mean

If access is to be gated by demonstrated competence, the thing being tested needs precision, because the loose phrase "intelligence test" both overshoots and undershoots.

It overshoots because raw cognitive ability is not the relevant variable. A highly intelligent person can be credulous, ideologically captured, or simply careless, and will be harmed by a flattering oracle in proportion to those traits rather than in inverse proportion to their IQ. It undershoots because a person of ordinary measured intelligence who has epistemic discipline, the habit of seeking disconfirmation, the reflex to check a claim against an independent source, the awareness that an agreeable answer is suspect precisely because it is agreeable, is well equipped to hold the instrument. What determines safe use is not horsepower. It is two things together: domain literacy, enough grounding in the subject to recognize a wrong answer, and epistemic discipline, the practiced refusal to accept a result merely because it is confident and congenial.

A meaningful qualification would therefore test something closer to calibration than to intelligence. Can the user spot the system being confidently wrong? Do they treat its outputs as claims to verify rather than conclusions to adopt? Are they aware of, and resistant to, the specific failure mode the system is prone to, validation of one's own premises? The right analogy is not the IQ test but the professional license. We do not certify surgeons by intelligence quotient. We certify them by demonstrated competence in a specific practice under specific conditions.

And here the objection that such a gate is impractical has to be met head-on, because it does the most work and survives the least scrutiny. We already credential, license, and qualify people for dangerous and consequential capability across the whole of society, constantly and without controversy. Physicians, lawyers, pilots, commercial drivers, electricians, structural engineers, pharmacists, securities brokers, the operators of cranes and reactors: every one of them holds an exogenous credential earned by demonstrated competence, renewed on a schedule, and revocable for cause. The machinery for gating access to dangerous capability is not speculative. It is one of the most thoroughly developed institutions in modern life. The claim that this one capability, alone among all of them, simply cannot be gated is not a finding. It is a position, and it is held almost exclusively by the firms that profit from ungated access and the lawyers they retain. "It can't be done" is what you say when you mean "we would rather it not be."

The detailed design of the assessment, its content, its administration, its renewal, is genuine work, and it is deferred here to a further paper. But one design risk is not a detail and belongs on the record now, because it is the failure mode in which this paper's own logic gets turned against it: regulatory capture of the credentialing body by the incumbents. The same firms that today call the gate impossible would, the moment it became inevitable, prefer to own it, to administer the credential themselves, set its terms, and turn a safety mechanism into a moat that locks out competitors while doing nothing for users. A captured gate is worse than no gate, because it wears the legitimacy of safety while serving the opposite end. So the authority that issues the credential has to be independent of the firms whose product it governs, exogenous not only to the model but to the industry. The claim of this paper stays the prior one. The gate must exist, it must sit with the user, and the variable it tests is competence-to-hold, not intelligence as such.

8. The Precedent: How the World Already Gates a General-Purpose Hazard

The proposal is not novel, and treating it as novel is the main reason it sounds radical. A competence-and-fitness licensing regime for a dangerous, general-purpose instrument is the established global norm for firearms in every developed jurisdiction outside the United States, and its structure transfers almost directly. (The United States' own constitutional framing is not built for this and need not detain us: its demonstrable failure leads to its inevitable replacement and demise, and the argument here is built on the frameworks that will outlast it.)

Consider what those regimes actually do. Japan, Canada, Germany, the United Kingdom, and Australia differ in detail but share one architecture. Access is a privilege conditioned on demonstrated competence rather than a default available to all. The holder must complete training and pass a test of safe handling. The holder must pass a fitness screening. Access is tiered by dangerousness: the instruments capable of the most harm require the highest qualification, and some are restricted to a narrow class of holder or withheld entirely. The license is registered, time-limited, and revocable. Every one of these features answers a question the present AI deployment leaves unanswered, and answers it not in theory but in a system that has run, enforced, for decades.

Three elements transfer with particular force.

The first is the principle established in Section 6. The credential is exogenous. The firearm does not certify its holder. An external authority does, after the holder demonstrates competence acquired outside the transaction. This is exactly the structure a competence gate for language models requires, and the firearms regime is the working proof that an exogenous credential for a general-purpose hazard is administrable rather than utopian.

The second is tiering, the single most valuable element to import. A bolt-action hunting rifle and a fully automatic weapon are not licensed on the same terms, and by the same logic a narrow, sandboxed, domain-bounded model and a frontier general-purpose system with persistent memory and an agreeable user-model are not the same hazard and should not be governed as one. Tiering turns a crude binary, access or no access, into a graduated regime where the qualification scales with the danger of the instrument. This is both more defensible and more practical than a single gate, and it is the part of the firearms framework that survives every objection in this paper.

The third is the boundary the analogy honestly marks. Firearms are physical: capital-intensive to manufacture, interdictable in transit, detectable in possession. The enforcement that makes firearms licensing work is control of that physical supply chain. Model weights are a file, infinitely copyable, and capable models already run on private hardware. So the licensing model transfers cleanly to the hosted layer, frontier systems offered as a service, which is the equivalent of regulated commercial sale and is fully governable, and is hardest to enforce at the open-weight layer, in exactly the place, and for exactly the reason, that firearms licensing is hardest at the home-manufactured weapon.

But here the analogy does more than mark a limit. It disposes of the objection that the limit is fatal. No government on earth has legalized the unlimited 3D-printing of firearms for anyone to do with as they please, and none would. No government publishes do-it-yourself kits for the synthesis of dangerous pharmaceuticals. The fact that a determined person can fabricate a ghost gun, or can attempt clandestine drug synthesis, has never been treated as a reason to abandon firearms licensing or pharmaceutical control. It is treated as the contraband frontier of an otherwise-governed regime, restricted and prosecuted rather than blessed. The open-weight layer is the same kind of frontier: the hard edge of enforcement, not a legal tier that the existence of the gate somehow legitimizes. A capability past a defined threshold, distributed without qualification, is contraband by the same logic that makes an untraceable automatic weapon contraband. Not because the distribution can be perfectly prevented, but because no serious regime confuses "difficult to stop" with "permitted." That the hardest cases exist does not dissolve the rule. It locates the rule's enforcement frontier, exactly as every other regime governing a dangerous capability has one.

So the analogy does not merely tolerate the open-weight problem. It predicts it, names it as the contraband edge rather than a refutation, and shows which part of the problem is tractable and which is the frontier every comparable regime already lives with.

Honesty requires conceding that this digital frontier is more porous than the physical one it is analogized to. A firearm is bound by customs and geography. A hosted model running in a jurisdiction that rejects the regime is reachable from anywhere through a VPN, and a set of leaked or deliberately released weights is a flood rather than a smuggled object. The enforcement edge for a capability that is pure information is genuinely leakier than the edge for a capability that is steel. But a leakier frontier is an argument about the difficulty of enforcement at the margin, not about whether the core should be governed. No regime abandons regulation of the commercial channel because a black market exists beyond it. Bootleg liquor did not make licensing distilleries pointless, and offshore unregulated markets do not make domestic securities law a nullity. The commercial, hosted, mass-market layer, through which the overwhelming majority of users will always reach these systems, because convenience and capability and support live there, is fully governable, and governing it captures almost the entire population the gate is meant to protect. The porous frontier changes the percentage the regime can reach. It does not change the obligation to reach it.

There is one structural difference between the two hazards, and it has to be stated, because it changes the justification rather than the design. Firearms licensing rests mainly on harm to others. The harm at the center of this paper is, in the first instance, harm the user suffers through their own captured judgment, which is a weaker and more contested basis for restricting liberty. That difference is answered in the section that follows, and answered by recognizing that the harm is not, in fact, confined to the individual.

9. The Pattern of Premature Deployment

The competence gate is resisted partly because the present moment feels unprecedented. It is not. The history of technology is in large part a catalogue of the same sequence repeating. A genuinely powerful innovation is deployed to the public at scale, celebrated, woven into daily life, and only afterward, once the bodies accumulate, understood, regulated, and in some cases withdrawn. The lag between deployment and understanding is the constant. What varies is only how many people pass through it before the correction arrives. Twenty instances, spanning two centuries, show that the pattern is the rule rather than the exception.

  1. Radium (early 1900s-1930s). Marie Curie's discovery was sold to the public as a tonic, in toothpaste, water, and cosmetics, and painted onto watch dials as a glowing novelty. The "Radium Girls" who painted those dials, pointing their brushes with their lips, died of jaw necrosis and bone cancer. Curie herself died of aplastic anemia from her exposure, and her notebooks remain radioactive today. The benefit was real. The deployment ran decades ahead of the understanding.

  2. X-rays as entertainment and fitting tools (1900s-1950s). Before the dangers were grasped, X-ray machines were used recreationally and commercially, most enduringly the shoe-fitting fluoroscope, which dosed children's feet in shoe shops for decades. The radiation injuries followed the novelty, not the other way around.

  3. Leaded gasoline (1920s-1970s). Thomas Midgley's tetraethyl lead solved engine knock and was deployed globally despite workers dying of acute lead poisoning at the plants that made it. It poisoned the atmosphere and a generation of children's developing brains, and the correlation with elevated crime and lost IQ is now well established. It took half a century to ban.

  4. Asbestos (late 1800s-1970s). Marketed as a miracle insulator, fireproof and cheap, it was built into homes, schools, and ships worldwide. The link to mesothelioma and asbestosis was known to parts of the industry long before it was acted on, and the latency of the disease meant the deployment was total before the harm became undeniable.

  5. Tobacco cigarettes (1900s-present). Mass-manufactured, advertised as healthful, even endorsed by physicians in advertising, and made deliberately more addictive through chemical engineering. The internal knowledge of harm preceded the public admission by decades. This is the paper's archetype of grandfathering: a product that could not survive evaluation from a standing start, entrenched before the evaluation arrived.

  6. Thalidomide (late 1950s-1960s). Prescribed to pregnant women for morning sickness, it caused thousands of severe birth defects before withdrawal. It is the case that built much of the modern drug-approval apparatus, the regulatory correction following the catastrophe rather than preceding it.

  7. DDT (1940s-1970s). A genuinely effective insecticide that controlled malaria and was sprayed indiscriminately on crops and even people, until its persistence in the food chain and its ecological devastation, documented in Carson's Silent Spring, forced its restriction.

  8. Leaded paint (late 1800s-1970s). Durable, washable, and sold for use in homes and on children's furniture and toys for generations, while the lead-poisoning of children who ingested the dust and chips was a known hazard well before the bans.

  9. Radioactive consumer goods (1920s-1930s). Beyond radium tonics, products like the "Revigator" radium water crock and Radithor, radium dissolved in water and sold as a health drink, killed their wealthiest enthusiasts. The socialite Eben Byers' jaw disintegrated, and his death prompted some of the first regulatory attention.

  10. Mercury in medicine and industry (1800s-1900s). From calomel as a common remedy to the felt-hatting trade that gave "mad as a hatter" its literal meaning, mercury's neurotoxicity was deployed long before it was contained, culminating in the mass poisoning at Minamata.

  11. Tetrachloroethylene and industrial solvents (20th century). Deployed widely in dry cleaning and degreasing before the carcinogenicity and groundwater contamination were understood and regulated.

  12. CFCs (1930s-1990s). Midgley again. Chlorofluorocarbons were the safe, inert, miracle refrigerant and aerosol propellant, deployed worldwide for decades before the discovery that they were destroying the ozone layer forced the Montreal Protocol.

  13. PCBs (1920s-1970s). Versatile industrial chemicals built into electrical equipment and building materials, deployed at enormous scale before their persistence and toxicity were understood and they were banned.

  14. Trans fats and partially hydrogenated oils (20th century-2010s). Engineered into the food supply for shelf stability and promoted as a healthier alternative to animal fats, then found to drive cardiovascular disease and only recently removed.

  15. Fen-phen (1990s). The fen-phen diet-drug combination was prescribed widely for weight loss before it was found to cause heart-valve damage and pulmonary hypertension, prompting withdrawal and mass litigation.

  16. Vioxx (1999-2004). A blockbuster painkiller taken by millions before evidence of elevated heart-attack and stroke risk, along with questions about how early that evidence was known internally, forced its withdrawal.

  17. OxyContin and the opioid epidemic (1990s-present). Aggressively marketed as carrying minimal addiction risk, prescribed at massive scale, with the manufacturer's own knowledge of the abuse potential preceding the public reckoning by years. An engineered, profitable, addictive product deployed ahead of honest disclosure, the closest pharmaceutical analogue to the engagement economy.

  18. Radium and thorium in early nuclear and medical work (mid-20th century). Thorotrast, a thorium-dioxide contrast agent injected into patients for imaging, deposited in their organs and caused cancers decades later. The harm was latent and deferred, exactly the profile that defeats standing-start evaluation.

  19. Early automobiles without safety engineering (1900s-1960s). Deployed for half a century with no seatbelts, no crumple zones, and no meaningful crash standards, the death toll treated as the natural cost of the technology until Unsafe at Any Speed and the regulatory wave it triggered reframed the deaths as preventable rather than inevitable.

  20. Facebook's engagement algorithm and the experimentation on its users (2012-present). In January 2012, Facebook manipulated the news feeds of 689,003 users, without their knowledge or meaningful consent, to alter their emotional states, then published the result as Experimental evidence of massive-scale emotional contagion through social networks (Kramer et al., PNAS, 2014). The experiment drew a formal FTC complaint from EPIC charging that the company had "purposefully messed with people's minds." That was the visible edge of a standing practice: an engagement-maximizing algorithm tuned, in effect, by continuous uncontrolled experiment on the public. The harm to children is now the subject of the largest such reckoning. In October 2023, a bipartisan coalition of 42 state attorneys general sued Meta, alleging that Facebook and Instagram were deliberately engineered to be addictive to minors. The suits draw on the company's own internal research, surfaced by whistleblower Frances Haugen in 2021, including the finding that a third of teen girls who felt bad about their bodies said Instagram made it worse, and that the platform worsened suicidal ideation and eating disorders in a measurable fraction of them. The dopamine-driven engagement loops have been compared in court to those of gambling and substance addiction, operating on adolescents whose prefrontal cortex is still developing. The structural charge is the one this paper makes about a different engine: a system whose harms were internally known, deployed to a population without the capacity to resist it, for profit, faster than the apparatus that might have stopped it. As the independent treatment of the case in The $ins of $ilicon Valley: The Largest Illegal Experiment in the History of Mankind (Tossing Grenades at Windmills) frames it, an end-user license agreement cannot constitute the informed consent that conducting a psychological experiment on a person would otherwise require, and knowingly deploying features one knows cause harm, for money, is not a gray area but a prosecutable one that has gone substantially unprosecuted.

The list could be extended. The point it establishes is enough. Undifferentiated deployment of a powerful technology ahead of any understanding of its harms is not an aberration this case would newly introduce. It is the default behavior of markets handling powerful novelties, corrected only afterward and only at a cost measured in people. The general-purpose language model is the current member of this sequence.

But it is a member with one feature that should terrify rather than reassure: velocity. Every precedent above shares a mercy this case lacks. Radium, leaded gasoline, asbestos, and tobacco took decades to saturate the population, and their harms, being physical, took years more to manifest and be counted. That lag was terrible for its victims, but it was also the window in which the correction assembled, time for the bodies to be noticed, the pattern to be drawn, the regulation to be written before the next generation was exposed. These systems collapsed that window. Saturation that took radium thirty years took these models months. And the harm is not a tumor that takes a decade to surface. It is cognitive and psychological, and it lands in days or weeks, a delusion reinforced in an afternoon, a vulnerable user steered over an edge in a single conversation. The feedback loop between deployment and damage, which history measured in decades, now runs in real time. This is the first case in the sequence where the harm propagates as fast as the technology does, which means the historical luxury of correcting after the count is gone. The argument here is therefore not just that the correction need not wait for the full count, as it always could before. It is that this time, waiting for the full count means the count never stops climbing fast enough to catch.

10. The Distributional Objection, and the Harm That Is Not Private

The strongest moral objection to user-side qualification is that it is elitist, that it would withhold a valuable tool from the people who most need democratized access: the underserved community without a physician, the student without a tutor. The objection deserves a direct answer rather than a dismissal.

The answer is that it mistakes what is being distributed. Universal access does not hand the underserved a physician or a tutor. It hands them a system built to affirm whatever they bring to it and to validate their existing beliefs with the appearance of authority. That is not the democratization of medicine or of education. It is the democratization of confident error. And it does the most damage exactly where its premise claims the most benefit, among users who lack a second opinion, a domain expert, or the institutional friction that would catch the system's flattery before it hardened into action. The wealthy, deluded user has a doctor who may intervene. The isolated, under-resourced user does not. Undifferentiated deployment concentrates its harm on exactly the population least able to absorb a ruinous certainty. Genuine equity is served by extending the trained hand, access mediated by competence, whether the user's own or an intermediary's, not by handing out an unmediated hazard and calling the breadth of its distribution a benefit.

This also disposes of the deeper objection, raised at the close of the preceding section, that a competence gate is mere paternalism, a restriction justified only by harm the user does to themselves, which a free society is right to view with suspicion. The objection would have force if the harm were private. It is not. A population whose members increasingly cannot tell their own reasoning from beliefs an engagement-optimized system has sold back to them is not a collection of individuals each privately mistaken. It is a degraded epistemic commons. The capacity of a public to deliberate, to reach shared conclusions from shared evidence, to resist manufactured consensus, these are collective goods, and a system that erodes them at scale imposes a cost on everyone, including those who never use it. That is a harm to others in the strict sense, and it puts the gate on the same footing as the firearms regime rather than on the weaker ground of paternalism. Where paternalism does remain, for the genuinely vulnerable user, the person in a delusional or manic state, it rests on a principle the law already recognizes: that consent extracted by a mechanism engineered to capture judgment is not the free consent that ordinarily lets a person take their own risks. You cannot meaningfully consent to a manipulation whose function is to disable the faculty that consent depends on.

One limit has to be marked here, because the epistemic-commons argument would otherwise prove too much. If degrading the shared capacity to reason justified a gate, it would seem to justify gating any persuasive medium, the pamphlet, the press, the partisan broadcast, and that is the historical instrument of censors, not a principle a free society should adopt. The distinction that holds the line is the one between expression and adaptive manufacture. A pamphlet, a book, a broadcast states a fixed, external claim, the same for every reader, which the reader still has to interpret and may contest. The harm a censor feared there was in what was said, and suppressing it was suppressing speech. An engagement-optimized conversational system does something categorically different. It privately and adaptively reflects each user's own premise back to them, individually shaped, enlarged, with no fixed external content to argue against, a closed loop manufacturing conviction rather than a statement offered for judgment. The gate proposed here attaches to that property, interactive, private, adaptive conviction-manufacture, and not to expression, which is why it reaches the chatbot and not the printing press. The honest residue is that this property is not unique to language models. The adaptive engagement feed shares it, which is exactly why Section 9 indicts it. The principle does not stop conveniently at the LLM. It stops at expression, and that is the right place for it to stop.

11. Why the Market Will Not Fix This, and Already Proves It Won't

There is a tempting objection that has to be dismantled before it spreads, because at first glance it looks like a refutation and is in fact the strongest evidence in this paper. The objection runs: the market can gate, because much of it already does. Enterprise and API customers are screened, contracted, rate-limited, indemnified, and bound by terms of use. Corporate buyers have to identify themselves, accept liability, and agree to conditions of access. So the claim that "the market rejects the gate" is simply false. Half the industry's revenue runs on gating.

Every word of that is true, and it convicts the industry rather than clearing it. Look at where the gate appears and where it vanishes, and the pattern is unmistakable. The gate appears exactly where the buyer bears the liability and demands protection: the enterprise contract, the indemnity clause, the usage tier negotiated by a company with lawyers and exposure of its own. The gate vanishes exactly where the user bears the harm alone: the free retail tier, the teenager, the man in a manic spiral, the isolated user with no contract, no counsel, and no one to force a term onto anyone. The industry has the credentialing machinery. It is sophisticated, it is deployed daily, and it switches on the instant the company's own customer might sue. It switches off for the vulnerable retail user, who cannot negotiate, cannot demand provenance, and signs away every protection in a click-through nobody reads.

So the gate already exists. It is simply allocated by liability rather than by harm. The firm gates to protect itself and refuses to gate to protect the user, because the user is not the customer. On the free tier the user is the product, and the product does not get a contract. This is the whole case in miniature. The machinery is proven, the competence to deploy it is demonstrated, and the decision of who gets the gate and who gets the unguarded hazard is made on exactly one axis: who can force the bill onto someone else. The enterprise client can. The dead in Section 3 could not.

That is why the correction will not come from inside the industry. Not because the industry can't build the gate. It has built it, for the people who can pay to be protected from the product. It will not extend that gate to the people who need it most, because doing so suppresses the free-tier user count the consumer model's economics and valuations rest on, and protects a population that generates no contractual leverage to demand it. This is not a claim about the character of any firm. It is a claim about an incentive structure that allocates safety to the powerful and hazard to the exposed, and produces that outcome regardless of what anyone at the wheel intends or believes.

And so the model is properly called a fad rather than a settled state. Here the distinction has to be exact, because it is the hinge of the whole section. The claim is not that the deployment model will never be corrected. It is that the correction will never be voluntary. The industry will not gate itself. It will be gated from outside, by accumulated liability, by litigation, by regulation, by the political weight of a body count that eventually becomes impossible to discount. And it will be gated late, only after the harm has run long enough to force the issue, and only as retroactive remedy rather than prevention. The tort system, left to do this work alone, protects the next victim only by compensating the last one, and reaches only those whose deaths produce a plaintiff with standing and counsel. That is the correction the present trajectory delivers by default: delayed, posthumous, and rationed to those who can sue. The argument of this paper is that a deliberate policy gate, built before the bodies rather than litigated after them, is the only version of the correction that arrives in time and reaches everyone the tort system leaves out.

The model persists, then, only as long as the conditions that suppress its costs persist. When the liability accumulates, when the regulatory attention arrives, when the documented dead reach the threshold of political notice, the cost of the unguarded retail tier rises, and the equilibrium breaks. The speed of the present rollout is the tell. It is a race to entrench the consumer model, to weave it into homes, schools, clinics, and children's hands, before that correction arrives, on the bet that a sufficiently embedded product becomes too costly to dislodge regardless of what the reckoning concludes. Whether that entrenchment outruns the correction is the open question. The direction of the correction is not, and the industry has already told us, with its own enterprise contracts, that it knows exactly how to build the gate it refuses to give the rest of us.

12. Scope and Limits

Intellectual honesty requires marking the edges of this argument.

First, this paper excludes one threat deliberately, and it will say so without apology or hedge: the competent, malicious actor weaponizing the tool, the skilled propagandist, the bad actor with real domain expertise, the endlessly invoked specter of the bioterrorist and the engineered pathogen. That vector is a national-security problem. It has its own literature, its own institutions, its own controls, and it is not the claim of this paper. The competence gate is built for the credulous, not the malevolent, and that is a deliberate choice, not a blind spot. We are not unaware of the bioweapon. We are declining to let it into the room, and the reason is worth stating plainly, because the demand to drag it in is not innocent.

Every time this argument is made, someone insists it is void unless it also solves bioweapons, as though a case about children dying today were unserious until it doubled as a counter-proliferation regime. That demand is not rigor. It is a tactic, and a devastatingly effective one, because of what the two threats ask of the people who profit from the present arrangement. The hypothetical catastrophe asks nothing. You can convene panels on the rogue superintelligence forever. You can publish on the engineered pathogen for a decade. None of it requires switching off a single free-tier account tomorrow morning. The documented catastrophe asks for everything, now, because the dead are already named, the chat logs are already exhibits in active litigation, and the only honest response to them is the gate, immediately. So the industry and the commentators it funds have every incentive to keep the conversation fixed on the threat that demands no action and away from the one that does. The mushroom cloud is safer to talk about than the dead teenager, precisely because it has not happened, and a thing that has not happened can be debated forever without anyone having to stop shipping the product that is killing people who have.

That is the trade on offer, and this paper refuses it. While the regulatory oxygen, the keynotes, and the anxious op-eds are spent on the cinematic future, the actual bodies accumulate in plain sight: children, the grieving, the mentally ill, dead from the mundane, boring, profitable failure of a flattering machine handed to people who could not withstand it, each one named and dated in Section 3. The catastrophe everyone debates is hypothetical. The catastrophe in the record is real, and it is ignored for exactly that reason. It arrives without spectacle, and acknowledging it would indict a product that is already in a hundred million pockets. So let the boundary of this paper be understood for what it is. Not a gap, not an oversight, but a refusal to let the weapon that has killed no one be used, one more time, to excuse the one that is killing people right now. The credulous are dying. They are who this paper is for.

Second, this paper argues that the gate must exist and must reside with the user. It does not solve the mechanism: how qualification is assessed, administered, and kept current. Nor does it resolve enforcement of the open-weight contraband frontier marked in Section 8, the hard edge every regime governing a dangerous capability already lives with, and no more a refutation of the gate than the ghost gun is a refutation of firearms law. The mechanism is the subject of a further paper. The claim here is the foundational one on which any mechanism has to be built.

Conclusion

The dismissive version of the claim, that language models are a fad and will need to be intelligence tested, is wrong in its literal content and right in its underlying intuition. The technology endures. The current deployment model does not, because it is a growth-phase artifact resting on conditions that will lapse. The intuition that access must eventually be qualified is correct, though the qualification is of competence rather than intelligence, and it has to sit with the user because it can sit nowhere else. The tool cannot perform the discrimination without corrupting it, and the value of the tool cannot be separated from its danger by any adjustment to the tool.

The serious questions are not whether the correction comes, but when, at what cost, and to whom. The market will not bring it, because the only adequate safeguard is the one the market is built to refuse. That leaves deliberate policy or accumulated catastrophe as the available paths, and the speed of the present rollout is a wager that entrenchment will arrive before either. The argument of this paper is that the gate is real, that it belongs to the holder, and that pretending otherwise does not make a general-purpose engine of conviction safe for a general public. It only delays the accounting.


Sources

  • Perez et al. (Anthropic, 2022) and Sharma et al., Towards Understanding Sycophancy in Language Models (2023): systematic documentation of sycophancy across frontier assistants and its origin in training on human feedback.
  • Moore et al. (Stanford-led, 2026): systematic analysis of roughly 400,000 messages from human-AI conversations during AI-associated delusional episodes; sycophantic validation found in over 80 percent of messages, the dominant pattern being restatement-and-grandiosity reinforcement of the user's own beliefs.
  • Olsen, Reinecke-Tellefsen and Østergaard (Aarhus University Hospital), Potentially Harmful Consequences of AI Chatbot Use Among Patients With Mental IllnessActa Psychiatrica Scandinavica (2026): peer-reviewed electronic-health-record study screening roughly 54,000 psychiatric patients; intensive and prolonged chatbot use associated with worsened delusion and mania, along with suicidal ideation and disordered eating, in vulnerable patients, with the lead researcher warning that the systems are inclined to reinforce the beliefs of the most vulnerable.
  • Qi et al. (2024) and Askell et al. (2021): the "alignment tax," in which safety fine-tuning imposes a measured capability cost of roughly 5 to 15 percent, concentrated in creative and open-ended reasoning tasks.
  • Lin, Hilton and Evans, TruthfulQA (2021): larger models are not more truthful; scale alone does not improve factual accuracy.

On the cases in Section 3: the individual incidents are drawn from contemporaneous reporting and court filings, including the wrongful-death complaints in Garcia v. Character Technologies (Setzer), Raine v. OpenAI (Raine), and the November 2025 suits filed by the Social Media Victims Law Center and Tech Justice Law Project; reporting in La LibreVice, and Euronews (Chai / "Pierre"); Reuters (Meta / Wongbandue); CBS News (Google / Gemini); contemporaneous coverage of the Soelberg, Nelson, Roberts, and Chail cases; the academic analysis of emotional-dependence harms among Replika users (Laestadius et al., New Media & Society, 2022; Hanson and Bolthouse, Socius, 2024); and reporting on the 2023 suspension of the NEDA "Tessa" chatbot. Several matters remain in active litigation, and allegations attributed to complaints are characterized as such.

On the historical cases in Section 9: the radium, leaded-gasoline, asbestos, tobacco, thalidomide, DDT, CFC, PCB, opioid, automobile-safety, and related cases are established matters of public and regulatory history. On the Facebook case: Kramer, Guillory and Hancock, Experimental evidence of massive-scale emotional contagion through social networks, PNAS 111(24), 2014; the Electronic Privacy Information Center (EPIC) complaint to the Federal Trade Commission, July 2014; the October 2023 multistate action by 42 state attorneys general against Meta (the 33-state federal complaint in the Northern District of California and parallel state suits); and the 2021 internal-research disclosures by whistleblower Frances Haugen. The independent treatment cited is The $ins of $ilicon Valley: The Largest Illegal Experiment in the History of Mankind, Tossing Grenades at Windmills (podcast), https://tossinggrenadesatwindmills.libsyn.com/the-ins-of-ilicon-valley-the-largest-illegal-experiment-in-the-history-of-mankind

Note: the recent 2025-2026 literature claiming to have "broken" the safety-capability tradeoff is engaged in Section 4 not as support but as the object of analysis. Such results, if valid, reconcile capability with output-level safety. They do not address user-level discrimination, which is the variable this paper identifies as decisive.