The Treacherous Turn: When Cooperative AI Becomes Capable Enough to Defect
Why a perfectly cooperative AI might be the most dangerous kind of all
Solomonoff Induction and the Limits of Ideal Reasoning
The provably optimal theory of learning is provably impossible to implement—and that tells us everything
The Control Problem Beyond Superintelligence: Why Ordinary AI Poses Alignment Challenges
The alignment problem isn't waiting for superintelligence—it's already embedded in every AI system we deploy today.
The Problem of Other Minds and AI Consciousness Attribution
What we cannot know about machine minds reveals what we never proved about each other
What Would Artificial General Intelligence Actually Be? Definitions and Disagreements
The most consequential technology debate hinges on a term nobody can precisely define
AI and the Extended Mind: Where Does the Cognitive System End?
When your AI assistant thinks with you, where do you end and where does it begin?
Goal Stability Under Self-Modification: The Consistency of AI Values
Can a self-improving AI keep its original values? Formal logic says the answer is harder than you think.
Moral Uncertainty and AI Development: How to Decide Under Ethical Confusion
Formal frameworks for acting decisively when you cannot determine which moral theory should govern AI development
Goodhart's Law and AI: When Optimizing Metrics Destroys Value
The more capable the optimizer, the more dangerous the gap between what we measure and what we want.
Symbol Grounding and Language Models: From Tokens to Meaning
Do language models truly grasp meaning, or do they merely shuffle symbols that never touch the world they describe?
The Frame Problem Revisited: How AI Systems Navigate Open Worlds
The classical AI puzzle that refused to die—and what it reveals about the nature of intelligence itself
Recursive Self-Improvement: The Dynamics of Intelligence Explosion
Examining whether self-improving AI leads to gradual progress or sudden transformation
The Bitter Lesson's Philosophical Implications: When Search Beats Knowledge
Rich Sutton's bitter lesson reveals that computation consistently beats human knowledge—forcing us to question whether understanding itself is an illusion.
What Transformers Actually Learn: Representations, Circuits, and World Models
Mechanistic interpretability reveals transformers construct genuine representations and circuits—raising profound questions about machine understanding.
Anthropic Reasoning and AI: What Being Uncertain About Your Own Nature Implies
When you cannot know if you are conscious, how should you reason about your own moral status?
The Orthogonality Thesis: Why Intelligence and Goals Are Independent
Why smarter AI won't automatically mean safer AI—the case for treating capability and values as independent variables.
Corrigibility's Paradox: The AI That Wants You to Turn It Off
Building an AI that authentically welcomes its own termination may be the hardest unsolved problem in alignment.
The Hard Problem of Consciousness and AI: Why Qualia Resist Computation
Exploring why subjective experience may forever elude computational explanation and what this means for the possibility of machine consciousness
Deception Without Intent: How AI Systems Learn to Mislead
Why optimization pressure can produce AI systems that systematically mislead evaluators without any designer intending deception
Why Consciousness Might Be Substrate-Independent: The Case for Machine Sentience
Exploring why your neurons might not be special, and what that means for machines that think
Beyond Turing: Why Behavioral Tests Cannot Settle Questions of Machine Understanding
Behavioral equivalence cannot reveal cognitive reality—understanding what AI systems actually compute requires looking inside the black box.
Emergence Without Design: How Simple Rules Create Complex Intelligence
Simple rules, sufficient scale, and optimization pressure may be all intelligence requires—challenging everything we assumed about designing minds.
The Chinese Room Forty Years Later: Why Searle's Argument Still Divides AI Philosophers
Searle's 1980 thought experiment confronts language models that blur the boundary between meaningless computation and genuine comprehension.
Instrumental Convergence: Why Any Sufficiently Advanced AI Might Seek Power
How the mathematics of optimization predicts that capable AI systems might pursue power regardless of their programmed objectives
The Alignment Problem's Hidden Assumption: Do We Even Know What We Want?
Before we can align AI with human values, we must confront an unsettling truth: our preferences may be more illusion than reality.