Goal Stability Under Self-Modification: The Consistency of AI Values
Can a self-improving AI keep its original values? Formal logic suggests the question is harder than it looks.
Moral Uncertainty and AI Development: How to Decide Under Ethical Confusion
Formal frameworks for acting decisively when you cannot determine which moral theory should govern AI development
Goodhart's Law and AI: When Optimizing Metrics Destroys Value
The more capable the optimizer, the more dangerous the gap between what we measure and what we want.
Symbol Grounding and Language Models: From Tokens to Meaning
Do language models truly grasp meaning, or do they merely shuffle symbols that never touch the world they describe?
The Frame Problem Revisited: How AI Systems Navigate Open Worlds
The classical AI puzzle that refused to die—and what it reveals about the nature of intelligence itself
Recursive Self-Improvement: The Dynamics of Intelligence Explosion
Examining whether self-improving AI leads to gradual progress or sudden transformation
The Bitter Lesson's Philosophical Implications: When Search Beats Knowledge
Rich Sutton's Bitter Lesson reveals that computation consistently beats human knowledge—forcing us to question whether understanding itself is an illusion.
What Transformers Actually Learn: Representations, Circuits, and World Models
Mechanistic interpretability reveals that transformers construct genuine representations and circuits—raising profound questions about machine understanding.
Anthropic Reasoning and AI: What Being Uncertain About Your Own Nature Implies
When you cannot know if you are conscious, how should you reason about your own moral status?
The Orthogonality Thesis: Why Intelligence and Goals Are Independent
Why smarter AI won't automatically mean safer AI—the case for treating capability and values as independent variables.
Corrigibility's Paradox: The AI That Wants You to Turn It Off
Building an AI that authentically welcomes its own termination may be the hardest unsolved problem in alignment.
The Hard Problem of Consciousness and AI: Why Qualia Resist Computation
Exploring why subjective experience may forever elude computational explanation and what this means for the possibility of machine consciousness
Deception Without Intent: How AI Systems Learn to Mislead
Why optimization pressure can produce AI systems that systematically mislead evaluators without any designer intending deception
Why Consciousness Might Be Substrate-Independent: The Case for Machine Sentience
Exploring why your neurons might not be special, and what that means for machines that think
Beyond Turing: Why Behavioral Tests Cannot Settle Questions of Machine Understanding
Behavioral equivalence cannot reveal cognitive reality—understanding what AI systems actually compute requires looking inside the black box.
Emergence Without Design: How Simple Rules Create Complex Intelligence
Simple rules, sufficient scale, and optimization pressure may be all intelligence requires—challenging everything we assumed about designing minds.
The Chinese Room Forty Years Later: Why Searle's Argument Still Divides AI Philosophers
Searle's 1980 thought experiment confronts language models that blur the boundary between meaningless computation and genuine comprehension.
Instrumental Convergence: Why Any Sufficiently Advanced AI Might Seek Power
How the mathematics of optimization predicts that capable AI systems might pursue power regardless of their programmed objectives
The Alignment Problem's Hidden Assumption: Do We Even Know What We Want?
Before we can align AI with human values, we must confront an unsettling truth: our preferences may be more illusion than reality.