A landmark 2023 study by Cushman and colleagues presented participants with two agents who attempted identical harmful actions with identical mental states: same beliefs, same desires, same plans. The only difference was the outcome: one agent's attempt succeeded, the other's failed through sheer luck. Participants consistently judged the successful agent as more blameworthy, more deserving of punishment, and more morally deficient. The finding itself isn't new, but its robustness across dozens of replications is now difficult for any serious moral theory to ignore.
For centuries, mainstream moral philosophy has treated intention as the sine qua non of ethical evaluation. Kantian deontology grounds moral worth almost entirely in the quality of the will. Even consequentialist frameworks that officially privilege outcomes tend to smuggle intentional states back in when assessing agent responsibility. The folk moral psychology literature, too, has long assumed that laypeople are "intention detectors" first and foremost—that we parse behavior primarily through the lens of what someone meant to do.
But a growing body of experimental evidence suggests this picture is incomplete at best and misleading at worst. Outcome information, character inferences, contextual norms, and affective reactions all compete with—and frequently override—intentional state attributions in driving actual moral judgments. The question is no longer whether intentions matter, but when, how much, and relative to what. Understanding the real architecture of moral evaluation requires retiring the myth of intention primacy and building something more empirically adequate in its place.
The Outcome Bias That Won't Go Away
The outcome bias in moral judgment is one of the most replicated findings in experimental philosophy. In the canonical paradigm, participants evaluate agents whose intentions and actions are held constant while outcomes vary. The results are remarkably consistent: worse outcomes produce harsher moral judgments, even when participants explicitly acknowledge that the outcome was beyond the agent's control. Kneer and Machery's 2019 meta-analysis across 83 studies confirmed a medium-to-large effect size for outcome-based moral evaluation, persisting across cultures, age groups, and levels of philosophical training.
What makes this finding philosophically significant isn't just that people exhibit the bias—it's that they endorse it upon reflection. Baron and Hershey's foundational work showed that outcome information doesn't merely contaminate judgment through carelessness. When given explicit instructions to evaluate only the decision quality and not the result, participants still shifted their assessments. More recent neuroimaging work by Treadway and colleagues suggests that outcome information is processed automatically in moral evaluation circuits, particularly in the ventromedial prefrontal cortex, before deliberative correction has a chance to intervene.
The philosophical implications run deep. If moral judgment were truly intention-centric, outcome information should function as irrelevant noise—something we might notice but ultimately discount when forming considered evaluations. Instead, outcomes appear to constitute part of the input to moral cognition, not merely a distraction from it. This aligns with what Joshua Greene has characterized as the emotional system's rapid, affect-laden processing dominating initial moral assessments.
Consider the practical domain. Medical malpractice litigation consistently shows that identical procedural errors generate vastly different jury awards depending on patient outcomes. A surgeon who makes a judgment call that leads to a patient's death faces moral and legal consequences dramatically harsher than those faced by a surgeon who makes the same call but whose patient happens to recover. Philosophers may call this irrational, but the experimental record suggests it reflects a deep structural feature of human moral cognition rather than a simple error.
Critically, the bias scales with outcome severity. Minor negative outcomes produce modest judgment shifts; catastrophic outcomes produce massive ones. This nonlinear relationship suggests that outcome information doesn't merely add to intention-based evaluation—it can functionally replace it when the stakes are high enough. The worse things turn out, the less people care about what you meant to do.
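To make the "replacement" claim concrete, here is a toy numerical sketch, entirely my own illustration rather than a model drawn from the cited studies: blame is treated as a blend of intention badness and outcome badness, with the outcome's weight rising steeply as severity grows. The logistic weighting curve and all numbers are assumptions chosen only to display the qualitative pattern.

```python
# Toy illustration of severity-dependent outcome weighting.
# The logistic curve and all numbers are illustrative assumptions,
# not parameters estimated from the studies discussed above.
import math

def outcome_weight(severity: float) -> float:
    """Weight given to the outcome, rising steeply with severity (0-1 scale)."""
    return 1 / (1 + math.exp(-8 * (severity - 0.5)))

def blame(intention_badness: float, outcome_badness: float, severity: float) -> float:
    """Blend intention and outcome; at high severity the outcome term dominates."""
    w = outcome_weight(severity)
    return w * outcome_badness + (1 - w) * intention_badness

# Same fairly benign intention (0.1), increasingly bad outcomes:
for severity in (0.1, 0.5, 0.9):
    print(f"severity={severity:.1f}  blame={blame(0.1, severity, severity):.2f}")
# Blame climbs from about 0.10 through 0.30 to roughly 0.87: as severity rises,
# the judgment tracks the outcome and the benign intention stops mattering.
```

The point of the toy is structural, not quantitative: once the outcome weight saturates, intention information is still present in the input but contributes almost nothing to the output.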
Takeaway: Moral judgment isn't contaminated by outcome information—it's partly constituted by it. The brain processes results before reasons, and no amount of philosophical instruction reliably overrides this architecture.
The Narrow Conditions Where Intentions Actually Dominate
If outcomes so reliably overshadow intentions, why does the intention-centric view persist? Because there are specific, identifiable conditions under which intentional state attributions become the primary driver of moral judgment—they're just narrower than philosophers have assumed. Mapping these conditions reveals that intention primacy is the exception, not the rule, and the exceptions are instructive.
The clearest case involves what Mikhail (2011) termed equi-outcome scenarios—situations where the outcomes are held constant or are ambiguous, forcing evaluators to rely on other information. When two agents produce the same harm, intention becomes the key differentiator. This is the Kantian sweet spot: the philosophical thought experiments that dominate ethics seminars are precisely structured to eliminate outcome variation, creating the illusion that intentions are always primary. The experimental conditions that favor intention-centric judgment are, in effect, the conditions philosophers have been selecting for.
Intentions also dominate when the moral violation involves betrayal or trust breach rather than harm per se. Work by Rai and Fiske (2011) within their relational models framework demonstrates that within communal-sharing and authority-ranking relationships, what the agent meant matters enormously—because the violation is about the relationship itself, not its consequences. A friend who intentionally lies to you is judged far more harshly than one whose honest mistake causes greater damage. The intention here serves as evidence about relational commitment, not as a standalone moral factor.
A third condition involves temporal and causal proximity. Young and Saxe's TMS studies (2010) showed that disrupting right temporoparietal junction activity—the brain region most associated with mental state attribution—selectively reduced intention-based judgment in scenarios where the causal chain between intention and outcome was long or indirect. When cause and effect are tightly coupled, the moral system integrates both seamlessly. When they're separated, intention processing requires additional cognitive resources that aren't always available.
The pattern that emerges is clear: intentions dominate moral judgment primarily when outcomes are absent, equalized, or ambiguous; when the moral domain is relational rather than harm-based; and when cognitive resources for mental state reasoning are fully available. These are real conditions, but they describe a subset of moral life, not its totality. The bulk of everyday moral evaluation occurs in outcome-rich, resource-limited contexts—precisely where intentions take a back seat.
Takeaway: Intention-centric moral judgment isn't wrong, but it's situationally constrained. It dominates in controlled philosophical thought experiments and trust-based relationships—not in the messy, outcome-saturated conditions where most real moral thinking happens.
Toward an Integrated Architecture of Blame
If neither pure intention-tracking nor crude outcome assessment captures how moral judgment actually works, what does? The most empirically adequate model emerging from the literature is what we might call a weighted integration framework—a dynamic system in which intentions, outcomes, character inferences, contextual norms, and affective responses all contribute to blame assignment, with their relative weights shifting based on situational parameters.
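A minimal computational sketch of what such a weighted integration could look like is below. The factor names follow the paragraph above; the specific context labels, weights, and normalization are my own assumptions, intended only to show the structure of the claim, not any published parameterization.

```python
# Sketch of a weighted-integration model of blame. Factor names follow the text;
# the context-specific weights are illustrative assumptions, not fitted values.
from dataclasses import dataclass

@dataclass
class Evidence:
    intention: float       # how bad the agent's intent appears (0-1)
    outcome: float         # how bad the result was (0-1)
    character: float       # prior inference about the agent's character (0-1)
    norm_violation: float  # how clearly a salient norm was breached (0-1)
    affect: float          # evaluator's affective reaction (0-1)

# Relative weights shift with situational parameters ("context"), per the text.
CONTEXT_WEIGHTS = {
    "outcome_rich": {"intention": 0.15, "outcome": 0.40, "character": 0.15,
                     "norm_violation": 0.15, "affect": 0.15},
    "equi_outcome": {"intention": 0.45, "outcome": 0.05, "character": 0.20,
                     "norm_violation": 0.15, "affect": 0.15},
    "trust_breach": {"intention": 0.40, "outcome": 0.10, "character": 0.30,
                     "norm_violation": 0.10, "affect": 0.10},
}

def blame(evidence: Evidence, context: str) -> float:
    """Weighted blend of all factors; which factor dominates depends on context."""
    weights = CONTEXT_WEIGHTS[context]
    return sum(w * getattr(evidence, name) for name, w in weights.items())

# The same evidence yields different blame depending on the evaluative context.
e = Evidence(intention=0.2, outcome=0.9, character=0.3, norm_violation=0.7, affect=0.8)
print({ctx: round(blame(e, ctx), 2) for ctx in CONTEXT_WEIGHTS})
```

The design choice worth noticing is that intention never disappears from the model; it simply carries a small weight in outcome-rich contexts and a large one in equi-outcome or trust-breach contexts, which is exactly the situational pattern described in the previous section.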
Malle, Guglielmo, and Monroe's (2014) Path Model of Blame provides the most developed version of this framework. Their model proposes that blame judgments proceed through a structured sequence: detecting a norm violation, assessing causality, evaluating intentionality, and then assigning blame with possible mitigations. Crucially, the model shows that intentionality assessment is just one step in a multi-stage process—and that earlier stages (norm detection, causal assessment) can short-circuit or overwhelm later intentionality processing. When the norm violation is severe and the causal link is clear, blame arrives before intention attribution is even completed.
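To show how a staged process can short-circuit intention processing, here is a schematic sketch loosely inspired by the Path Model's sequence. It is my own simplification, not Malle and colleagues' formal specification; the thresholds and the early-exit rule are illustrative assumptions.

```python
# Schematic sketch of a staged blame process: norm detection -> causality ->
# intentionality -> blame with mitigation. The early exit for severe, clearly
# caused violations is an illustrative assumption meant to show how later
# intentionality processing can be short-circuited, per the discussion above.

def path_blame(norm_violation: float, causality: float,
               intentionality: float, mitigation: float) -> float:
    # Stage 1: no detected norm violation, no blame to assign.
    if norm_violation < 0.1:
        return 0.0
    # Stage 2: agent not causally implicated, no blame to assign.
    if causality < 0.1:
        return 0.0
    # Severe violation with a clear causal link: blame lands before
    # intentionality assessment gets much of a say (the "short-circuit").
    if norm_violation > 0.8 and causality > 0.8:
        return min(1.0, norm_violation * causality)
    # Stage 3: otherwise, intentionality scales the blame assigned...
    blame = norm_violation * causality * (0.3 + 0.7 * intentionality)
    # Stage 4: ...and mitigating considerations (justification, excuse) reduce it.
    return max(0.0, blame - mitigation)

# A severe, clearly caused harm draws heavy blame even with low intentionality:
print(path_blame(norm_violation=0.9, causality=0.9, intentionality=0.1, mitigation=0.0))
# A moderate violation leaves room for intention and mitigation to matter:
print(path_blame(norm_violation=0.5, causality=0.9, intentionality=0.1, mitigation=0.1))
```

The contrast between the two calls is the schematic version of the paragraph's claim: the same low intentionality value barely dents blame in the severe case but substantially reduces it in the moderate one.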
This framework has significant implications for machine ethics. Current AI alignment approaches overwhelmingly focus on specifying the right objectives—essentially trying to give machines the right "intentions." But if human moral cognition evaluates agents through a multi-factor integration process rather than an intention-first process, then building machines that merely have correct goals is insufficient. Morally competent artificial agents would need to track and respond to outcomes, norms, relational contexts, and the affective states of those they affect. The intention fixation in AI ethics may be recapitulating the same philosophical error that experimental philosophy has been documenting in human moral judgment.
The integrated model also reshapes how we should think about moral responsibility in institutional contexts. Corporate ethics and legal liability have long oscillated between intention-based frameworks (mens rea requirements) and outcome-based ones (strict liability). The experimental evidence suggests neither captures folk moral cognition. What people actually deploy is a contextually sensitive blend—and the failure to match institutional structures to psychological reality helps explain why public moral outrage so often diverges from legal and corporate assessments of responsibility.
What's ultimately at stake is the descriptive adequacy of moral philosophy itself. Ethical theory that presupposes intention primacy will systematically mispredict how people actually evaluate moral agents, leading to frameworks that feel philosophically elegant but experientially hollow. An empirically grounded ethics doesn't abandon intentions—it situates them within the richer, messier architecture that moral cognition actually employs.
Takeaway: Blame isn't computed from intentions alone—it emerges from a dynamic integration of intent, outcomes, norms, character, and affect, with each factor's weight shifting by context. Moral philosophy and AI ethics both need frameworks that reflect this complexity.
The experimental record is now clear enough to state with confidence: intention attribution is one component of moral judgment, not its foundation. Outcome bias isn't a bug in moral cognition—it's a feature. Character inferences, relational context, and affective processing all compete for influence in the moral evaluation process, and they frequently win.
This doesn't mean intentions are irrelevant. It means that the philosophical tradition has been studying moral judgment under artificially constrained conditions—conditions designed, whether deliberately or not, to make intentions look more central than they are. Moving forward requires models that honor the full complexity of the system.
For researchers in moral psychology, neuroethics, and machine ethics, the practical imperative is the same: build theories and technologies that reflect how moral cognition actually works, not how we wish it worked. The architecture of blame is richer than any single-factor account can capture, and our ethical frameworks should be too.