What transforms a rewarding choice into a compulsive one? The question has haunted decision theory since its inception, but only recently have computational frameworks begun to offer principled answers. Addiction presents a peculiar paradox for any theory of rational choice: agents persistently continue selecting options that they themselves report as harmful, in direct conflict with their own stated preferences.
Classical expected utility theory, descended from von Neumann and Morgenstern, cannot easily accommodate this phenomenon. If revealed preferences reflect genuine valuations, then addictive behavior appears rational by definition—a conclusion that strains both intuition and clinical evidence. The challenge, then, is to construct formal models that preserve the mathematical rigor of decision theory while capturing the dynamic inconsistency and maladaptive persistence characteristic of addictive states.
Three computational accounts have emerged as particularly generative. The aberrant learning hypothesis locates addiction in corrupted value estimation driven by pharmacologically amplified prediction errors. The hyperbolic discounting account traces it to the temporal structure of preference itself. The habit system account reframes addiction as a pathological transfer of control between neurally dissociable decision systems. Each framework carries distinct empirical signatures and distinct implications for how we understand the boundaries of rational agency.
Aberrant Learning Hypothesis
The aberrant learning account, formalized most prominently by Redish and colleagues, rests on a subtle corruption of the temporal difference learning algorithm. Under standard reinforcement learning, the dopaminergic prediction error signal δ = r + γV(s') − V(s) drives incremental updates to cached state values, with learning asymptoting as predictions converge on actual rewards.
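The standard update can be sketched in a few lines. This is a minimal tabular TD(0) illustration, not any specific published model; the two-state structure, learning rate, and reward magnitude are invented for demonstration.

```python
# Minimal tabular TD(0) sketch: delta = r + gamma*V(s') - V(s).
# States, parameters, and reward values are illustrative only.
gamma = 0.9   # discount factor
alpha = 0.1   # learning rate
V = {"cue": 0.0, "reward_state": 0.0}

def td_update(V, s, r, s_next):
    """One temporal-difference step; returns the prediction error."""
    delta = r + gamma * V.get(s_next, 0.0) - V[s]
    V[s] += alpha * delta
    return delta

# Repeated cue-reward pairings: delta shrinks toward zero as the
# cached predictions converge on the actual reward.
for _ in range(200):
    td_update(V, "cue", 0.0, "reward_state")
    td_update(V, "reward_state", 1.0, None)
```

After training, V("reward_state") has converged near the reward magnitude and V("cue") near its discounted value, so further pairings produce a near-zero prediction error. This asymptoting behavior is exactly what the drug-induced signal, described next, fails to exhibit.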
Addictive substances disrupt this equilibrium by generating a pharmacologically induced dopamine response that does not diminish with learning. Because the drug directly stimulates the very signal that reports prediction error, reward predictions can never fully cancel the incoming signal. The result is a value function that grows without bound for drug-associated states, creating an asymmetry no natural reinforcer can match.
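The divergence can be made concrete with a toy simulation, loosely following Redish's formulation in which the drug imposes a floor D on the prediction error. The single-state simplification, parameter values, and iteration count here are assumptions for illustration.

```python
# Toy comparison of a natural reward versus a drug reward whose
# prediction error is floored at D (loosely after Redish, 2004).
# Single-state simplification; all numbers are illustrative.
alpha, D = 0.1, 0.5
r = 1.0                      # nominal reward, identical in both cases
V_natural, V_drug = 0.0, 0.0

for _ in range(500):
    # Natural reward: the error can be fully predicted away.
    delta_nat = r - V_natural
    V_natural += alpha * delta_nat
    # Drug reward: the pharmacological component D cannot be
    # cancelled, so the error never falls below D.
    delta_drug = max(r - V_drug + D, D)
    V_drug += alpha * delta_drug

# V_natural asymptotes near r; V_drug keeps climbing linearly,
# the unbounded growth described in the text.
```

The contrast is the whole point: the natural value converges, while the drug-paired value increases by roughly alpha times D on every exposure, without bound.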
This formulation yields precise empirical predictions. Drug-paired cues should acquire incentive salience that outstrips their actual predictive value. Extinction should prove asymmetrically difficult compared to natural reward learning. And choice behavior should become increasingly insensitive to the experienced outcome, since the cached value bears a pathological relationship to actual utility.
Neuroimaging evidence broadly supports these predictions. Ventral striatal responses to drug cues in dependent individuals frequently exceed responses to primary reinforcers, and the magnitude of this cue reactivity correlates with relapse probability. Computational psychiatry has begun using these model-derived quantities as candidate biomarkers, translating abstract theoretical parameters into clinically measurable phenotypes.
The framework's elegance lies in deriving pathology from a single quantitative perturbation of an otherwise normal learning rule. Addiction need not require a broken system; it requires only that one parameter of a well-functioning system be driven outside its evolved operating range.
Takeaway: When a signal designed to track reality is directly manipulable by the stimulus it evaluates, the mechanism that normally produces accurate learning begins producing systematic error. Addiction may be less a malfunction than the predictable behavior of a healthy system given pathological inputs.
Hyperbolic Discounting Account
Where the learning account targets value acquisition, the discounting account targets temporal integration. Exponential discounting, the form assumed by standard economic theory, yields time-consistent preferences: if I prefer A over B today, I will prefer A over B tomorrow, holding delays constant. Hyperbolic discounting, empirically well-documented across species, violates this property.
Under a hyperbolic function of the form V = A/(1 + kD), where A is the reward amount, D the delay, and k an individual discount parameter, the discount rate itself depends on the absolute delay. Preferences between a small immediate reward and a larger delayed reward reverse predictably as the immediate option approaches. This generates preference reversals that no single consistent utility function can rationalize.
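The reversal is easy to exhibit numerically. The amounts, the five-unit gap between rewards, and k = 1 below are arbitrary choices for illustration, not fitted values.

```python
# Toy preference reversal under hyperbolic discounting V = A/(1 + kD).
# Amounts, delays, and k are illustrative, not empirical estimates.
def hyperbolic_value(amount, delay, k=1.0):
    """Present value of `amount` received after `delay` time units."""
    return amount / (1.0 + k * delay)

def prefers_larger_later(t):
    """At decision time: small reward is t units away, large is t + 5."""
    small_soon = hyperbolic_value(5.0, t)
    large_late = hyperbolic_value(10.0, t + 5)
    return large_late > small_soon

far_in_advance = prefers_larger_later(10)  # planning ahead: wait
at_the_moment = prefers_larger_later(0)    # reward imminent: reversal
```

Evaluated far in advance, the larger-later option wins; once the smaller reward is imminent, the same function ranks the options the other way, with no change to the agent's underlying valuations.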
Ainslie's framework casts addiction as a dramatic instance of this architectural feature. The addict who resolves at noon to abstain by evening faithfully predicts her future preference, but as the cue approaches, the discount curve steepens and the proximate reward overtakes the distal aversive consequences. She has not changed her mind; her mind changes as a function of temporal proximity.
Formally, this transforms the addict into a sequence of transient selves whose preferences partially conflict. The intertemporal problem acquires a game-theoretic structure, with each momentary self choosing strategically given expectations about subsequent selves. Precommitment devices, bright-line rules, and bundling of choices into sequences all become interpretable as equilibrium strategies in this internal negotiation.
Empirical work has converged on elevated discount rates as a robust marker of addictive phenotypes across substances, suggesting that temporal myopia may constitute both a vulnerability factor and a consequence of chronic use. The parameter k, abstract as it seems, indexes something clinically consequential about the structure of choice over time.
Takeaway: Time inconsistency is not a failure of willpower but a structural property of how valuations compose across delays. The self that plans and the self that acts are genuinely different agents solving different optimization problems.
Habit System Dominance
The third framework draws on the dual-system architecture articulated by Daw, Dayan, and Dickinson, in which behavior arises from competition between a goal-directed controller and a habitual controller. The former evaluates actions by prospectively simulating outcomes through a learned model of the environment; the latter selects actions by retrieving cached state-action values updated through direct experience.
Each system has distinct computational virtues. Model-based control produces flexible behavior sensitive to changes in goals or contingencies, but at high computational cost. Model-free control is fast and cheap but slow to adapt when the world changes. Normal behavior reflects an arbitration between these systems, weighted by their relative reliability in the current context.
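A schematic of reliability-weighted arbitration makes the shift concrete. The weighting rule and all numbers below are invented for illustration; they are loosely inspired by, not taken from, the Daw et al. formulation.

```python
# Sketch of reliability-weighted arbitration between a model-based
# (prospective) and model-free (cached) controller. The weighting
# rule and the action values are invented for illustration.
def arbitrate(q_mb, q_mf, rel_mb, rel_mf):
    """Blend the two controllers' action values by relative reliability."""
    w = rel_mb / (rel_mb + rel_mf)   # weight on the model-based system
    return {a: w * q_mb[a] + (1 - w) * q_mf[a] for a in q_mb}

q_mb = {"seek": -0.5, "abstain": 0.2}   # prospective evaluation of outcomes
q_mf = {"seek": 0.9, "abstain": 0.0}    # overtrained cached values

healthy = arbitrate(q_mb, q_mf, rel_mb=0.8, rel_mf=0.2)
addicted = arbitrate(q_mb, q_mf, rel_mb=0.1, rel_mf=0.9)
# With intact model-based reliability, "abstain" wins the blend;
# degrade that reliability and the habit values flip the choice.
```

The same two sets of action values yield opposite choices depending only on the arbitration weights, which is the account's core claim about chronic use.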
Addiction, on this account, represents a pathological shift in arbitration. Chronic drug use produces overtraining of model-free values while simultaneously impairing the prefrontal circuitry supporting model-based evaluation. The result is drug-seeking behavior that persists even after the agent fully represents its adverse consequences, because the controller executing the behavior does not consult those representations.
Devaluation paradigms provide the cleanest empirical test. A goal-directed agent, informed that a previously rewarded action now yields an aversive outcome, should immediately suppress that action. A habit-dominated agent will continue executing it until direct experience updates the cached value. Addicted populations and extensively trained animals both show the latter pattern with respect to drug-seeking sequences.
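The devaluation logic reduces to a small contrast between the two choice rules. The action names, cached values, and utilities below are hypothetical placeholders, not taken from any experiment.

```python
# Toy devaluation test: a model-free chooser consults cached values,
# a model-based chooser simulates outcomes through a world model.
# All names and numbers are hypothetical.
Q_cached = {"press_lever": 0.9, "do_nothing": 0.0}   # from past training
world_model = {"press_lever": "food", "do_nothing": "nothing"}
outcome_utility = {"food": 1.0, "nothing": 0.0}

# Devaluation: the food outcome is now known to be aversive.
outcome_utility["food"] = -1.0

model_free_choice = max(Q_cached, key=Q_cached.get)
model_based_choice = max(
    world_model, key=lambda a: outcome_utility[world_model[a]]
)
# model_free_choice is still "press_lever": the cached value is stale
# and only direct experience would update it.
# model_based_choice is "do_nothing": simulation reads the new utility.
```

The model-free agent keeps pressing because nothing in its cached table has changed; the model-based agent suppresses the action immediately, which is precisely the behavioral signature the devaluation paradigm exploits.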
This framework reframes the phenomenology of craving. The reported dissociation between knowing and doing—the addict who recognizes the costs yet acts against them—becomes a literal description of two decision systems producing incompatible recommendations, with the wrong one holding the motor output.
Takeaway: Knowing what is good and doing what is good are products of separable computational systems. Insight alone cannot correct behavior generated by a controller that does not read the relevant representations.
These three frameworks are not rival explanations so much as complementary decompositions of a phenomenon that almost certainly involves all of them. Aberrant learning distorts the values that habits eventually cache; steep discounting magnifies the asymmetry between the cached values and their true long-run consequences; habit dominance then shields the resulting behavior from correction.
What emerges is a picture of addiction as a compound failure—one in which each subsystem behaves according to its normal rules, yet their joint operation produces outcomes no component endorses. The pathology lives in the composition, not in any single broken piece.
For decision theory more broadly, addiction thus serves as a stress test of our formal frameworks. It forces us to abandon the fiction of the unitary rational agent and to take seriously the internal multiplicity that computational models reveal. Understanding how choices go wrong is, in the end, the clearest route to understanding what choice actually is.