Snapshot isolation occupies a peculiar position in the consistency hierarchy. It is strong enough to prevent dirty reads, non-repeatable reads, and phantom reads in most formulations. It detects and rejects conflicting writes between concurrent transactions. For a large class of real workloads, its behavior is indistinguishable from full serializability. And yet it is not serializable — a fact that most practitioners discover only when an application invariant fails silently in production with no obvious concurrency-related cause.
The confusion runs deep and has historical roots. Oracle's serializable mode historically implemented snapshot isolation, not true serializability. PostgreSQL did the same prior to version 9.1. For over a decade, applications in production relied on guarantees the underlying system never formally provided. The ANSI SQL standard's isolation level definitions, as Berenson et al. demonstrated in 1995, are sufficiently ambiguous that SI falls outside the standard hierarchy entirely. It prevents anomalies that repeatable read permits while admitting anomalies that repeatable read does not. The taxonomy itself obscures the problem.
The formal distance between SI and serializability is narrow but consequential. It admits one additional class of anomaly — write skew — that can violate application invariants with no mechanism for detection after the fact. Understanding why this gap exists, why it persists in widely deployed systems, and how it can be formally closed requires precise specification of SI's visibility rules, its conflict detection boundaries, and the structural conditions under which non-serializable histories emerge. What follows is that specification.
Formal Definition: Two Rules and a Critical Omission
Snapshot isolation is completely specified by two rules. The Snapshot Read Rule states that each transaction Ti reads from a consistent snapshot of the database taken at Ti's start timestamp. For each data item, Ti observes the version installed by the most recent write to that item committed before Ti began. No transaction ever sees a partial commit or an inconsistent intermediate state. Every read within a single transaction reflects the same logical point in time.
The First-Committer-Wins Rule governs write conflicts. If two concurrent transactions Ti and Tj both modify the same data item x, at most one may commit. Concurrency here means their execution intervals overlap — Ti started before Tj committed and vice versa. When both attempt to write x, the first to reach commit succeeds and the second is aborted. This prevents lost updates, a guarantee that weaker isolation levels like read committed cannot provide.
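The two rules can be sketched as a toy multiversion store. This is a minimal, single-threaded simulation under stated assumptions, not any real engine's implementation: the class name, the logical-timestamp scheme, and the version-list layout are all illustrative.

```python
class SIStore:
    """Toy snapshot-isolation store: snapshot reads + first-committer-wins."""

    def __init__(self):
        self.clock = 0      # logical timestamp source
        self.versions = {}  # key -> list of (commit_ts, value), ascending
        self.active = {}    # txn id -> (start_ts, pending writes dict)

    def begin(self):
        self.clock += 1
        tid = self.clock
        self.active[tid] = (self.clock, {})
        return tid

    def read(self, tid, key):
        start_ts, writes = self.active[tid]
        if key in writes:          # read-your-own-writes within the txn
            return writes[key]
        # Snapshot Read Rule: latest version committed at or before txn start
        for commit_ts, value in reversed(self.versions.get(key, [])):
            if commit_ts <= start_ts:
                return value
        return None

    def write(self, tid, key, value):
        self.active[tid][1][key] = value   # buffered until commit

    def commit(self, tid):
        start_ts, writes = self.active.pop(tid)
        # First-Committer-Wins Rule: abort if any written key gained a
        # version after this transaction's snapshot was taken
        for key in writes:
            for commit_ts, _ in self.versions.get(key, []):
                if commit_ts > start_ts:
                    return False           # aborted
        self.clock += 1
        for key, value in writes.items():
            self.versions.setdefault(key, []).append((self.clock, value))
        return True

db = SIStore()
t0 = db.begin(); db.write(t0, "x", 1); db.commit(t0)   # seed x = 1
t1, t2 = db.begin(), db.begin()                        # concurrent txns
db.write(t1, "x", 2)
db.write(t2, "x", 3)
assert db.commit(t1) is True    # first committer wins
assert db.commit(t2) is False   # second writer of x aborts
```

Note that the conflict check inspects only the keys a transaction wrote; what it read plays no role, which is exactly the omission the next paragraphs examine.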
Together these two rules produce behavior that feels serializable. The snapshot rule ensures internal read consistency. The first-committer-wins rule prevents destructive write conflicts. Read-only transactions under SI are provably serializable — they observe a consistent committed state and can never contribute to anomalies. For workloads dominated by reads with infrequent write conflicts, SI and serializability are observationally equivalent.
The critical omission is what SI does not track: read-write conflicts between concurrent transactions. Ti can read item x while concurrent Tj modifies x, and both commit successfully — provided Ti did not also write to x. In the multiversion serialization graph, this creates an rw-antidependency from Ti to Tj. Under serializability, such dependencies constrain the legal transaction ordering. Under snapshot isolation, they are entirely invisible to the conflict detection mechanism.
Adya, Liskov, and O'Neil formalized this gap using generalized dependency graphs. They showed SI prohibits phenomena G0 through G1c — all dirty write and dirty read variants — and prevents non-repeatable reads on individual items. But SI permits the general G2 anomaly: cycles in the serialization graph involving rw-antidependency edges across multiple items. Berenson et al. had earlier demonstrated that SI does not fit the ANSI SQL isolation hierarchy at all, occupying a position formally incomparable to repeatable read in certain formulations.
Takeaway: Snapshot isolation's power and its vulnerability stem from the same architectural choice — it detects write-write conflicts between concurrent transactions but remains structurally blind to read-write conflicts, creating a gap that is invisible in practice until an invariant breaks.
Write Skew Anatomy: The Minimal Non-Serializable Structure
Write skew is the canonical anomaly that snapshot isolation permits. It arises when two concurrent transactions read an overlapping dataset, make disjoint writes based on what they observed, and both commit successfully — producing a database state that no serial execution of those same transactions could have produced. The anomaly is subtle precisely because each transaction, examined individually, behaves correctly. The violation emerges only from their concurrent interaction.
Consider the formal structure. Transactions T1 and T2 both read items x and y from the same snapshot. T1 modifies x based on the observed values. T2 modifies y based on the same observed values. Under SI, both transactions see identical snapshots — neither has committed when the other reads. Both pass the first-committer-wins check because they write to different items. Both commit. The resulting state reflects T1's decision based on a now-stale y and T2's decision based on a now-stale x — a combination that no serial ordering produces.
The hospital on-call invariant makes this concrete. The system requires at least one doctor on call at all times. Alice and Bob are both currently on call. Alice reads the roster, counts two doctors, determines it is safe, and removes herself. Concurrently Bob reads the same roster snapshot, reaches the same conclusion, and removes himself. Both transactions satisfy the invariant at read time. Both commit under SI's rules. Zero doctors remain on call. The invariant is violated and no mechanism within SI flags the inconsistency.
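The on-call scenario can be replayed in a few lines. A hedged sketch: dictionaries stand in for rows, the invariant check runs against a shared stale snapshot, and first-committer-wins is reduced to a disjointness test on write sets; every name here is illustrative.

```python
# Shared snapshot both transactions read: two doctors on call.
snapshot = {"alice_on_call": True, "bob_on_call": True}

def txn_remove(doctor, snap):
    """Remove `doctor` from call only if the snapshot shows >= 2 on call."""
    if sum(snap.values()) >= 2:                # invariant check on stale data
        return {f"{doctor}_on_call": False}    # write set
    return {}

# Both transactions read the same snapshot before either commits.
w1 = txn_remove("alice", dict(snapshot))
w2 = txn_remove("bob", dict(snapshot))

# First-committer-wins compares only write sets; these are disjoint,
# so SI lets both commit.
assert set(w1).isdisjoint(w2)

committed = dict(snapshot)
committed.update(w1)   # T1 commits
committed.update(w2)   # T2 commits without any write-write conflict
assert sum(committed.values()) == 0   # zero doctors on call: invariant broken
```

Each transaction was individually correct against the state it saw; only the combination violates the invariant, and nothing in the commit protocol noticed.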
In the multiversion serialization graph, write skew produces a cycle of exactly two rw-antidependencies: T1 →rw T2 →rw T1. Each transaction reads a version of an item that the other overwrites. No serial ordering accommodates this — placing T1 before T2 contradicts one antidependency edge, and placing T2 before T1 contradicts the other. This two-edge cycle is the minimal non-serializable structure possible under SI, and it is the only type of cycle that SI's conflict detection fails to prevent.
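The minimal cycle can be checked mechanically. A sketch under assumptions: rw-antidependencies are supplied as (reader, writer) pairs, and the test looks only for the two-edge form described above; longer cycles would require a full graph search.

```python
def has_two_edge_rw_cycle(rw_edges):
    """True if some pair of txns has rw-antidependencies in both directions,
    i.e. the minimal write-skew cycle T1 ->rw T2 ->rw T1."""
    edges = set(rw_edges)
    return any((writer, reader) in edges for (reader, writer) in edges)

# Write skew: T1 read what T2 overwrote, and T2 read what T1 overwrote.
assert has_two_edge_rw_cycle([("T1", "T2"), ("T2", "T1")])

# A single antidependency chain imposes an order but closes no cycle.
assert not has_two_edge_rw_cycle([("T1", "T2"), ("T2", "T3")])
```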
Fekete, Liarokapis, O'Neil, O'Neil, and Shasha proved the decisive characterization: every non-serializable SI history contains a cycle with at least two consecutive rw-antidependency edges in the serialization graph. The condition is necessary for non-serializability, so detecting every occurrence of it suffices to prevent all SI anomalies — though the structure can also appear in histories that remain serializable. The result delineates the boundary between what SI guarantees and what it cannot, and it provides the exact structural target for any detection mechanism. The shape of the anomaly itself prescribes the architecture of the solution.
Takeaway: Write skew is not an edge case but a structural inevitability of any isolation mechanism that permits concurrent reads without tracking their downstream write implications — and its minimal two-edge rw-antidependency cycle is also its complete theoretical characterization.
Detection and Prevention: Closing the Gap at Bounded Cost
Serializable Snapshot Isolation, introduced by Cahill, Röhm, and Fekete in 2008, translates the characterization theorem directly into a runtime mechanism. Since every non-serializable SI history contains two consecutive rw-antidependencies, the system need only detect when a transaction becomes the pivot of such a dangerous structure — simultaneously the target of one rw-antidependency and the source of another. Detecting this single structural condition is sufficient to prevent all SI anomalies.
The mechanism tracks antidependencies during execution. When transaction T reads data that a concurrent transaction has already overwritten, the system records an incoming rw-conflict on T. When T overwrites data that a concurrent transaction has already read, it records an outgoing rw-conflict. If T accumulates both — becoming the center of a dangerous structure — the system aborts it. Detection is local to each transaction and requires no global serialization graph construction, keeping overhead proportional to the number of active concurrent transactions rather than total history length.
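The bookkeeping above can be sketched per transaction. This loosely follows the conflict flags in Cahill et al.'s algorithm, but the class, field names, and the assumption that conflicts are reported externally are all simplifications for illustration.

```python
class Txn:
    """Per-transaction rw-conflict flags, as SSI tracks them."""
    def __init__(self, name):
        self.name = name
        self.has_in_rw = False    # some concurrent txn ->rw this txn
        self.has_out_rw = False   # this txn ->rw some concurrent txn

def record_rw_antidep(reader, writer):
    """reader read a version that the concurrent writer overwrote:
    an rw-antidependency edge reader ->rw writer."""
    reader.has_out_rw = True
    writer.has_in_rw = True

def dangerous(txn):
    """Pivot test: both an incoming and an outgoing rw edge means the
    transaction sits at the center of a dangerous structure."""
    return txn.has_in_rw and txn.has_out_rw

# Replay the write-skew shape: T1 reads y that T2 writes, and
# T2 reads x that T1 writes.
t1, t2 = Txn("T1"), Txn("T2")
record_rw_antidep(t1, t2)   # T1 ->rw T2 (on y)
record_rw_antidep(t2, t1)   # T2 ->rw T1 (on x)
assert dangerous(t1) and dangerous(t2)   # aborting either breaks the cycle
```

The check is local: each transaction carries two booleans, so no global serialization graph is ever built, matching the cost argument in the paragraph above.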
SSI is deliberately conservative. Not every dangerous structure leads to an actual non-serializable outcome — the cycle may not close, or other dependency constraints may resolve the conflict. SSI will sometimes abort transactions that would have been safe, producing false positives. But it never permits a non-serializable history to commit — zero false negatives. Cahill's original evaluation demonstrated throughput reduction in the low single-digit percentage range compared to plain SI across standard benchmarks. A remarkably modest cost for a categorical guarantee upgrade.
PostgreSQL adopted SSI as its serializable isolation level in version 9.1, becoming the first major production database to offer true serializability built atop a snapshot isolation foundation. The implementation uses SIREAD locks to track predicate-level read dependencies and combines them with write-conflict detection to identify dangerous structures at commit time. Developers requesting serializable isolation finally receive it — closing a gap that had persisted silently in production systems for over a decade.
Alternative approaches address the same theoretical gap through different trade-offs. Static analysis examines transaction programs at design time to determine whether any SI execution could produce non-serializable behavior — if the static dependency graph contains no dangerous structure, plain SI suffices without runtime overhead. Materialized conflict techniques like SELECT FOR UPDATE convert invisible rw-antidependencies into write-write conflicts that SI already detects. Each method trades automation for precision, but all derive from Fekete's characterization of exactly what structural condition makes SI histories non-serializable.
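The materialization idea can be shown with a toy first-committer-wins check. The sketch treats all transactions as mutually concurrent and commits them in list order; the shared row name is invented, and the reduction of commit-time validation to a write-set overlap test is a deliberate simplification.

```python
def first_committer_wins(write_sets):
    """Commit mutually concurrent txns in order; abort any whose write
    set overlaps a key already committed by an earlier txn."""
    committed_keys, results = set(), []
    for ws in write_sets:
        if committed_keys & set(ws):
            results.append("abort")
        else:
            committed_keys |= set(ws)
            results.append("commit")
    return results

# Plain SI: disjoint write sets, so both commit and write skew is possible.
assert first_committer_wins([{"alice"}, {"bob"}]) == ["commit", "commit"]

# Materialized conflict: both txns also touch a shared row (the effect of
# SELECT FOR UPDATE or an explicit dummy update on, say, an on-call
# summary row), turning the invisible conflict into a write-write one.
assert first_committer_wins(
    [{"alice", "on_call_total"}, {"bob", "on_call_total"}]
) == ["commit", "abort"]
```

The technique asks the developer to identify the invariant and widen the write sets by hand, which is exactly the automation-versus-precision trade the paragraph above describes.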
Takeaway: The gap between snapshot isolation and serializability can be closed without abandoning SI's architecture — by layering dependency tracking that targets the exact structural condition the theory identifies, converting a narrow theoretical vulnerability into a bounded runtime cost.
Snapshot isolation persists in production systems not through negligence but through rational trade-off. It provides strong consistency at lower overhead than traditional serializability, and for many workloads the theoretical gap never manifests as a practical failure. The problem is that the boundary of its guarantees is precisely the kind of subtlety that resists informal reasoning and survives code review.
Fekete's characterization theorem transforms this subtlety into a tractable formal property. By proving that consecutive rw-antidependencies appear in every non-serializable SI history, it specifies exactly what any correct detection mechanism must target — no more and no less.
Serializable snapshot isolation demonstrates that theoretical precision yields direct engineering leverage. The characterization does not merely describe the anomaly — it prescribes the minimal mechanism sufficient to eliminate it, at bounded cost, without sacrificing the architectural advantages that made snapshot isolation dominant in the first place. The distance between understanding a vulnerability and provably preventing it turns out to be surprisingly small.