Balanced binary search trees—AVL trees, red-black trees, B-trees—have dominated ordered data structure design for decades. They deliver guaranteed O(log n) operations through carefully maintained structural invariants. But maintaining those invariants comes at a cost: rotations, color flips, node splits, and complex concurrent access protocols that ripple through shared structure. In 1990, William Pugh proposed a radically different approach. Instead of enforcing balance through deterministic restructuring, skip lists achieve expected logarithmic performance through randomization.
A skip list is a layered linked list where each element is promoted to higher levels with independent probability p, typically 1/2 or 1/4. Search proceeds from the topmost level downward and rightward, skipping over large segments of the structure. The elegance lies in what's absent: no rotations, no rebalancing, no global structural invariants to maintain. The structure's balance is a statistical property, not an enforced one.
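To make the shape concrete, here is a minimal sketch of the node layout, the geometric height distribution, and the top-down search. The names (`Node`, `random_height`, `search`) are illustrative, not drawn from any particular implementation:

```python
import random

class Node:
    """One key plus a stack of forward pointers, one per level."""
    def __init__(self, key, height):
        self.key = key
        self.forward = [None] * height   # forward[i] = successor at level i

def random_height(p=0.5, max_height=32):
    """Flip a p-biased coin until it fails: heights are geometrically
    distributed, which is what makes the expected height logarithmic."""
    h = 1
    while random.random() < p and h < max_height:
        h += 1
    return h

def search(head, key):
    """Start at the header's top level; at each level, walk right while the
    next key is still smaller, then drop down. No balance metadata, ever."""
    x = head
    for level in range(len(head.forward) - 1, -1, -1):
        while x.forward[level] is not None and x.forward[level].key < key:
            x = x.forward[level]
    x = x.forward[0]                     # final candidate on the bottom list
    return x is not None and x.key == key
```

Note what the search reads: forward pointers only. There are no parent pointers, colors, or balance factors to consult.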
This distinction has profound implications for concurrent programming, memory layout, and engineering simplicity. Yet skip lists are not universally superior—their pointer-heavy topology interacts poorly with modern cache hierarchies in ways that balanced trees, particularly B-trees, handle gracefully. Understanding when skip lists are the right choice requires a rigorous look at their probabilistic guarantees, their concurrency advantages, and their memory access patterns. That's what we'll do here.
Expected Complexity: Deriving the Logarithmic Bound
The expected search cost in a skip list is O(log n), but the derivation is more subtle than it first appears. Consider a search that terminates at some node at level 1. We analyze the cost by tracing the search path backward—from the target node up and to the left toward the header at the top level. At each step backward, we either climb one level (the current node was promoted) or move left on the same level (the current node was not promoted). Each climb happens with probability p, and each leftward step with probability 1 − p.
Let C(k) denote the expected cost of climbing k levels in this backward analysis. We obtain the recurrence C(k) = (1 − p) · (1 + C(k)) + p · (1 + C(k − 1)), which solves to C(k) = k/p. Since the expected number of levels is log_{1/p} n, the total expected search cost is (log_{1/p} n)/p + 1/(1 − p), which is O(log n). For p = 1/2, this yields roughly 2 log₂ n comparisons. For p = 1/4 the search cost works out to the same 2 log₂ n, since (log₄ n)/(1/4) = 4 · (log₂ n)/2; the expected number of pointers per node, however, drops from 2 to 1/(1 − p) ≈ 1.33, so the smaller p buys space, not speed.
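The closed form C(k) = k/p has a simple interpretation: the backward walk is a sequence of independent trials, and the cost is the number of trials needed to collect k climbs. A Monte Carlo sketch (function names are illustrative) confirms the constant:

```python
import random

def backward_walk_cost(k, p, rng):
    """Steps taken until k climbs occur, where each step is a climb with
    probability p and a leftward move with probability 1 - p."""
    cost, climbs = 0, 0
    while climbs < k:
        cost += 1
        if rng.random() < p:
            climbs += 1
    return cost

def mean_cost(k, p, trials=100_000, seed=1):
    """Empirical mean of the backward-walk cost; should approach k / p."""
    rng = random.Random(seed)
    return sum(backward_walk_cost(k, p, rng) for _ in range(trials)) / trials
```

With k = 10 levels, the empirical mean lands near 20 for p = 1/2 and near 40 for p = 1/4, matching k/p.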
What about worst-case degradation? The probability that a skip list with n elements has height exceeding c · log_{1/p} n is at most n^{1−c}. For practical values—say n = 10⁶, p = 1/2, and c = 3—the probability of the height exceeding 3 log₂ n is less than 10⁻¹². The structure is overwhelmingly likely to be well-balanced. This isn't a hand-waving argument; it follows directly from a union bound over the n elements, each of which reaches that height with probability p^{c · log_{1/p} n} = n^{−c}.
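The arithmetic behind that 10⁻¹² figure is a one-liner; a sketch, using the constants from the text:

```python
import math

def height_tail_bound(n, c, p=0.5):
    """Union bound over n elements: each one reaches level c * log_{1/p}(n)
    with probability p ** (c * log_{1/p}(n)) = n ** (-c), so the whole
    structure exceeds that height with probability at most n ** (1 - c)."""
    levels = c * math.log(n, 1 / p)
    per_element = p ** levels      # equals n ** (-c)
    return n * per_element         # equals n ** (1 - c)
```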
The variance of the search cost is also O(log n), meaning the distribution of actual costs is tightly concentrated around the expectation. This contrasts with hash tables, where expected O(1) lookups can degrade dramatically under adversarial inputs or poor hash functions. Skip lists derive their randomness internally—from coin flips at insertion—so their performance is independent of input distribution. An adversary who controls the key sequence cannot force worst-case behavior unless they also control the random number generator.
One important nuance: the constant factors in skip list search are higher than in balanced BSTs. A red-black tree performs at most 2 log₂(n + 1) comparisons in the worst case, deterministically. A skip list with p = 1/2 performs about 2 log₂ n comparisons in expectation, with occasional variance. For latency-sensitive applications where tail behavior matters, this distinction is non-trivial. The probabilistic guarantee is excellent—but it is not a hard guarantee.
Takeaway: Skip lists trade deterministic worst-case bounds for simplicity of implementation and independence from input distribution. The probabilistic guarantees are strong enough for virtually all practical purposes, but when hard latency bounds are non-negotiable, deterministic structures remain the safer choice.
Concurrency Properties: Why Randomization Simplifies Locking
The deepest practical advantage of skip lists over balanced trees is concurrency. Consider what happens when you insert into a red-black tree: the insertion may trigger rotations that restructure nodes far from the insertion point. These rotations require exclusive access to parent and grandparent nodes, creating contention windows that span multiple levels of the tree. Even with fine-grained locking, concurrent writers interfere with one another through these structural modifications.
Skip lists sidestep this entirely. An insertion into a skip list modifies only the immediate predecessor pointers at each level where the new node appears. There are no rotations, no rebalancing, no cascading structural changes. The set of nodes that must be modified is determined before the insertion begins—during the search phase—and involves only local pointer updates. This locality of modification is what makes lock-free and lock-based concurrent skip lists dramatically simpler to implement correctly.
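This locality can be made concrete even in a sequential sketch. The insert below records one predecessor per level during the search phase, before writing anything, and returns the full write set; the explicit `height` parameter (normally drawn from the coin-flip distribution) is an illustrative simplification:

```python
class Node:
    """A skip list node: key plus one forward pointer per level."""
    def __init__(self, key, height):
        self.key = key
        self.forward = [None] * height

def insert(head, key, height):
    """Insert a node of the given height and return the set of pre-existing
    nodes whose pointers were written. Phase 1 records one predecessor per
    level without modifying anything; phase 2 splices with local writes."""
    update = [head] * len(head.forward)          # predecessor at each level
    x = head
    for level in range(len(head.forward) - 1, -1, -1):
        while x.forward[level] is not None and x.forward[level].key < key:
            x = x.forward[level]
        update[level] = x                        # last node before `key` here
    node = Node(key, height)
    for level in range(height):                  # splice: two writes per level
        node.forward[level] = update[level].forward[level]
        update[level].forward[level] = node
    return {update[level] for level in range(height)}  # the entire write set
```

The write set is fixed before the first pointer is written, and it contains at most `height` nodes, all immediate predecessors; nothing elsewhere in the structure changes.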
The canonical lock-free skip list, described by Fraser and Harris and refined in Java's ConcurrentSkipListMap, uses pointer marking for deletion and simple CAS (compare-and-swap) operations for insertion. A node is logically deleted by marking its forward pointer, then physically unlinked by subsequent operations. Insertion proceeds bottom-up: the node is first inserted at level 1, then linked in at successively higher levels. If the process is interrupted—say, by a concurrent deletion—the partially-inserted node is simply a shorter skip list node, which is a perfectly valid state. The structure is inherently tolerant of partial updates.
Contrast this with concurrent B-trees or red-black trees, which require sophisticated protocols like hand-over-hand lock coupling or optimistic lock coupling to achieve comparable concurrency. These techniques are correct but complex, hard to verify, and often require significant engineering effort. The skip list's advantage isn't that it's faster in the single-threaded case—it often isn't—but that the concurrent implementation is closer to the sequential one. The gap between a textbook skip list and a production concurrent skip list is remarkably small.
There's a subtlety worth noting. The probabilistic height assignment means that under high contention, tall nodes—which participate in more levels—become bottlenecks because more operations traverse them. In practice, this effect is mitigated by the logarithmic expected height and the fact that upper-level pointers are read-only during search. But in extreme contention scenarios, techniques like level-based partitioning or per-level lock striping may be necessary. The simplicity advantage holds, but it's not infinite.
Takeaway: The key insight is that skip list modifications are structurally local—no operation needs to restructure distant parts of the data structure. This locality is what makes concurrent implementations tractable, and it arises directly from the fact that balance is probabilistic, not enforced.
Memory Locality Trade-offs: When Pointer Chasing Hurts
Every advantage has a cost, and for skip lists, the cost is memory locality. A skip list search follows pointers at each level, and those pointers refer to nodes scattered across the heap. Each forward step is a potential cache miss. On modern hardware—where an L1 cache hit costs ~1 ns but a main memory access costs ~100 ns—the number of pointer dereferences matters far more than the number of comparisons.
B-trees were designed with this reality in mind. A B-tree node stores multiple keys contiguously in memory, so a single cache-line fetch brings dozens of keys into L1. A binary search within that node involves no pointer chasing—just arithmetic on an array. For a B-tree with branching factor 64, searching 10⁶ elements requires roughly 3 node accesses, each of which involves ~6 comparisons within a cache-resident array. A skip list searching the same 10⁶ elements performs ~40 pointer dereferences, each of which is likely a cache miss at the upper levels.
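A crude cost model makes the arithmetic explicit. The latencies are illustrative round numbers, and charging every skip list dereference a full memory access is deliberately pessimistic; this is a sketch of the reasoning, not a benchmark:

```python
import math

MEM_NS = 100   # assumed main-memory access latency (illustrative)
L1_NS = 1      # assumed L1 hit latency (illustrative)

def btree_lookup_ns(n, fanout=64):
    """One cache-missing node fetch per level, plus a binary search over
    keys that the fetch just made cache-resident."""
    levels = math.ceil(math.log(n, fanout))
    return levels * (MEM_NS + math.log2(fanout) * L1_NS)

def skiplist_lookup_ns(n, p=0.5):
    """About (1/p) * log_{1/p}(n) pointer dereferences, each pessimistically
    charged as a full memory access."""
    derefs = (1 / p) * math.log(n, 1 / p)
    return derefs * MEM_NS
```

For n = 10⁶ this model puts the B-tree lookup at a few hundred nanoseconds and the skip list lookup near four microseconds, roughly an order of magnitude apart. Measured gaps tend to be smaller, because the hot upper levels of both structures do stay cached, but the direction of the effect is the same.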
This comparison isn't hypothetical. Empirical studies consistently show that B-trees and their variants (B+ trees, cache-oblivious B-trees) outperform skip lists on read-heavy workloads in main memory by factors of 2–5×, entirely due to cache behavior. The skip list performs fewer comparisons in the asymptotic sense, but comparisons are cheap; memory accesses are expensive. The skip list's O(log n) hides a constant that grows with the ratio of memory latency to compute latency—a ratio that has worsened with every hardware generation.
There are mitigations. Unrolled skip lists store multiple keys per node, improving spatial locality at the cost of complicating insertion. Cache-sensitive skip list layouts allocate nodes in level-aware memory pools, improving locality for upper-level traversals. Some implementations use arrays instead of linked lists for the lowest level, turning the skip list into a hybrid structure. Each of these techniques narrows the gap with B-trees but also erodes the skip list's core advantage: simplicity.
So when do skip lists win? In concurrent, write-heavy workloads where the simplicity of lock-free insertion outweighs the read-path cache penalty. In applications where range scans are rare and point lookups are interleaved with frequent insertions and deletions. And in systems where engineering time is the binding constraint—where a correct concurrent skip list can be implemented in days, while a correct concurrent B-tree takes months. The choice is never about which structure is abstractly superior. It's about which costs dominate in your specific system.
Takeaway: Cache behavior, not asymptotic complexity, determines the performance of ordered data structures on modern hardware. Skip lists pay a pointer-chasing penalty that B-trees avoid. The right choice depends on whether your bottleneck is read throughput, write concurrency, or engineering complexity.
Skip lists occupy a distinctive niche in the data structure landscape. Their probabilistic balance eliminates the need for structural invariants, yielding expected O(log n) operations with strong concentration bounds and independence from input distribution. These properties alone make them worthy of study.
Their real power, however, emerges in concurrent settings. The locality of modification—the fact that insertions and deletions touch only neighboring nodes—makes lock-free implementations not just possible but practical. This is why skip lists appear in production systems like LevelDB, Redis, and Java's concurrent collections, often in contexts where balanced trees would impose prohibitive engineering complexity.
But they are not a universal replacement for trees. On cache-conscious hardware with read-dominated workloads, B-trees and their variants remain superior. The discipline of systems engineering is choosing the right structure for the right constraints—and skip lists are a powerful option when those constraints favor simplicity and concurrency over raw sequential throughput.