Every engineer who builds distributed systems eventually confronts a seductive promise: exactly-once message delivery. The idea is simple—send a message, have it arrive precisely once, no duplicates, no losses. Vendors advertise it. Frameworks claim to support it. But the mathematical reality is unforgiving. In any system where messages can be lost, exactly-once delivery is provably impossible. This isn't an engineering limitation waiting for a clever workaround. It's a fundamental result rooted in the same class of impossibility theorems that define the boundaries of distributed computation.
The confusion persists because the term conflates two distinct concepts: delivery and processing. A network cannot guarantee that a packet arrives exactly once at its destination. But an application can guarantee that the effects of a message are applied exactly once, even if the message itself arrives zero, one, or many times. The distinction is subtle but architecturally decisive. Systems that ignore it either lose messages silently or process duplicates with cascading side effects. Systems that respect it achieve the semantics users actually care about without pretending physics cooperates.
This article formalizes the impossibility result through the Two Generals Problem, then examines the two most important practical patterns that provide equivalent application semantics: idempotency keys and the transactional outbox. These aren't workarounds—they're precise engineering responses to a proven constraint. Understanding why the constraint exists changes how you reason about every message-passing system you'll ever design.
The Two Generals Problem
The impossibility of exactly-once delivery has a formal proof, and it begins with a thought experiment from 1975. Two generals command separate armies on opposite sides of an enemy city. They must agree on a time to attack simultaneously—if only one attacks, that army is destroyed. Their sole communication channel is a messenger who must cross enemy territory and may be captured along the way. The question: can the generals devise a protocol that guarantees a coordinated attack?
The answer is no, and the proof is elegant. Suppose General A sends a message: "Attack at dawn." General B receives it and sends an acknowledgment. But General A doesn't know if the acknowledgment was captured. So A needs an acknowledgment of the acknowledgment. But then B doesn't know if that message arrived. The regress is infinite. At every step, the last message sent is uncertain—and removing it invalidates the protocol. No finite sequence of messages over an unreliable channel can establish common knowledge between two parties.
Map this directly onto distributed systems. A producer sends a message to a consumer through a network. The network may drop packets. The producer sends the message; the consumer receives and processes it; the consumer sends an acknowledgment. If the acknowledgment is lost, the producer has no way to distinguish "message processed, ack lost" from "message never arrived." It must choose: resend (risking duplicate processing) or give up (risking message loss). There is no third option. This is the at-least-once versus at-most-once dichotomy, and it is exhaustive over unreliable channels.
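To make the dichotomy concrete, here is a minimal sketch of the producer's two options when an acknowledgment does not arrive. The send and wait_for_ack functions are hypothetical stand-ins for whatever transport you use; the point is that the retry loop buys at-least-once at the cost of possible duplicates, while fire-and-forget buys at-most-once at the cost of possible loss.

```python
def publish_at_least_once(message, send, wait_for_ack, max_retries=5):
    """Resend until acknowledged: loss is avoided (within the retry budget),
    but the receiver may see the message more than once."""
    for _ in range(max_retries):
        send(message)
        if wait_for_ack(timeout=2.0):
            return True   # acked -- though the processing may already have happened twice
    return False          # retry budget exhausted; delivery still uncertain


def publish_at_most_once(message, send):
    """Send once and move on: duplicates are impossible, loss is not."""
    send(message)
```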
The FLP impossibility result (Fischer, Lynch, Paterson, 1985) generalizes this further: in an asynchronous distributed system where even one process can fail, deterministic consensus is impossible. Exactly-once delivery requires a form of consensus between sender and receiver—agreement that the message was delivered and processed precisely once. FLP tells us this consensus cannot be guaranteed in bounded time with any deterministic protocol when failures are possible.
This is not a pessimistic result. It is a clarifying one. Once you accept that the network gives you at-most-once or at-least-once—never exactly-once—you stop searching for a protocol that doesn't exist and start engineering application-level guarantees that provide equivalent semantics. The impossibility doesn't constrain what your system can achieve. It constrains how you must achieve it: not at the transport layer, but at the application layer, through careful state management.
Takeaway: Exactly-once delivery is not an unsolved engineering problem—it is a proven impossibility over unreliable channels. Accepting this frees you to design at the right abstraction layer: application semantics, not transport guarantees.
Idempotency Keys
If the network guarantees at-least-once delivery (through retries), the application receives the same message one or more times. The challenge becomes ensuring that processing the message n times produces the same effect as processing it once. This property is idempotency, and the mechanism for achieving it at scale is the idempotency key—a unique identifier attached to each logical operation that allows the receiver to detect and suppress duplicates.
The implementation is straightforward in principle. The sender generates a globally unique key (typically a UUID or a deterministic hash of the operation's parameters) and attaches it to the message. The receiver, before processing, checks a persistent store for the key. If the key exists, the operation has already been applied—the receiver returns the cached result without re-executing. If the key is absent, the receiver processes the message, stores the key alongside the result, and responds. The critical invariant: the key check and the state mutation must be atomic. Without atomicity, a crash between processing and key storage creates a window where the operation executes but the duplicate guard isn't recorded.
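A minimal sketch of the receiver side, using SQLite so the example is self-contained; the table and handler names are illustrative, not a prescribed schema. The key check, the state mutation, and the key insert all happen inside one transaction, which is the atomicity invariant described above.

```python
import sqlite3

conn = sqlite3.connect("app.db", isolation_level=None)  # manage transactions explicitly
conn.executescript("""
CREATE TABLE IF NOT EXISTS accounts (
    id      TEXT PRIMARY KEY,
    balance INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS idempotency_keys (
    key        TEXT PRIMARY KEY,
    result     TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
""")

def credit_account(idempotency_key: str, account_id: str, amount: int) -> str:
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT result FROM idempotency_keys WHERE key = ?", (idempotency_key,)
        ).fetchone()
        if row is not None:
            conn.execute("COMMIT")
            return row[0]            # duplicate delivery: return the cached result
        # First delivery: apply the state change and record the key atomically.
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?",
            (amount, account_id),
        )
        result = f"credited {amount} to {account_id}"
        conn.execute(
            "INSERT INTO idempotency_keys (key, result) VALUES (?, ?)",
            (idempotency_key, result),
        )
        conn.execute("COMMIT")
        return result
    except Exception:
        conn.execute("ROLLBACK")
        raise
```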
Key storage strategy introduces real engineering trade-offs. Storing keys indefinitely is correct but expensive—storage grows linearly with operation volume. Most systems implement key expiration with a TTL calibrated to exceed the maximum retry window by a comfortable margin. If your retry policy gives up after 72 hours, a 7-day TTL on idempotency keys is typically safe. But this is a probabilistic guarantee, not an absolute one. A message replayed after key expiration will be processed again. For financial systems where this is unacceptable, some architectures use content-addressable storage or derive keys from immutable operation properties, making duplicates detectable indefinitely without explicit key storage.
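A periodic sweep is enough to implement the TTL. This sketch assumes the idempotency_keys table from the previous example and a retry policy that gives up well before the keys expire; run it from a scheduled job, not the request path.

```python
def prune_expired_keys(conn, ttl_days: int = 7) -> int:
    """Delete keys older than the TTL; anything replayed after this point
    will be processed again, which is the accepted residual risk."""
    cur = conn.execute(
        "DELETE FROM idempotency_keys WHERE created_at < datetime('now', ?)",
        (f"-{ttl_days} days",),
    )
    return cur.rowcount  # number of keys expired in this sweep
```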
The data structure matters at scale. A naive key lookup in a relational table works for moderate throughput but becomes a bottleneck at high volume, where every incoming message adds a key lookup and a key write on top of the state change itself. Bloom filters provide a space-efficient probabilistic first pass—if the filter says the key is absent, it definitely is, and you skip the expensive store lookup for the majority of first-time messages. For keys that the filter reports as possibly present, you fall through to the authoritative store. This two-tier approach reduces read pressure on the key store by orders of magnitude in write-heavy systems.
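A sketch of the two-tier check. The filter parameters below (2^20 bits, 7 hash functions) are illustrative and should be sized against your expected key volume and false-positive budget; key_store_has stands in for the authoritative lookup.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 7):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key: str):
        # Derive num_hashes bit positions from independent slices of one digest.
        digest = hashlib.sha256(key.encode()).digest()
        for i in range(self.num_hashes):
            yield int.from_bytes(digest[i * 4:(i + 1) * 4], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def seen_before(key: str, bloom: BloomFilter, key_store_has) -> bool:
    """Two-tier duplicate check: a negative from the filter is definitive,
    so most first-time keys never touch the authoritative store."""
    if not bloom.might_contain(key):
        return False              # definitely a first delivery
    return key_store_has(key)     # possibly present: consult the key store

# After successful processing, record the key in both tiers:
# bloom.add(key) plus an insert into the authoritative store.
```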
Idempotency transforms the semantics from at-least-once delivery to effectively-once processing. The network still delivers duplicates. The application simply doesn't care. This is the critical reframing: you stop trying to prevent duplicates at the transport layer and instead make duplicates harmless at the application layer. Stripe, for instance, exposes idempotency keys as a first-class API concept—clients attach a key to every payment request, and the server guarantees that retrying with the same key produces the same result without charging the customer twice. The impossibility theorem stands. The user experience doesn't suffer.
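On the client side, the pattern looks like the sketch below: generate the key once per logical operation and reuse it on every retry. The endpoint and payload are illustrative, not Stripe's actual API, but the Idempotency-Key header mirrors how Stripe exposes the concept.

```python
import uuid
import requests

def create_payment(amount_cents: int, max_attempts: int = 3):
    key = str(uuid.uuid4())  # generated once per logical operation, reused on every retry
    for _ in range(max_attempts):
        try:
            resp = requests.post(
                "https://api.example.com/v1/payments",   # illustrative endpoint
                headers={"Idempotency-Key": key},
                json={"amount": amount_cents},
                timeout=5,
            )
            if resp.ok:
                return resp.json()
        except requests.RequestException:
            pass  # timeout or connection error: retry with the SAME key
    raise RuntimeError("payment not confirmed after retries")
```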
Takeaway: Idempotency doesn't prevent duplicate delivery—it makes duplicate delivery irrelevant. Design operations so that applying them multiple times is indistinguishable from applying them once, and the transport layer's limitations disappear from the application's perspective.
Transactional Outbox
Idempotency solves the consumer side: processing a message multiple times safely. But there's a symmetric problem on the producer side: how do you ensure a message is published if and only if the corresponding state change commits? Consider a service that updates a database row and then publishes an event to a message broker. If the service crashes after the database commit but before the publish, the event is lost. If it publishes first and then crashes before committing, the event describes a state change that never happened. This is the dual-write problem, and it is not solvable by simply "being careful" with ordering.
The transactional outbox pattern eliminates the dual-write by collapsing two operations into one. Instead of writing to the database and publishing to the broker as separate steps, the service writes the event into an outbox table within the same database transaction that performs the state change. The event and the state mutation share a single atomic commit. A separate process—often called a relay or log tailer—reads from the outbox table and publishes events to the message broker. If the relay crashes, it resumes from its last known position and re-publishes. Since consumers are idempotent (see previous section), re-publication is harmless.
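A sketch of the write path, again using SQLite so the example is self-contained; the table names and event shape are illustrative. The order row and the outbox row share one commit, so neither can exist without the other.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect("service.db", isolation_level=None)  # manage transactions explicitly
conn.executescript("""
CREATE TABLE IF NOT EXISTS orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
CREATE TABLE IF NOT EXISTS outbox (
    seq          INTEGER PRIMARY KEY AUTOINCREMENT,  -- monotonic order for the relay
    event_id     TEXT NOT NULL,                      -- stable ID consumers deduplicate on
    topic        TEXT NOT NULL,
    payload      TEXT NOT NULL,
    created_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    published_at TEXT
);
""")

def place_order(order_id: str) -> None:
    conn.execute("BEGIN IMMEDIATE")
    try:
        # State change and event share a single atomic commit: no dual write.
        conn.execute("INSERT INTO orders (id, status) VALUES (?, 'placed')", (order_id,))
        conn.execute(
            "INSERT INTO outbox (event_id, topic, payload) VALUES (?, ?, ?)",
            (str(uuid.uuid4()), "orders.placed", json.dumps({"order_id": order_id})),
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
```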
The relay implementation has two common variants. Polling publishers periodically query the outbox table for unpublished rows, publish them, and mark them as sent. This is simple but introduces latency proportional to the polling interval and puts read load on the database. Change data capture (CDC) tails the database's transaction log directly—Debezium reading a PostgreSQL WAL or MySQL binlog, for instance—and emits events with minimal latency and no polling overhead. CDC is operationally more complex but provides superior throughput and latency characteristics for high-volume systems.
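A sketch of the polling variant over the outbox table above. publish stands in for your broker client; because rows are only marked published after a successful publish, a crash mid-batch re-publishes some events on restart, and idempotent consumers absorb the duplicates.

```python
import time

def run_relay(conn, publish, poll_interval: float = 1.0, batch_size: int = 100) -> None:
    while True:
        rows = conn.execute(
            "SELECT seq, event_id, topic, payload FROM outbox "
            "WHERE published_at IS NULL ORDER BY seq LIMIT ?",
            (batch_size,),
        ).fetchall()
        for seq, event_id, topic, payload in rows:
            publish(topic, event_id, payload)  # may be re-sent after a crash
            conn.execute(
                "UPDATE outbox SET published_at = CURRENT_TIMESTAMP WHERE seq = ?",
                (seq,),
            )
        time.sleep(poll_interval)
```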
The outbox pattern achieves exactly-once publish semantics with respect to the application's state transitions. Every committed state change produces exactly one logical event (though the relay may physically publish it multiple times). No committed state change is ever silently lost. No event is ever published for a state change that rolled back. Combined with consumer-side idempotency, you get end-to-end effectively-once processing: the producer publishes reliably, the consumer deduplicates reliably, and the impossible exactly-once delivery guarantee becomes irrelevant.
There are subtleties worth noting. Outbox table growth requires pruning—rows that have been successfully relayed should be archived or deleted. Ordering guarantees depend on the relay reading the outbox in commit order, which is straightforward with sequential IDs but requires care with CDC in partitioned databases. And the pattern assumes a single database as the source of truth for both application state and outbox entries. If your architecture splits these across different storage systems, you're back to a distributed coordination problem. The transactional outbox works precisely because it avoids distributed transactions by co-locating the event with the state in a single ACID boundary.
Takeaway: The transactional outbox turns an impossible distributed coordination problem into a local database transaction. By co-locating events with state changes in a single atomic commit, you guarantee that every state transition is published exactly once—without ever relying on exactly-once delivery.
The impossibility of exactly-once delivery is not a limitation to lament—it is a boundary condition that sharpens architectural thinking. The Two Generals Problem and FLP impossibility tell us where to stop looking for solutions at the transport layer and where to start building them at the application layer.
The two patterns presented here—idempotency keys on the consumer side and the transactional outbox on the producer side—compose into a system that provides effectively-once semantics end to end. Messages may be delivered multiple times. Events may be published multiple times. But application state changes are applied exactly once, which is what mattered all along.
The next time a vendor promises exactly-once delivery, ask a precise question: exactly-once delivery, or exactly-once processing? The former is impossible. The latter is engineering. Know which one you're buying.