Every architectural review reaches the moment when a product manager says the magic words: "It needs to be real-time." The room nods. Budgets expand. WebSocket clusters get provisioned. Six months later, the team is debugging connection storms at 3 AM while users happily refresh the page every minute.
Real-time has become the architectural equivalent of cloud-native a decade ago—a phrase invoked more than understood. The problem is rarely that real-time is wrong; it is that the cost of genuine real-time is dramatically underestimated, and the requirement itself is rarely interrogated.
This framework offers three lenses for that interrogation. First, separate perceived latency from business latency. Second, understand the structural tradeoffs between push and pull architectures. Third, account honestly for the operational tax that persistent connections impose at scale. Done well, this analysis often reveals that near-real-time is not a compromise—it is the architecturally correct answer.
Latency Requirement Analysis
The first discipline is distinguishing between the latency a user perceives and the latency the business actually requires. These are rarely the same number, yet teams routinely conflate them and architect for the more aggressive figure.
Consider a typical dashboard refresh. A user staring at a chart cannot meaningfully process updates faster than every two or three seconds. A trading system, by contrast, may have business consequences measured in microseconds. Between these poles sits the vast middle ground where most enterprise systems live—where two to thirty seconds is operationally indistinguishable from instantaneous, but architecturally an entirely different problem.
A useful exercise: ask what breaks if the data is five seconds stale. Then ask what breaks at thirty seconds. Then five minutes. The answers expose the real latency budget. Often the honest answer is that nothing breaks until minutes pass, but a stakeholder wants the feeling of immediacy. That is a UX problem, not a data delivery problem, and it has cheaper solutions.
Document the latency requirement as a service-level objective with consequences, not as an adjective. "Real-time" is not a requirement. "Order status must reflect warehouse state within four seconds at the 99th percentile" is a requirement. The latter can be designed against; the former invites overengineering.
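To make the distinction concrete, here is a minimal sketch of a latency requirement written as a checkable number. The class name `LatencySLO`, the helper functions, and the nearest-rank percentile method are illustrative assumptions, not part of any particular monitoring tool; the four-second, 99th-percentile figures come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencySLO:
    """A latency requirement expressed as a number, not an adjective (hypothetical type)."""
    name: str
    threshold_seconds: float  # maximum acceptable staleness
    percentile: float         # e.g. 99.0 for the 99th percentile


def nearest_rank_percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0.0 < pct <= 100.0:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, clamped to at least rank 1.
    rank = max(1, -(-int(pct * len(ordered)) // 100))
    return ordered[rank - 1]


def meets_slo(slo: LatencySLO, staleness_samples: list[float]) -> bool:
    """True when the observed staleness at the SLO percentile is within the threshold."""
    observed = nearest_rank_percentile(staleness_samples, slo.percentile)
    return observed <= slo.threshold_seconds


# The requirement from the text, written as data.
ORDER_STATUS_SLO = LatencySLO(
    name="order-status-freshness",
    threshold_seconds=4.0,
    percentile=99.0,
)
```

A requirement in this form can be evaluated against measured staleness samples, which is exactly what the adjective "real-time" does not allow.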
Takeaway: Latency is a number with a business justification, not an aspiration. If you cannot describe what breaks when staleness increases, you do not yet have a real-time requirement.
Push vs Pull Tradeoffs
Once latency is honestly specified, the delivery mechanism becomes a tractable design choice. The options form a spectrum of complexity, and each carries structural consequences that compound at scale.
Polling is the architecturally simplest approach. It is stateless, cacheable, horizontally scalable by simply adding servers, and friendly to every load balancer and CDN ever built. Its cost is wasted requests and a worst-case staleness equal to the polling interval (about half the interval on average). For latency budgets above a few seconds, polling is almost always the correct answer, and the one architects are most embarrassed to recommend.
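As a rough illustration of how little machinery polling needs, the sketch below polls through an injected `fetch` callable and only surfaces a payload when the version changes. The names `poll`, `fetch`, and `staleness_bounds` are hypothetical, and the version token stands in for whatever change marker the real endpoint offers (an ETag, a timestamp, a sequence number). The staleness figures assume updates arrive at uniformly random times and ignore request duration.

```python
import time
from typing import Callable, Iterator, Optional, Tuple

Version = str
FetchResult = Optional[Tuple[Version, dict]]


def poll(
    fetch: Callable[[Optional[Version]], FetchResult],
    interval_seconds: float,
    max_polls: int,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[dict]:
    """Yield each new payload. `fetch(known)` returns None when nothing changed."""
    known: Optional[Version] = None
    for i in range(max_polls):
        result = fetch(known)
        if result is not None:
            known, payload = result
            yield payload
        if i < max_polls - 1:
            # The client holds no server-side state between requests;
            # any server behind the load balancer can answer the next poll.
            sleep(interval_seconds)


def staleness_bounds(interval_seconds: float) -> Tuple[float, float]:
    """(average, worst-case) staleness added by polling, under the stated assumptions."""
    return interval_seconds / 2.0, interval_seconds
```

With a four-second interval this gives roughly two seconds of average staleness and four seconds in the worst case, which is the number to compare against the latency budget.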
Server-sent events occupy the middle ground. They maintain a long-lived HTTP connection for one-way server-to-client streaming, work through most proxies, and degrade gracefully. They are ideal when updates are server-driven and frequent and the client does not need to stream data back. WebSockets sit at the complex end: full-duplex, low-latency, and unfriendly to much of the HTTP infrastructure you already own. At scale they tend to demand sticky sessions, custom load balancing, and bespoke health checks.
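Part of what keeps server-sent events in the middle ground is that the wire format is plain text over an ordinary HTTP response. The sketch below frames one event using the `id:`, `event:`, and `data:` fields of the event stream format; the function name `format_sse` is an illustrative assumption, and the sketch assumes the payload uses `\n` line breaks only.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Frame one server-sent event as text for a `text/event-stream` response."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Multi-line payloads are sent as one `data:` field per line.
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"
```

For example, `format_sse('{"status":"shipped"}', event="order", event_id="42")` produces three field lines followed by a blank line. There is no equivalent client-to-server frame, which is the one-way constraint described above.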
The architectural error is choosing WebSockets because they are the most capable. The correct heuristic is to choose the least capable mechanism that meets the latency budget. Each step up the ladder discards stateless scalability for diminishing latency returns, and the discarded property is the one your operations team will miss most.
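The heuristic can be written down as a small decision function. This is a sketch under stated assumptions: the two-second `polling_floor_seconds` default is a placeholder for "a few seconds" and should come from your own latency analysis, and the names `Delivery` and `choose_delivery` are hypothetical.

```python
from enum import Enum


class Delivery(Enum):
    POLLING = "polling"
    SERVER_SENT_EVENTS = "server-sent events"
    WEBSOCKET = "websocket"


def choose_delivery(
    latency_budget_seconds: float,
    needs_client_to_server_stream: bool,
    polling_floor_seconds: float = 2.0,  # assumed threshold, not a universal constant
) -> Delivery:
    """Pick the least capable mechanism that still meets the latency budget."""
    if latency_budget_seconds <= 0:
        raise ValueError("latency budget must be a positive number of seconds")
    if latency_budget_seconds >= polling_floor_seconds:
        # Client-to-server traffic can still use ordinary HTTP requests.
        return Delivery.POLLING
    if not needs_client_to_server_stream:
        return Delivery.SERVER_SENT_EVENTS
    return Delivery.WEBSOCKET
```

The order of the checks is the point: the function only climbs a rung when the rung below cannot meet the number.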
Takeaway: Choose the least sophisticated delivery mechanism that meets the requirement. Statelessness is a property worth preserving until the business genuinely demands you trade it away.
Real-Time Infrastructure Costs
Persistent connections impose a tax that does not appear on architecture diagrams. A polling system serving a million users handles a million requests per interval, each independent and load-balanceable. A persistent connection system holds a million open sockets simultaneously, each consuming memory, file descriptors, and—critically—affinity to a specific server.
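A back-of-envelope calculation makes the contrast visible. In the sketch below, the per-connection memory and the per-node connection limit are placeholder inputs to be replaced with measurements from your own stack; they are not benchmarks, and the function names are hypothetical.

```python
from typing import Tuple


def polling_request_rate(users: int, interval_seconds: float) -> float:
    """Average requests per second when every user polls once per interval."""
    return users / interval_seconds


def persistent_footprint(
    users: int,
    bytes_per_connection: int,      # placeholder: measure on your own servers
    max_connections_per_node: int,  # placeholder: bounded by memory and file descriptors
) -> Tuple[int, int]:
    """(total bytes held for open sockets, minimum node count) for one socket per user."""
    total_bytes = users * bytes_per_connection
    nodes = (users + max_connections_per_node - 1) // max_connections_per_node
    return total_bytes, nodes


# One million users polling every 10 seconds: independent requests, any server can answer.
rate = polling_request_rate(1_000_000, 10.0)

# The same users on persistent connections, with assumed 20 KiB per socket
# and an assumed limit of 50,000 sockets per node.
total_bytes, nodes = persistent_footprint(1_000_000, 20 * 1024, 50_000)
```

The polling figure is a rate that any spare server can absorb. The persistent figure is standing state that exists whether or not any data is flowing, and each socket is pinned to the node that holds it.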
The operational implications cascade. Deployments become harder because draining connections takes minutes, not seconds. Autoscaling becomes harder because adding capacity does not redistribute existing connections. Failure modes become harder because a single node failure terminates thousands of sessions that must reconnect, often simultaneously, producing the thundering herd that takes down the replacement node.
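One common mitigation for the simultaneous-reconnect problem, offered here as an illustration rather than something the analysis above prescribes, is to have clients wait a randomized, exponentially growing delay before reconnecting. The sketch uses the "full jitter" variant; the function name, base delay, and cap are assumptions.

```python
import random
from typing import Optional


def reconnect_delay(
    attempt: int,
    base_seconds: float = 1.0,   # assumed starting delay
    cap_seconds: float = 60.0,   # assumed upper bound
    rng: Optional[random.Random] = None,
) -> float:
    """Delay before reconnect attempt number `attempt` (0-based), with full jitter."""
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    source = rng if rng is not None else random
    # Clamp the exponent so very large attempt counts cannot overflow a float.
    ceiling = min(cap_seconds, base_seconds * (2 ** min(attempt, 32)))
    # Drawing uniformly from [0, ceiling] spreads clients across the window
    # instead of having them all return at the same instant.
    return source.uniform(0.0, ceiling)
```

Even with this in place, the replacement node still has to absorb every displaced session eventually; jitter only spreads the arrival over time.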
Then there is the back-pressure problem. In a pull system, an overloaded server simply responds slower and clients naturally throttle. In a push system, the server must actively manage what it sends to whom, buffer messages for slow consumers, and decide what to drop when buffers fill. This is distributed systems engineering of a different order, and it is rarely budgeted for in the initial estimate.
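To show the kind of decision a push server has to make, here is a minimal sketch of a per-client outbound buffer with a drop-oldest policy. The class name `ClientBuffer` and the choice of dropping the oldest message are assumptions; dropping the newest message, coalescing updates, or disconnecting the slow client are equally valid policies, and the right one depends on the data.

```python
from collections import deque
from typing import Any, List


class ClientBuffer:
    """Bounded outbound queue for one connected client (hypothetical helper)."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._queue: deque = deque(maxlen=capacity)
        self.dropped = 0  # how many messages this client has lost

    def offer(self, message: Any) -> None:
        """Queue a message; when the buffer is full, the oldest message is discarded."""
        if len(self._queue) == self._queue.maxlen:
            self.dropped += 1
        # A deque with maxlen evicts from the left when appending on the right.
        self._queue.append(message)

    def drain(self, max_messages: int) -> List[Any]:
        """Remove and return up to max_messages, oldest first, as the client can accept them."""
        batch: List[Any] = []
        while self._queue and len(batch) < max_messages:
            batch.append(self._queue.popleft())
        return batch
```

Multiply this by every connected client, add the memory the buffers consume and the monitoring needed to notice `dropped` climbing, and the scale of the work becomes clearer. None of it exists in a polling design, where a slow client simply asks less often.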
Quantify these costs before committing. A reasonable rule of thumb: persistent connection infrastructure costs roughly three to five times more to operate than equivalent stateless infrastructure, before factoring in the engineering time spent on the failure modes that polling architectures simply do not have.
Takeaway: Persistent connections trade stateless simplicity for latency. That trade is sometimes correct, but it is never free, and the bill arrives in operational complexity long after the architectural decision is forgotten.
Real-time is not a virtue. It is a design choice with measurable costs, and like all design choices, it deserves interrogation rather than reverence. The architect's job is to push back on the adjective and replace it with a number.
When that number is genuinely small—milliseconds matter, business consequences are immediate—then the operational tax of persistent connections is justified. When the number is larger than a few seconds, polling remains a mature, scalable, and unfashionable answer. The fashion is not your concern; the system's longevity is.
Design for the latency the business actually needs, not the latency it asked for. The systems that age well are those that resisted complexity they did not require.