Imagine a grocery store with ten checkout lanes but only one cashier working. The line stretches to the back of the store while nine registers sit empty. That's essentially what happens when a single server handles all your application's traffic — everything bottlenecks at one point while perfectly good resources go unused.

Load balancers solve this problem by acting as traffic directors, deciding which server handles each incoming request. But distributing work evenly is trickier than it sounds. The wrong strategy can make things worse, sending users to crashed servers or splitting their sessions across machines that don't share information. Let's look at how load balancing actually works — and what keeps it from creating more problems than it solves.

Distribution Algorithms: Choosing Who Gets the Next Request

At its core, a load balancer needs to answer one question every time a request arrives: which server should handle this? The simplest answer is round robin — send the first request to Server A, the second to Server B, the third to Server C, then loop back to A. It's fair, predictable, and works well when all servers are identical and all requests take roughly the same effort to process.
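The rotation above can be sketched in a few lines. This is a minimal illustration, not a production balancer; the server names are made up.

```python
# Minimal round robin: cycle through servers in a fixed order.
from itertools import cycle

servers = ["server-a", "server-b", "server-c"]
rotation = cycle(servers)

def next_server():
    """Return the next server in the fixed rotation."""
    return next(rotation)

# Requests hit A, B, C, then wrap back around to A.
first_four = [next_server() for _ in range(4)]
# first_four == ["server-a", "server-b", "server-c", "server-a"]
```

Note there is no state here beyond the rotation itself — the balancer never asks how busy anyone is, which is exactly the assumption that breaks down next.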

But requests rarely take equal effort. A user loading a homepage is a light task; a user generating a complex report might keep a server busy for seconds. That's where least connections comes in. Instead of blindly rotating, the load balancer checks which server is currently handling the fewest active requests and sends the next one there. It adapts to reality rather than assuming everything is equal.
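Least connections is just a minimum over the current connection counts. A sketch, with hypothetical in-flight counts standing in for what a real balancer would track per backend:

```python
# Least connections: route to the server with the fewest active requests.
active = {"server-a": 2, "server-b": 0, "server-c": 5}  # hypothetical counts

def least_connections(active_counts):
    """Pick the server currently handling the fewest requests."""
    return min(active_counts, key=active_counts.get)

choice = least_connections(active)
active[choice] += 1  # the chosen server now carries one more request
# choice == "server-b", since it was idle
```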

There are other strategies too. Weighted round robin lets you assign more traffic to beefier servers. Random selection works surprisingly well at large scale because randomness naturally distributes evenly across thousands of requests. The key insight is that no algorithm is universally best — the right choice depends on whether your workloads are uniform or varied, and whether your servers are identical or different sizes.
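Weighted round robin can be sketched by repeating each server in the rotation according to its weight. The weights here are invented for illustration; real balancers use smoother interleaving schemes, but the effect is the same.

```python
# Weighted round robin: a weight-3 server gets three slots per cycle,
# a weight-1 server gets one.
from itertools import cycle

weights = {"big-server": 3, "small-server": 1}  # hypothetical capacities
rotation = cycle([s for s, w in weights.items() for _ in range(w)])

order = [next(rotation) for _ in range(8)]
# Over 8 requests, big-server receives 6 and small-server receives 2.
```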

Takeaway

A distribution algorithm is only as good as its assumptions. Round robin assumes equal work and equal servers. When those assumptions break down, smarter strategies like least connections adapt to what's actually happening.

Health Monitoring: Knowing When a Server Has Gone Silent

Distributing traffic perfectly means nothing if you're sending requests to a server that's crashed. This is why every load balancer includes health checks — periodic tests that ask each server, "Are you still alive and working?" The simplest version is a ping: the load balancer sends a small network request every few seconds and waits for a response. If a server stops responding, it gets pulled from the rotation.

But being alive isn't the same as being healthy. A server might respond to pings while its application is throwing errors on every request. That's why production systems use deep health checks that go beyond the surface. Instead of just pinging the machine, the load balancer hits a dedicated health endpoint that tests the application itself — can it reach the database? Are critical services running? This catches problems a simple ping would miss.
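A deep health check in code is just an HTTP request to an application endpoint, with a timeout and a conservative failure path. The `/health` path is a common convention, not a standard; treat everything here as a sketch.

```python
# Probe an application-level health endpoint rather than just pinging the host.
import urllib.request

def is_healthy(base_url, timeout=2.0):
    """Return True only if the /health endpoint answers 200 within the timeout.

    Any network error, timeout, or non-200 status counts as unhealthy --
    when in doubt, assume the server can't do useful work.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False
```

On the server side, that endpoint would run its own checks — a quick database query, a cache ping — so a 200 means the whole stack is working, not just the network card.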

The timing of health checks matters too. Check too frequently and you flood your servers with monitoring traffic. Check too infrequently and users hit dead servers for minutes before the system notices. Most setups also require multiple failed checks before removing a server, preventing a single dropped packet from causing unnecessary disruption. When a server recovers, it gets gradually reintroduced — not slammed with full traffic immediately.
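The "multiple failed checks" rule is a small state machine: consecutive failures remove a server, consecutive passes restore it, and a single blip does nothing. The thresholds below are illustrative defaults.

```python
# Track health-check results and flip a server in or out of rotation
# only after a streak of failures or passes, never on a single result.
class HealthTracker:
    def __init__(self, fail_threshold=3, pass_threshold=2):
        self.fail_threshold = fail_threshold
        self.pass_threshold = pass_threshold
        self.fails = 0
        self.passes = 0
        self.in_rotation = True

    def record(self, check_passed):
        """Record one check result; return whether the server is in rotation."""
        if check_passed:
            self.fails = 0
            self.passes += 1
            if not self.in_rotation and self.passes >= self.pass_threshold:
                self.in_rotation = True  # recovered: bring it back
        else:
            self.passes = 0
            self.fails += 1
            if self.in_rotation and self.fails >= self.fail_threshold:
                self.in_rotation = False  # three strikes: pull it out
        return self.in_rotation

tracker = HealthTracker()
tracker.record(False)  # one dropped packet does not remove the server
```

Gradual reintroduction (ramping traffic back up rather than restoring it all at once) would sit on top of this, typically by giving a recovering server a reduced weight for a while.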

Takeaway

A load balancer that can't detect failure is just a traffic splitter. True reliability comes from continuously verifying not just that servers are reachable, but that they're genuinely capable of doing useful work.

Session Affinity: When Users Need to Stay Put

Here's a scenario that breaks simple load balancing: a user logs in on Server A, which stores their session data in memory. Their next request gets routed to Server B, which has no idea who they are. Suddenly they're logged out for no apparent reason. This happens because load balancers treat each request independently by default — they don't know or care that two requests came from the same user.

Session affinity (sometimes called "sticky sessions") solves this by ensuring that once a user starts interacting with a particular server, subsequent requests go to that same server. The load balancer typically uses a cookie or the user's IP address to remember the mapping. It's a simple fix, but it comes with a real tradeoff: if that server goes down, the user loses their session entirely. You've also reduced the balancer's flexibility — it can't freely redistribute traffic anymore.
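One common way to implement the mapping is to hash a stable client identifier — a cookie value or an IP address — so the same client always lands on the same server. A sketch, with hypothetical identifiers:

```python
# Sticky sessions via hashing: the same client ID always maps to the
# same server, with no lookup table to maintain.
import hashlib

servers = ["server-a", "server-b", "server-c"]

def server_for(client_id):
    """Deterministically map a client identifier to one server."""
    digest = hashlib.sha256(client_id.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]
```

The tradeoff from the paragraph above shows up directly in the code: if the server that `server_for` picks goes down, or if the server list changes, the mapping shifts and those users' sessions are gone.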

The better long-term solution is to make your application stateless at the server level. Instead of storing session data in a single server's memory, store it in a shared location like a database or a distributed cache such as Redis. Now any server can handle any request from any user, because session data lives outside the servers. This gives your load balancer full freedom to distribute traffic optimally — and a server failure doesn't destroy anyone's session.
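The stateless version can be sketched with a dictionary standing in for the shared store — in production this would be a network call to something like Redis, but the shape of the code is the same: servers read session state from outside themselves.

```python
# Stateless servers with a shared session store. A plain dict stands in
# for an external store like Redis; no server holds session data itself.
shared_sessions = {}

def handle_request(server_name, session_id):
    """Any server can serve any user, because state lives in the shared store."""
    user = shared_sessions.get(session_id)
    if user is None:
        return f"{server_name}: not logged in"
    return f"{server_name} served {user}"

shared_sessions["sess-1"] = "alice"              # login handled by one server
a = handle_request("server-a", "sess-1")         # "server-a served alice"
b = handle_request("server-b", "sess-1")         # "server-b served alice"
```

Both servers give the same answer for the same session, which is the whole point: the balancer can send the next request anywhere.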

Takeaway

Session affinity is a useful band-aid, but shared state is the cure. The more your servers can handle any request independently, the more resilient and flexible your entire system becomes.

Load balancing sounds like a simple concept — just spread the work around. But the details matter enormously. Choosing the right distribution algorithm, monitoring server health accurately, and handling user sessions gracefully make the difference between a system that scales smoothly and one that fails in confusing, hard-to-debug ways.

The underlying principle is worth remembering beyond load balancing itself: distributing work is a design decision, not just an infrastructure one. How your application stores state, handles failures, and scales under pressure are choices you make in code — long before a load balancer ever enters the picture.