March 27th, 2010

Graceful Upgrade of Internet Services

The Difficulty

At the edge of your information system, or of one of its components, sit the externally visible servers: usually some kind of load balancer and/or request router, listening on a public port of a public IP address.

When upgrading such edge servers, the particular difficulty is to avoid losing connections from existing clients while switching to the new server.

Here is a way to gracefully switch from one edge server to its replacement without dropping any connection.

I'm curious if you know or recommend other (simpler?) ways.

Graceful Atomic Switching of a Single Port

Using a recent Linux kernel with netfilter (aka iptables) and its connection tracking, we can gracefully switch connections from a server to its replacement with minimal changes (if any) to the server code.

The very same algorithm is hopefully also possible by properly configuring hardware routers that people use for high-availability. Or is it?

The principle is as follows:

  • Initially, externally-visible port P1 is redirected to internally-visible port P2 by a simple rule in iptables (optionally, P1=P2 and no rule is needed).

  • Without stopping the server on P2, we start the new server on internally-visible port P3.

  • We wait for the new server to be ready to listen (it either tells us, or we poll it).

  • We add iptables rules so that packets to P1 that are part of ESTABLISHED connections should remain redirected to P2, but other packets should be redirected to P3 instead:

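    # Keep packets of already-tracked connections on the old server (P2).
    iptables -t nat -A PREROUTING -p tcp -d ${SERVERIP} --dport ${P1} -m state --state ESTABLISHED,RELATED -j DNAT --to ${SERVERIP}:${P2}
    # Send everything else, i.e. new connections, to the new server (P3).
    iptables -t nat -A PREROUTING -p tcp -d ${SERVERIP} --dport ${P1} -j DNAT --to ${SERVERIP}:${P3}
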
  • We then wait for all current connections to the server on P2 to be processed or timed out.

  • When we're satisfied that no connections to the old server remain, we redirect all traffic from P1 to P3 (if P1=P3, no rule is needed, just remove the old ones).

  • We can also safely kill the server on P2 if it didn't exit as part of telling us it's done.
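The steps above could be driven by a small script. What follows is only a sketch under assumptions: the helper names (wait_until_listening, dnat_rule, split_rules, cleanup_rules) are made up for illustration, and the functions merely build the same iptables command lines as above rather than running them.

```python
import socket
import time


def wait_until_listening(host, port, timeout=30.0, interval=0.2):
    """Poll until something accepts TCP connections on (host, port)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


def dnat_rule(action, server_ip, p1, target_port, established_only=False):
    """Build one iptables argv; action is "-A" (append) or "-D" (delete)."""
    argv = ["iptables", "-t", "nat", action, "PREROUTING", "-p", "tcp",
            "-d", server_ip, "--dport", str(p1)]
    if established_only:
        argv += ["-m", "state", "--state", "ESTABLISHED,RELATED"]
    return argv + ["-j", "DNAT", "--to", f"{server_ip}:{target_port}"]


def split_rules(server_ip, p1, p2, p3):
    """Transitional rules: tracked connections stay on P2, the rest go to P3."""
    return [
        dnat_rule("-A", server_ip, p1, p2, established_only=True),
        dnat_rule("-A", server_ip, p1, p3),
    ]


def cleanup_rules(server_ip, p1, p2):
    """Once P2 has drained, delete the rule that kept old connections on P2."""
    return [dnat_rule("-D", server_ip, p1, p2, established_only=True)]
```

To apply the rules for real, each argv would be passed to subprocess.run(argv, check=True) as root, after wait_until_listening has confirmed the new server on P3. Rule order matters, since iptables stops at the first matching rule with a terminating target: if an initial P1-to-P2 redirect exists, it has to be removed or the new rules placed ahead of it.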

What changes are required to the server code, if any? Just a command telling the server to stop accepting new requests and to send a message once all current requests have been processed. We can then wait for that message before completely switching over to the new port, which avoids the problems of a fixed timeout: too long, and the operation is delayed and the performance disruption prolonged; too short, and connections are dropped and requests fail. Alternatively, the server can be polled through an existing monitoring interface and stopped when it has no more pending requests.
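On the server side, the "stop accepting and tell me when you're done" command might look roughly like this. It is a minimal sketch, assuming a threaded server; DrainTracker and its method names are hypothetical, not taken from any existing server.

```python
import threading


class DrainTracker:
    """Counts in-flight requests and signals when a draining server is idle."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = 0
        self._draining = False
        self.done = threading.Event()  # set once draining and idle

    def try_begin(self):
        """Call before handling a request; returns False once draining started."""
        with self._lock:
            if self._draining:
                return False
            self._in_flight += 1
            return True

    def end(self):
        """Call when a request has been fully processed."""
        with self._lock:
            self._in_flight -= 1
            self._signal_if_idle()

    def start_draining(self):
        """The 'stop accepting' command: refuse new work, report when idle."""
        with self._lock:
            self._draining = True
            self._signal_if_idle()

    def pending(self):
        """Number of requests still in progress, for a polling monitor."""
        with self._lock:
            return self._in_flight

    def _signal_if_idle(self):
        # Caller must hold the lock.
        if self._draining and self._in_flight == 0:
            self.done.set()
```

The upgrade script would trigger start_draining() through whatever control channel the server has, then either block on done.wait() or poll pending() before removing the transitional rules.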

Graceful Atomic Switching of Multiple Ports

Sometimes, we want to coherently switch service on multiple public ports of possibly multiple machines from one set of servers to a new set of servers.

First we switch from normal port service to service through a multiplexing proxy port that will manage the distributed atomic switch (see below). When the distributed atomic switch is done, we can do a second local atomic switch from the proxy directly to the new port.

The distributed atomic switch proxy, as in Disnix, will listen on the port, but either withhold accepting connections or accept them and keep them alive without actually handling queries. It will then use its variant of a two-phase-commit or Paxos protocol to either agree to switch or fall back to the previous configuration. Connections will then be accepted and redirected (or forwarded) to the new port if the commit succeeded, or to the old port if it was rolled back.
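To make the commit-or-rollback decision concrete, here is a deliberately simplified, in-process sketch of a two-phase commit among proxies. Everything in it is an assumption for illustration: SwitchParticipant, atomic_switch and the new_server_ready flag are hypothetical, there is no networking, and coordinator or participant failures (the hard part of any real protocol) are not handled.

```python
class SwitchParticipant:
    """One machine's proxy, parking connections while the vote is in progress."""

    def __init__(self, name, old_port, new_port, new_server_ready):
        self.name = name
        self.old_port = old_port
        self.new_port = new_port
        # Stand-in for a real health check of the new server.
        self.new_server_ready = new_server_ready
        self.active_port = old_port
        self.held = []  # connections accepted but not yet handled

    def prepare(self):
        """Phase 1: vote yes only if the new server can take traffic."""
        return self.new_server_ready

    def finish(self, commit):
        """Phase 2: choose the port, then release held connections to it."""
        self.active_port = self.new_port if commit else self.old_port
        released = [(conn, self.active_port) for conn in self.held]
        self.held.clear()
        return released


def atomic_switch(participants):
    """Commit everywhere if every participant votes yes, else roll back."""
    votes = [p.prepare() for p in participants]
    commit = all(votes)
    routes = {p.name: p.finish(commit) for p in participants}
    return commit, routes
```

A single "no" vote sends every held connection back to its old port, so clients see either the old set of servers or the new one on all ports, never a mix.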

This method adds some latency due to the distributed transaction. If an atomic switch is required across multiple data centers, this can cause a one-time degradation in quality of service lasting hundreds of milliseconds. But no packets need be lost along the way.

Graceful Atomic Switching Without Packet Filtering

If for whatever reason the packet filter in the Linux kernel is not available (lacking permissions, or not using Linux or something equivalent), then we can still do graceful atomic switching of ports, but it requires proper support to be added to the servers.

After the new server is ready, we send a request to the old server, whereby the old server will stop accepting connections on the externally-visible port (but continue processing any previously started connection). Instead it will pass the fd for the listening TCP socket of that port to the new server as ancillary data (an SCM_RIGHTS control message, or cmsg) over an AF_UNIX socket.
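The fd handover itself is short. Below is a minimal sketch, assuming Python 3.9 or later on a Unix system, where socket.send_fds and socket.recv_fds wrap the SCM_RIGHTS machinery; the function names hand_over_listener and take_over_listener, and the b"listener" tag, are made up for illustration.

```python
import socket


def hand_over_listener(channel, listener):
    """Old server: send the listening socket's fd over an AF_UNIX channel."""
    socket.send_fds(channel, [b"listener"], [listener.fileno()])
    # Closing our copy stops us from accepting; the receiver's copy of the
    # fd keeps the port open, so the kernel keeps queueing new connections.
    listener.close()


def take_over_listener(channel):
    """New server: receive the fd and wrap it in a socket object."""
    _msg, fds, _flags, _addr = socket.recv_fds(channel, 1024, 1)
    # Family and type are detected from the fd itself.
    return socket.socket(fileno=fds[0])
```

The old server keeps its already-accepted connection sockets and finishes serving them; only the listening socket changes hands, so clients connecting during the handover simply wait in the accept queue until the new server calls accept().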

This might or might not be messy to add to existing server code. At worst, we may have to surgically replace the fd of the server's listening socket after it has already been opened on a different port.