Zero downtime deployment in Apache

Whether you deploy using git push, rsync or even sftp, you never want your site to be down or inconsistent during updates. If your site has high traffic you may not only do frequent updates, you may also have significant traffic during these updates. With “zero downtime” deployments you can ensure that all traffic keeps flowing and that there is little chance that anyone notices your deployment.

Naive approach: delete first

A naive deployment of an Apache website may go like this:

Remove the entire old version from the server
Transfer the entire new version to the server

It should be clear that this can lead to downtime between the removal (often instantaneous) and the completion of the upload of the new version. A popular way to deal with this downtime is to place the site “under maintenance” by redirecting all requests to a single HTML page during the update. This isn’t ideal, especially on larger sites with frequent updates as the downtime may be considerable.

Better: delete last

An rsync with “--delete-after” postpones deletions until the transfer has finished and does:

Transfer and create the files that are new in this version to the server
Transfer and replace the files that are modified in this version to the server
Remove the files that were not in this version from the server
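The steps above roughly correspond to a single rsync call. This is only a minimal sketch: deploy_delete_last is a hypothetical helper name, and the source and destination paths are whatever fits your setup.

```shell
#!/bin/bash
# Sketch of a "delete last" deployment.
# deploy_delete_last is a hypothetical helper, not part of rsync.
deploy_delete_last() {
  local src=$1 dest=$2
  # -a recurses and preserves permissions and timestamps.
  # --delete-after removes files that are missing from $src,
  # but only once the transfer itself has finished.
  # The trailing slashes make rsync copy the contents of $src into $dest.
  rsync -a --delete-after "$src/" "$dest/"
}
```

A call could look like “deploy_delete_last ./build /var/www/example”, where both paths are made up for illustration.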

Although there should be no “404 page not found” errors, a visitor whose request touches multiple files can still get a mix of versions, making the response inconsistent. When the files contain code (e.g. JavaScript) this can cause hard-to-explain software bugs, especially when cache headers are applied, as they may cause the problem to persist even after the deployment has completed.

Best: zero downtime deployment

A zero downtime deployment takes place in the following steps:

Transfer the entire new version to the server
Atomically switch from the old version to the new version
Remove the entire old version from the server

As the server replaces the old version with the new version, any new request that comes in after the switch is handled consistently. Any request that completes before the switch is also handled consistently. Only requests that are handled in a time-frame in which the switch took place may be handled inconsistently, e.g. a request that started 100 milliseconds before the switch and retrieved its last asset 500 milliseconds after the switch. And obviously such a request is only inconsistent if it actually hit some updated code, so the chance varies with the size of the change.
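The atomic switch can be sketched with a symlink that is created under a temporary name and then renamed over the live one. This is a minimal sketch, assuming GNU coreutils (for “mv -T”); switch_version is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch of an atomic symlink switch.
# switch_version is a hypothetical helper: point the symlink $2 at directory $1.
switch_version() {
  local target=$1 link=$2
  # Build the new symlink under a temporary name first ...
  ln -sfn "$target" "${link}_new"
  # ... then rename it over the live symlink. The rename replaces the
  # old symlink in one step, so there is no moment without a symlink.
  # -T (GNU mv) treats $link as a plain name instead of moving the new
  # symlink into the directory the old one points to.
  mv -fT "${link}_new" "$link"
}
```

Calling “switch_version green public_html” would make “public_html” point at “green”, whatever it pointed at before.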

Bash deployment script

The following script can deploy a git repository with zero downtime:

#!/bin/bash
# make the pipeline below fail when git archive fails, not only when tar fails
set -o pipefail
# set primary to the existing and
# set secondary to the target directory
if [[ -d green ]]; then
  PRIMARY=green
  SECONDARY=blue
else
  PRIMARY=blue
  SECONDARY=green
fi
# make the target directory
mkdir "$SECONDARY"
# do the deployment
# (git archive needs a tree-ish; HEAD is an assumption, name the branch you deploy)
if git -C app.git archive --prefix="$SECONDARY/" HEAD | tar x; then
  # the deployment succeeded, create a new symlink
  ln -s "$SECONDARY" public_html_new
  # replace the old symlink with the new symlink (atomic)
  mv -fT public_html_new public_html
  # remove the old files from disk
  rm -Rf "$PRIMARY"
else
  # remove the failed deployment
  rm -Rf "$SECONDARY"
fi

Note that the “git archive” command in the script may be replaced by a Hugo build or any other long-running deployment process. This Bash script may also be put into the “post-receive” hook of your git repository (when pushing to production). This is what the directory looks like after deployment:

drwxrwxr-x 7 maurits maurits 4096 Nov 21 12:02 app.git
drwxr-xr-x 9 maurits maurits 4096 Feb  1 17:09 green
lrwxrwxrwx 1 maurits maurits    5 Feb  1 17:09 public_html -> green

Note that the Apache “DocumentRoot” is set to the “public_html” directory, which is actually a symlink. Nine minutes later (after the next deployment) the directory looks like this:

drwxrwxr-x 7 maurits maurits 4096 Nov 21 12:02 app.git
drwxr-xr-x 9 maurits maurits 4096 Feb  1 17:18 blue
lrwxrwxrwx 1 maurits maurits    5 Feb  1 17:18 public_html -> blue

You can see that the blue and green directories alternate.
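If you run the script from the “post-receive” hook mentioned earlier, keep in mind that Git starts the hooks of a bare repository inside the repository directory and exports GIT_DIR to them. The sketch below writes a small wrapper hook that steps up one directory before running the deployment script. It assumes the layout shown above, with “app.git” next to the blue and green directories. The names write_post_receive_hook and deploy.sh are hypothetical.

```shell
#!/bin/bash
# Sketch: write a post-receive hook into the bare repository $1 that runs
# the deployment script $2 (an absolute path to an executable file).
# write_post_receive_hook is a hypothetical helper name.
write_post_receive_hook() {
  local repo=$1 script=$2
  # The here-document is unquoted, so $script is filled in right now.
  cat > "$repo/hooks/post-receive" <<EOF
#!/bin/bash
# Git starts this hook inside the bare repository, so step up to the
# directory that holds blue, green and public_html.
cd .. || exit 1
# Drop the GIT_DIR that Git exports to hooks, so that "git -C" in the
# deployment script finds the repository by its path.
unset GIT_DIR
exec "$script"
EOF
  chmod +x "$repo/hooks/post-receive"
}
```

With the deployment script saved as, say, “/home/maurits/deploy.sh” (a made-up path), “write_post_receive_hook /home/maurits/app.git /home/maurits/deploy.sh” would install the hook.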

Conclusion

Even though zero downtime deployments can guarantee 100% uptime during deployments, they cannot guarantee 100% consistent replies during deployment. You can trade consistent replies for some forced downtime (and terminated connections) by stopping Apache before and starting it after the zero downtime deployment, but whether or not you want to make that trade-off is entirely up to you. I have had a lot of success with very frequent small incremental deployments using this zero downtime strategy and I hope you will too…

Happy programming!
