Going back a long time, my personal sites have been hosted a number of ways:
- Pentium Pro on 188.8.131.52 hosting phpBB2
- bluag (not sure how much I used it)
- Shell-script generated static site
- Hugo on one box
- Hugo, Github Pages, and Cloudflare
Finally: Hugo on three boxes. I found out that Cloudflare was CAPTCHA'ing Tor users, even on its "essentially off" setting. To get fully "off", you need an enterprise account, which I imagine is expensive. The CAPTCHAs happen largely because a high share of the traffic coming from Tor exit node IPs is malicious; I don't think Cloudflare has any sort of vendetta against Tor, it's just the nature of the beast. But I'd rather have the Tor users, along with the hopefully fruitless attacks against my static site.
Why three boxes?
Well, I didn't want just one, for reliability. The irony is that one box is sometimes more reliable than several under flaky failover, and most failover setups have a single point of failure of their own. Ultimately, three boxes because if one goes down, I still have redundancy. That greatly lowers the priority of responding to an outage on my not-that-high-traffic personal site. And they are pretty cheap, I think $5 a month each.
You can also give multiple A responses in DNS, but client behavior around that is mixed. I think it's effectively round-robin. If one box is down, some percentage of requests get a slow response before timing out and trying another address, if the client retries at all. So that was off the table. I really don't like "mostly down" versus all down or all up.
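To put a rough number on that percentage, here is a back-of-the-envelope sketch. It assumes clients pick uniformly at random among the A records and retry a live address after exactly one timeout; real clients vary, and the latency and timeout values are made up for illustration.

```python
def slow_fraction(total_nodes: int, down_nodes: int) -> float:
    """Fraction of first attempts that land on a dead address.

    Assumes each client picks one of the A records uniformly at random.
    """
    if total_nodes <= 0 or not 0 <= down_nodes < total_nodes:
        raise ValueError("need at least one live node")
    return down_nodes / total_nodes


def expected_latency(total_nodes: int, down_nodes: int,
                     normal_s: float = 0.1, timeout_s: float = 5.0) -> float:
    """Average time to a response, assuming one timeout and then a live node."""
    return normal_s + slow_fraction(total_nodes, down_nodes) * timeout_s


if __name__ == "__main__":
    # Three boxes, one down: about a third of first attempts hit the dead one.
    print(slow_fraction(3, 1))
    print(expected_latency(3, 1))
```

Under those assumptions, one dead box out of three makes roughly a third of visits slow for as long as it stays down, which is the "mostly down" state I want to avoid.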
You can also do more elaborate things where you have multiple web nodes (geographically separate) that adjust DNS if they sense the other is down. So you have some sort of a heartbeat. It's fairly complicated, but simple enough that you can do it well. It will likely have a bit of a lag when things go down, of at least the TTL plus some overhead.
DNS, though, is pretty smart about retries. It will do its round-robin thing, but the beauty of it is that it's caching plus round-robin (or more complicated things, depending on the recursive resolver). So if you have a penalty, it may be five to ten seconds every TTL per recursive resolver. And possibly, these resolvers may "favor" nodes automatically. I don't know if they do or not, but there are tons of little optimizations they can make, like picking the fastest (generally closest) one in the pool.
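The decision part of a heartbeat setup like that could look roughly like this. It's only a sketch: `records_to_publish` is a hypothetical helper, the addresses are from the documentation range, and the actual health checking and DNS updating are left out entirely.

```python
def records_to_publish(addresses: dict, healthy: dict) -> list:
    """Return the addresses that should stay in DNS after a heartbeat round.

    addresses maps node name -> IP, healthy maps node name -> bool.
    A node with no heartbeat result is treated as down.
    """
    live = [addr for name, addr in addresses.items() if healthy.get(name, False)]
    # If every check failed, the monitor is probably the broken part,
    # so keep publishing everything rather than emptying the zone.
    return live if live else list(addresses.values())


if __name__ == "__main__":
    addresses = {"east": "192.0.2.1", "west": "192.0.2.2"}
    print(records_to_publish(addresses, {"east": True, "west": False}))
```

The fallback when everything looks down is my own assumption about what's sensible, not something every such setup does.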
But how do you get DNS-style failover to a web server? It's a pretty ugly hack, at least how I've done it.
You tell your registrar about all of your boxes. In this case: dfw.go-beyond.org, sjc.go-beyond.org, and atl.go-beyond.org.
I have an A and AAAA associated with each of those. The .org TLD hosts them for me.
These DNS servers are also the ones serving the content. The trick is that they only serve records for themselves and none of the other nodes. So you get a totally different perspective depending on which node you happen to hit.
    #$ drill -T go-beyond.org aaaa
    org.            172800  IN  NS    b0.org.afilias-nst.org.
    org.            172800  IN  NS    d0.org.afilias-nst.org.
    org.            172800  IN  NS    c0.org.afilias-nst.info.
    org.            172800  IN  NS    a0.org.afilias-nst.info.
    org.            172800  IN  NS    a2.org.afilias-nst.info.
    org.            172800  IN  NS    b2.org.afilias-nst.org.
    go-beyond.org.  86400   IN  NS    dfw.go-beyond.org.
    go-beyond.org.  86400   IN  NS    sjc.go-beyond.org.
    go-beyond.org.  86400   IN  NS    atl.go-beyond.org.
    go-beyond.org.  300     IN  AAAA  2001:19f0:6400:86e4:5400:ff:fe1f:538e

    #$ drill -T go-beyond.org aaaa
    org.            172800  IN  NS    a0.org.afilias-nst.info.
    org.            172800  IN  NS    b2.org.afilias-nst.org.
    org.            172800  IN  NS    b0.org.afilias-nst.org.
    org.            172800  IN  NS    c0.org.afilias-nst.info.
    org.            172800  IN  NS    d0.org.afilias-nst.org.
    org.            172800  IN  NS    a2.org.afilias-nst.info.
    go-beyond.org.  86400   IN  NS    sjc.go-beyond.org.
    go-beyond.org.  86400   IN  NS    dfw.go-beyond.org.
    go-beyond.org.  86400   IN  NS    atl.go-beyond.org.
    go-beyond.org.  300     IN  AAAA  2001:19f0:5400:83aa:5400:ff:fe1f:538f
(Yes, those are subtly different.)
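The idea can be modeled in a few lines of Python. This is a toy simulation, not how a real recursive resolver is implemented: it assumes the resolver tries the nameservers in random order and moves on when one doesn't answer. The nameserver names are the real ones, but the addresses are placeholders from the documentation range.

```python
import random

# Each nameserver is also a web node. Placeholder addresses, not the real ones.
NODES = {
    "dfw.go-beyond.org.": "2001:db8::1",
    "sjc.go-beyond.org.": "2001:db8::2",
    "atl.go-beyond.org.": "2001:db8::3",
}


def authoritative_answer(nameserver: str, up: set):
    """A node only ever answers with its own address; None stands in for a timeout."""
    if nameserver not in up:
        return None
    return NODES[nameserver]


def resolve(up: set, rng=random):
    """Try the nameservers in random order and return the first answer, if any."""
    order = list(NODES)
    rng.shuffle(order)
    for nameserver in order:
        answer = authoritative_answer(nameserver, up)
        if answer is not None:
            return answer
    return None


if __name__ == "__main__":
    up = set(NODES) - {"sjc.go-beyond.org."}
    # With sjc down, its address never shows up in the answers.
    print({resolve(up) for _ in range(20)})
```

Because a dead node can't answer DNS queries, it can't hand out its own address either, so new lookups only ever point at boxes that are up.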
If one of the servers falls over while its record is cached, a client that got that record may be out of luck for, at worst, five minutes, which is the TTL I'm using. It could be longer if some resolvers (internally or externally) force a higher minimum TTL, but I'm not too worried about that.
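Here's a small sketch of that caching behavior, including a resolver that enforces its own minimum TTL. `CachingResolver` is a made-up class for illustration, with the clock passed in by hand so the timing is easy to follow; it isn't modeled on any particular resolver.

```python
class CachingResolver:
    """Toy cache: keeps one answer until its (possibly raised) TTL runs out."""

    def __init__(self, lookup, min_ttl: float = 0):
        self._lookup = lookup      # callable returning (address, ttl_seconds)
        self._min_ttl = min_ttl    # floor some resolvers impose on short TTLs
        self._cached = None
        self._expires = 0.0

    def resolve(self, now: float):
        # Only go back to the authoritative servers once the cache has expired.
        if self._cached is None or now >= self._expires:
            address, ttl = self._lookup()
            self._cached = address
            self._expires = now + max(ttl, self._min_ttl)
        return self._cached


if __name__ == "__main__":
    # First lookup hits node A (which then dies); the next one hits node B.
    answers = iter([("node-a", 300), ("node-b", 300)])
    resolver = CachingResolver(lambda: next(answers))
    print(resolver.resolve(0))    # cached for 300 seconds
    print(resolver.resolve(299))  # still the cached answer, even if A is dead
    print(resolver.resolve(300))  # cache expired, fresh lookup
```

With `min_ttl=600`, the same sequence would keep handing out the first answer until the 600 second mark, which is the "could be longer" case.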
This has a few behaviors:
- The nodes don't have to know about each other.
- The nodes don't "phone home" trying to get to be the active node.
- It's a little confusing to track from DNS, at least the way I've done it.
- If the caching resolver can't reach a node, neither can the client. Occasionally, the caching resolver will be able to reach it but the client won't if there's a certain sort of networking issue going on.
- If you want to build new servers and not upgrade existing ones to show off new content, you'll probably have a very long period where both are interacting. This is because the TLD TTLs are almost always very high, like 86400. So worst case, it'll be a day till your old ones are purged out.
Today, I threw together some code to (mostly) automate this process and test it live: https://github.com/teran-mckinney/staticator
It's really hacky and I want to improve on a whole lot of it. I also set up a Hidden Service for go-beyond.org. With all three nodes using the same private key, the Hidden Service will supposedly handle failover nicely. Probably more slowly than DNS, but at least it should happen. Against the normal recommendations, since this is a not-so-hidden service, these Tor nodes also act as relays to help support the network. They're throttled heavily so I don't hit any sort of bandwidth overage, but hopefully they'll make a slight difference on the network.
Ideally, I'd like to use something like Ansible and have it all done in Python, rather than shell and this Vultr client written in Go. I also need to add precompression and a webserver that supports gzip transfers. But, there's always tomorrow. Right?
I've never heard of this type of DNS failover before. I'm sure it's been done, but I haven't looked into it. I think it may actually be a viable method for geographically redundant failover with minimal moving parts. You might also have redundant nodes per region, but the effect is about the same. Probably a lot easier than anycasting, though I doubt it'd be as good in the end.
Hopefully, this method has no other serious drawbacks and someone else can find it useful in other environments.
Update: This has been working flawlessly for nearly a year. It's now on just two boxes and I've never noticed any significant downtime.