“A fault domain is a set of hardware components – computers, switches, and more – that share a single point of failure.” IEEE Computer Magazine March 2011 Issue.
A fault domain goes a tad beyond simple N+1 redundancy. I'd think sitting on different bandwidth, different grid power, etc. would be necessary. Otherwise you're just bumping the failure point upstream in the same facility, which is meh.
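To make that concrete, here's a toy sketch in Python. The node names and labels (site, power feed, uplink) are all made up, but the idea is exactly the point above: two "redundant" boxes that share any of those labels really share a fault domain.

```python
# Toy model: a "fault domain" is anything two nodes share that can fail
# as one unit. All names/labels below are hypothetical examples.
NODES = {
    "web1": {"site": "dc-a", "power": "feed-1", "uplink": "isp-x"},
    "web2": {"site": "dc-a", "power": "feed-1", "uplink": "isp-x"},
    "web3": {"site": "dc-b", "power": "feed-2", "uplink": "isp-y"},
}

def shared_fault_domains(a, b, nodes=NODES):
    """Return the fault domains two nodes have in common."""
    return {k: v for k, v in nodes[a].items() if nodes[b].get(k) == v}
```

Here `shared_fault_domains("web1", "web2")` comes back with all three labels, so that pair is "N+1" in name only; web1 and web3 share nothing, which is what you actually want.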
N+1 with dual setups in the same local area (e.g. a 150 mile radius) is a decent idea where both ends are well connected, ideally directly. I've done a deployment before with a DC on hot standby with no public internet live to it (all privately connected). Used it as a just-in-case worst-scenario option and mainly for backups day to day.
Thanks drmike and fm7 for the contribution. I like these topics. I wish the forum were more about this and less random. HA is an interesting topic with multiple ways to achieve it, and there's always room for improvement.
We can / will cover more related stuff. Glad to have a variety of conversation here.
HA has a lot more options today and more mature solutions. I lost my grip on it as I am not pushing high traffic stuff like I used to (which needed HA setups).
HA should be tailored to the application, and that is why I think application-level HA solutions are the best. It's relatively inexpensive, and it fits the situation because you configure your applications for it.
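A minimal sketch of what application-level HA can look like: client-side failover across a replica list. The hostnames and port here are placeholders; in a real app the list would come from config or service discovery.

```python
import socket

# Hypothetical replica list -- in a real deployment this comes from config
REPLICAS = ["db1.example.com", "db2.example.com", "db3.example.com"]

def connect_with_failover(replicas, port=5432, timeout=2.0):
    """Try each replica in order; return the first live TCP connection."""
    last_err = None
    for host in replicas:
        try:
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as err:
            last_err = err  # dead node: fall through to the next replica
    raise ConnectionError(f"all replicas down, last error: {last_err}")
```

The nice part is the failover logic lives in your app, not in an expensive appliance: a dead node just means the next replica in the list gets tried.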
That's how I feel about HA too. Now, bouncing from virtual servers to dedicated hardware, well, that tends to happen quickly, or should. Otherwise one is going to beat the heck out of a multi-tenant box and/or get not-so-hot, erratic performance out of the virtual HA choke point. That's part of why I run nothing real on VPS instances unless they are slices of a dedi that I am fully in control of and aware of the other usage. <--- point here is HA on a site that's already active or quickly ramping up.
... chasing pipe dreams of the flawless high availability that are often talked about on forums (VM failover, SAN storage, etc.) but more often than not seem to only provide a new set of problems.
Usually people talking SAN and VM failover are talking closed source or exotic, large-cost deployments. SANs fail too often and too ugly for me to ever recommend them. Heck, I am no fan of RAID either, because of the mass complexity and the potentially horrible failures it can create. Give me RAID for more spindles (and please make it SSD today); when it fails, I toss the drives in the dust bin and restore from backup (really, I have no patience for it).
Hardware-based and exotic bought solutions for HA will teach you how to burn money. Plenty of nice stuff, but you're locked into a vendor relationship and all the oddness of their solution. Don't like it? You're stuck, because you likely still lack the competence to paste and glue something together from the open source world. Not trying to be that person with the open-source neckbeard bias, but...
Yeah, work from the software up on HA. All those layers we've mentioned, and probably a good bit more, when one wants a legit HA setup that is bulletproof. Lots of automation scripts need to be developed to make it all happen gracefully, too.
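A toy sketch of the kind of automation glue I mean: a watcher that probes the primary and triggers a "promote the standby" step after N consecutive missed checks. The hostnames are invented and the promotion step is a placeholder (a real one would move a VIP, update DNS, promote a replica database, etc.).

```python
import socket
import time

def is_alive(host, port, timeout=1.0):
    """One TCP health probe: can we complete a connect?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def watch(primary, standby, port, max_failures=3, interval=5.0, probe=is_alive):
    """Fail over to the standby after max_failures consecutive missed probes."""
    failures = 0
    while True:
        if probe(primary, port):
            failures = 0  # primary answered; reset the counter
        else:
            failures += 1
            if failures >= max_failures:
                # Placeholder action: real promotion logic goes here
                return f"promote {standby}"
        time.sleep(interval)
```

Requiring several consecutive failures before promoting is the graceful part: one dropped probe shouldn't flip your whole stack over.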
Finally, I haven't read this blog in a while, but it used to be a favorite when I was more active pushing masses of data:
http://highscalability.com. Not a how-to as much as a view of what others are doing and solutions you may not be aware of.