Saw this and figured I would post a quick reply. There is a lot more to be discussed, but I wanted to address just a few points real quick:
1. As far as I know, there is no 'Cloud' platform that can ride out a hardware failure without your server at least being restarted; this includes OnApp, Virtualizor, etc. The 'HA' functions on these platforms are usually 'HA' in the sense that they use two volume streams for each disk volume. In other words, if you have a 10GB SSD volume you are really using 2x10GB SSD volumes in a RAID 1, where the two volumes are taken from 2 different hypervisors. That way, if one hypervisor crashes, you can quickly restart the VM on another hypervisor even if the main data volume was on the crashed hypervisor. Then once the secondary stream comes back online, the volume rebuilds, just like in a normal RAID. As far as I know, no platform exists where there can be a physical hardware failure and the server just continues on another piece of hardware. It is possible to migrate the VM between hypervisors before, say, a planned reboot of one of them, but this isn't the same as the physical hardware it's currently on crashing.
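To make the mirrored-volume idea concrete, here is a toy Python sketch. It is not how OnApp or any other platform actually implements storage; the class and method names are made up for illustration, and it only models the behaviour described above: writes go to both replicas, the volume stays readable after one hypervisor dies, and the returning replica is rebuilt from the survivor.

```python
class MirroredVolume:
    """Toy model of one logical disk mirrored across two hypervisors (hypothetical)."""

    def __init__(self, hv_a, hv_b):
        # hypervisor name -> {block number: data}
        self.replicas = {hv_a: {}, hv_b: {}}
        self.online = {hv_a, hv_b}

    def write(self, block, data):
        # Every write lands on each replica that is currently online.
        for hv in self.online:
            self.replicas[hv][block] = data

    def read(self, block):
        # Any surviving replica can serve the read.
        if not self.online:
            raise RuntimeError("both replicas are offline")
        hv = sorted(self.online)[0]
        return self.replicas[hv][block]

    def fail(self, hv):
        # The hypervisor crashed: its replica goes stale from here on.
        self.online.discard(hv)

    def rebuild(self, hv):
        # The replica is back: copy everything over from a survivor, like a RAID 1 resync.
        if not self.online:
            raise RuntimeError("no healthy replica to rebuild from")
        source = sorted(self.online)[0]
        self.replicas[hv] = dict(self.replicas[source])
        self.online.add(hv)
```

Note that nothing here keeps the VM itself running through the crash. The data survives, but the VM still has to be booted again on the other hypervisor, which is the whole point above.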
2. With OnApp, for example, you can add new hypervisors to their 'Cloud' anytime you like, but again, this doesn't work the way you are thinking. You can't provision more resources for a VM than exist on one single physical hardware node. So if your thought was to take, say, two 8 core dedicated servers and run one 16 core server across them, it isn't going to happen.
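Put as a quick Python sketch (the function and the hypervisor names are hypothetical, not any platform's API), the placement rule looks at each node on its own and never at the sum:

```python
def can_place(vm_cores, free_cores_by_hypervisor):
    """Return True only if a single hypervisor can hold the whole VM."""
    # Free cores on different hypervisors are never added together.
    return any(free >= vm_cores for free in free_cores_by_hypervisor.values())


nodes = {"hv1": 8, "hv2": 8}   # two 8 core dedicated servers
fits_16 = can_place(16, nodes)  # 16 cores total exist, but not on one node
fits_8 = can_place(8, nodes)
```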
Most people who start asking questions like yours are really looking to build a fully redundant 'Cluster', not to use 'HA' virtual servers. You can place this platform on virtual servers, but what you really need is a setup that has servers for all the different functions and failovers for each of those functions. Something like:
- 2 x gateways running heartbeat, with one in standby at all times for failover. If one fails, the IP is taken over by the other and it continues
- 2 (or more) web servers or backends, so if one fails the gateway in front can load balance across the rest
- a mechanism to handle balancing DB requests
- 2 or more database servers in replication
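As a rough illustration of the front half of that layout, here is a small Python sketch. It is only a simulation with made-up names: real setups would use heartbeat (or similar) for the IP takeover and a proper load balancer, and the database balancing and replication layers are left out entirely.

```python
class Cluster:
    """Toy model: a gateway pair sharing one IP, in front of several web backends."""

    def __init__(self, gateways, backends):
        self.gateways = list(gateways)  # first entry is the preferred active gateway
        self.backends = list(backends)
        self.down = set()               # names of nodes currently failed
        self._next = 0                  # round-robin counter

    def vip_holder(self):
        # The shared IP lives on the first gateway that is still alive.
        for gw in self.gateways:
            if gw not in self.down:
                return gw
        raise RuntimeError("no gateway alive")

    def route(self):
        # Round-robin over the backends that are still up.
        alive = [b for b in self.backends if b not in self.down]
        if not alive:
            raise RuntimeError("no backend alive")
        backend = alive[self._next % len(alive)]
        self._next += 1
        return self.vip_holder(), backend
```

Marking a node as failed is just `cluster.down.add("gw1")`. Requests then flow through the standby gateway and skip any dead backend, which is the behaviour each tier in the list above is meant to provide.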
The company I work with builds these types of clusters for enterprise companies, so I do have some experience with this. If this is really what you are looking for I can try to help get you started, but if you want specific details or trade secrets you will have to spend some money.
There is probably more to be answered here, but this may answer some of the initial questions you had at least.
As I said, this is to my knowledge. Maybe there is some incredibly expensive platform out there that does do this, but it definitely isn't going to fall into your 'cheap or free' requirement. Even CA AppLogic (which is EOL and has been discontinued now) only automated the reboots onto a second hypervisor in case of a failure (OnApp doesn't do this to my knowledge, at least by default), and it was touted as one of the better pre 'CloudStack' Xen HA platforms.
My 2 cents.
Cheers!