So I've been reading a lot of threads about updates breaking production nodes, with the finger pointed at somebody other than the person who ran "yum update" or "apt-get upgrade"... really? Red Hat/Debian/SolusVM/OpenVZ forced you to install updates blindly without testing them first? They came over to your house and put a gun to your head until you hit enter, right? This is why testing things in production is bad, B. A. D., VERY VERY VERY BAD! The only thing worse than testing on production nodes is leaving SSH on the default port 22 and allowing password authentication for root (note: change your SSH port, disable root login, and disable password authentication for all accounts to cut out the vast majority of automated attacks).
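If you want a quick way to sanity-check those SSH settings, here's a minimal Python sketch (audit_sshd_config is a name made up for this post, not part of any tool) that takes the text of an sshd_config file and flags the default port, root login that isn't explicitly "no", and password authentication left on. It assumes simple "Keyword value" lines, treats the first value of each keyword as the one in effect, and ignores Include files and Match blocks, so it's a sanity check, not a full audit.

```python
from typing import List


def audit_sshd_config(text: str) -> List[str]:
    """Return warnings for the SSH settings discussed above (hypothetical helper)."""
    settings = {}
    ports = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip malformed lines in this simple sketch
        key, value = parts[0].lower(), parts[1].strip().lower()
        if key == "match":
            break  # anything after a Match line is conditional; not handled here
        if key == "port":
            ports.append(value)  # sshd listens on every Port line given
        else:
            settings.setdefault(key, value)  # first value wins for most keywords

    warnings = []
    if not ports or "22" in ports:
        warnings.append("SSH is listening on the default port 22")
    # An unset PermitRootLogin still allows some form of root login,
    # so anything other than an explicit "no" gets flagged.
    if settings.get("permitrootlogin") != "no":
        warnings.append("root login is not disabled")
    # PasswordAuthentication defaults to yes when unset.
    if settings.get("passwordauthentication", "yes") != "no":
        warnings.append("password authentication is enabled")
    return warnings
```

On a typical Linux box you'd feed it the real file, e.g. open("/etc/ssh/sshd_config").read(), and an empty list means all three settings look right.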
So how do you set up a development server without spending a fortune? Just get a KVM/Xen VPS! Yup, it's that simple.
Get a KVM/Xen VPS; any size will do, so go find a cheap yearly plan. We use two 512MB KVM VPSs for testing. 512MB is overkill, but they were free, and the extra RAM let me install CentOS from the ISO without the installer complaining. It doesn't need to be a speed demon, have a gigabit port, or even have 99% uptime; it just needs to be available when you're ready to test and to run "yum update" or "apt-get upgrade" reasonably quickly (the faster the updates install, the sooner you can test them and then update production). If you're running a custom control panel for your VPSs, you're in luck; if not, there's the added cost of licenses for your development setup. But if that's only $10-$12.50/month (SolusVM), it's an extremely small price to pay for stable production nodes! Seriously, if any company in the world offered to keep updates from breaking your nodes for ~$12.50 a month, wouldn't you pay it? That's basically what you're doing.
If you're running KVM/Xen nodes, you can probably do your testing on a KVM/Xen VPS too, since it behaves more like a real server, but I'm not 100% sure what it's like to run KVM/Xen VPSs inside a KVM/Xen VPS (nested virtualization).
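On the nested KVM question: KVM needs the CPU's hardware virtualization extensions (the vmx flag on Intel, svm on AMD), and inside a VPS those flags only appear in /proc/cpuinfo if the host passes them through. Here's a small Python sketch (has_hw_virt and read_cpuinfo are names made up for this post) you could run on a candidate dev VPS before paying for a year. Seeing the flag is a good sign, not a guarantee that nested VPSs will run well.

```python
def has_hw_virt(cpuinfo_text: str) -> bool:
    """True if the first 'flags' line lists vmx (Intel VT-x) or svm (AMD-V)."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            _, _, rest = line.partition(":")  # flags come after the colon
            flags = rest.split()
            return "vmx" in flags or "svm" in flags
    return False  # no flags line found (e.g. not an x86 CPU)


def read_cpuinfo(path: str = "/proc/cpuinfo") -> str:
    """Read the CPU info file on a Linux machine."""
    with open(path) as f:
        return f.read()
```

Usage on the VPS itself: print(has_hw_virt(read_cpuinfo())). A False result usually means you'll want the cheap dedicated server route below.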
Can't get things working 100% on a VPS? Grab a cheap dedicated server. You don't need multiple cores, 32GB of RAM, 10TB on a 1Gbps port, or even RAIDed hard drives. Just get the cheapest "server" you can find. I put server in quotes because even an old desktop will work for testing if you happen to have the parts lying around. If you shop around, you can find bargain-bin servers for less than $15 a month. Sure, they would probably crash and burn under your production workload, but for 1-2 dev VPSs they will be just fine.
Now that you have your dev node(s) set up, create a few test VPSs to play with. I recommend at least 2 VPSs, one with a 32-bit OS and the other with a 64-bit OS, just to cover all your bases.
So now you have dev node(s) and some test VPSs. What's next? Whenever a new update is released for anything (kernel, software, control panel, etc.), you update your dev node(s) first and then reboot them (ALWAYS REBOOT, EVEN IF YOU DON'T NEED TO). Then log in to your control panel, make sure every function works 100% for your test VPSs, and make sure provisioning, suspending, termination, etc. work as well. The first time you do this, I recommend writing a checklist as you click every button you can find, so you have a quick guide that streamlines the process for the next update.
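If you'd rather script part of that checklist, here's a minimal Python sketch of a runner (run_checklist is a name made up for this post): each item is a name plus a function that returns True when the feature works, and anything that returns False or raises counts as a failure. The lambdas are placeholders; the real checks (e.g. asking your control panel to provision a test VPS) depend entirely on your panel and aren't shown here.

```python
from typing import Callable, Dict, List


def run_checklist(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every check in order and return the names of the ones that failed."""
    failures = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False  # a check that crashes counts as a failure
        if not ok:
            failures.append(name)
    return failures


# Placeholder checks: swap each lambda for a real test against your panel.
checks = {
    "node rebooted cleanly": lambda: True,
    "provision test VPS": lambda: True,
    "suspend/unsuspend test VPS": lambda: True,
    "terminate test VPS": lambda: True,
}
```

An empty list from run_checklist(checks) means everything on the list passed; anything else is a reason to hold off on production.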
As long as all the tests check out, you now have 2 options:
1) If the updates are for a critical exploit that leaves your nodes vulnerable, update them now. Go ahead, no second-guessing, worrying, or crossing your fingers and praying they'll still be online after the update.
2) If the updates are non-critical and you have the luxury of waiting, I suggest letting your dev node(s) sit for a few days (I try to wait 5-7 days for big updates like kernels). Before you update your production nodes, make sure things still look good on your dev nodes (you might not notice something like a memory leak for a day or two). It's also worth checking whether any newer updates have come out by running "yum update" or "apt-get update && apt-get upgrade" on your dev nodes (or "yum check-update" / "apt-get -s upgrade" if you only want the list), so your production nodes don't pick up anything extra. If there are new updates, make sure you only update the packages that have been on your dev node(s) for a few days.
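The "only update what has soaked on dev" rule can be sketched in Python like this. It assumes you've already gathered two things yourself: the updates production would get (package name and version, e.g. from "yum check-update" on a production node) and what's installed on dev along with the date it went in. Parsing that command output is left out, and soaked_updates is a name made up for this post. A package only passes if production would get the exact version that has been on dev for at least min_days.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def soaked_updates(
    pending: Dict[str, str],                     # package -> version production would get
    dev_installed: Dict[str, Tuple[str, date]],  # package -> (version, date installed on dev)
    today: date,
    min_days: int = 5,
) -> List[str]:
    """Return packages whose pending version has sat on dev for at least min_days."""
    safe = []
    for pkg, version in sorted(pending.items()):
        on_dev = dev_installed.get(pkg)
        if on_dev is None:
            continue  # never tested on dev, so skip it
        dev_version, installed = on_dev
        if dev_version != version:
            continue  # production would get a newer, untested version
        if today - installed >= timedelta(days=min_days):
            safe.append(pkg)
    return safe
```

The returned names could then go to "yum update <package> ..." on production, leaving everything else for the next round.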
Now what do you do if there's a 0-day exploit and you don't have time to test the patch? You make time to test it, of course. What's the point of having a dev environment if you don't use it every time?
So let's say you go through all this trouble and find out that the patch for the critical 0-day exploit doesn't play nice with your servers. Really, how you respond is your call. Is the exploit something that could completely destroy your business? If so, and I couldn't patch it or configure a workaround, I'd just turn off the vulnerable software (maybe that means turning off the node until it's patched). Remember that it's usually harder to fix something once it's broken than to apply a working update. For example, say the SolusVM update (just an example) wipes your database, or has a bug where the "fix" is to manually edit every config file for every VPS. Wouldn't it be safer to skip the broken version and wait, even if that means turning off SolusVM until they release a fix?
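Putting the two options and the broken-patch case together, here's the decision written out as a small Python function (next_step is a name made up for this post). The three yes/no inputs are simplifications assumed for the sketch; in real life each one is a judgment call.

```python
def next_step(tests_passed: bool, critical: bool, workaround_available: bool) -> str:
    """Map the dev-node test result to what happens on production (hypothetical)."""
    if tests_passed:
        # Option 1 vs. option 2 from the list above.
        return "update production now" if critical else "let it soak on dev, then update production"
    if workaround_available:
        return "apply the workaround and keep the broken patch off production"
    if critical:
        return "turn off the vulnerable software (or the node) until a working fix is released"
    return "skip this update and wait for a fixed release"
```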
I'm sure there are some things I'm leaving out so I'll try to post additional info as I remember it.