amuck-landowner

Why are you testing on production?

KuJoe

Well-Known Member
Verified Provider
So I've been reading a lot of threads about updates breaking production nodes, with the finger being pointed at somebody other than the person who ran "yum update" or "apt-get upgrade"... really? RedHat/Debian/SolusVM/OpenVZ forced you to install updates blindly without testing them first? They came over to your house and put a gun to your head until you hit enter, right? This is why testing things in production is bad, B. A. D., VERY VERY VERY BAD! The only thing worse than testing on production nodes is leaving your SSH port on the default 22 and allowing password authentication for root (note: change your SSH port, disable root login, and disable password authentication for all accounts to reduce attacks by 99.99999%).

So how do you set up a development server without spending a fortune? Just get a KVM/Xen VPS! Yup, it's that simple.

Get a KVM/Xen VPS. Any size will do, so go find a cheap yearly plan. We have two 512MB KVM VPSs that we use for testing. 512MB is overkill, but they were free, so I opted for 512MB so I could install CentOS via the ISO without complaints. It doesn't need to be a speed demon, have a gigabit port, or even have 99% uptime; it just needs to be available when you're ready for testing and run "yum update" or "apt-get upgrade" relatively quickly (the faster you get the updates installed, the sooner you can test them and then update production). If you're running a custom control panel for your VPSs then you're in luck. If not, there will be the added cost of the licenses for your development setup, but assuming the costs are only $10 - $12.50/month (SolusVM), that's an extremely small price to pay for stable production nodes! Seriously, if you could pay any company in the world ~$12.50 a month to guarantee all of your nodes won't break from updates, wouldn't you? That's basically what you're doing.

If you're running KVM/Xen nodes you can probably do your testing on a KVM/Xen VPS too, since it's more like a real server, but I'm not 100% sure what it's like to run KVM/Xen VPSs on a KVM/Xen VPS.

Can't get things working 100% on a VPS? Grab a cheap dedicated server. You don't need multiple cores, 32GB of RAM, 10TB on a 1Gbps port, or even RAIDed hard drives. Just find the cheapest "server" you can. I put server in quotes because even an old desktop will work for testing if you happen to have the parts lying around. If you look around you can find bargain bin servers for less than $15 a month. Sure, they would probably crash and burn if you tried putting your production workload on them, but for 1-2 dev VPSs they will be just fine.

Now that you have your dev node(s) set up, create a few test VPSs to play with. I recommend having at least 2 VPSs, one with a 32-bit OS and the other with a 64-bit OS, just to cover all of your bases.

So now you have dev node(s) and some test VPSs, what's next? Well, now when a new update is released for anything (kernel, software, control panel, etc.) you update your dev node(s) first and then reboot them (ALWAYS REBOOT EVEN IF YOU DON'T NEED TO). Then you log in to your control panel and make sure all of the functions work 100% for your test VPSs, and make sure provisioning, suspending, termination, etc. work too. When you do this the first time, I recommend you make a checklist for yourself while you click every button you can find, so you have a quick guide that streamlines the process for the next update.
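The checklist idea above can also live in a small script so it gets run the same way after every update. This is only a minimal Python sketch: the check names and the `placeholder_ok` function are hypothetical stand-ins, and real checks would have to talk to your own control panel or nodes, which is not shown here.

```python
from typing import Callable, Dict, List, Tuple

Check = Callable[[], bool]


def run_checklist(checks: Dict[str, Check]) -> List[Tuple[str, bool]]:
    """Run every check and record pass/fail, without stopping at the first failure."""
    results = []
    for name, check in checks.items():
        try:
            passed = bool(check())
        except Exception:
            # A check that blows up counts as a failure, not as a skipped item.
            passed = False
        results.append((name, passed))
    return results


def all_passed(results: List[Tuple[str, bool]]) -> bool:
    """True only if every item on the checklist passed."""
    return all(passed for _, passed in results)


def placeholder_ok() -> bool:
    """Hypothetical stand-in; replace with a real test against your dev node."""
    return True


# Hypothetical checklist, mirroring the functions named in the post.
CHECKLIST: Dict[str, Check] = {
    "dev node came back after reboot": placeholder_ok,
    "provision a test VPS": placeholder_ok,
    "suspend and unsuspend a test VPS": placeholder_ok,
    "terminate a test VPS": placeholder_ok,
}

if __name__ == "__main__":
    results = run_checklist(CHECKLIST)
    for name, passed in results:
        print(f"[{'PASS' if passed else 'FAIL'}] {name}")
    print("OK to move on" if all_passed(results) else "do NOT update production")
```

Every check runs even if an earlier one fails, so one pass over the list shows everything the update broke instead of just the first problem.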

As long as all tests check out, you now have 2 options:

1) If the updates are for a critical exploit that makes your nodes vulnerable, update them now. Go ahead, no second guessing, worrying, or crossing your fingers and praying it will still be online after the update.

2) If the updates are non-critical and you have the luxury of waiting, I suggest letting your dev node(s) sit for a few days (I try to wait 5-7 days for big updates like kernels). Before you update your production nodes, make sure things still look good on your dev nodes (you might not notice things like a memory leak for a day or two). It's also worth checking for new updates by running "yum update" or "apt-get update && apt-get upgrade" on your dev nodes to make sure your production nodes don't get anything extra (if there are new updates, make sure you only update the packages that have been on your dev node(s) for a few days).
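The "only update the packages that have been on your dev node(s) for a few days" rule can be sketched as a small filter. This is a rough Python sketch under stated assumptions: the function name, the data layout, and the package names and versions in the usage example are all made up for illustration, and how you collect the installed versions and install dates from your nodes is left out.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple

# Assumed layout: package name -> (version on the dev node, date it was installed there)
DevState = Dict[str, Tuple[str, date]]


def soaked_updates(
    dev: DevState,
    prod: Dict[str, str],
    today: date,
    soak_days: int = 5,
) -> List[Tuple[str, str]]:
    """Return (package, version) pairs that are safe to roll out to production.

    A package qualifies only if production has it at a different version
    and the dev version has been installed for at least `soak_days` days.
    """
    cutoff = today - timedelta(days=soak_days)
    ready = []
    for name, (dev_version, installed_on) in sorted(dev.items()):
        if name not in prod:
            continue  # not installed on production, nothing to update
        if prod[name] == dev_version:
            continue  # production already matches dev
        if installed_on <= cutoff:
            ready.append((name, dev_version))
    return ready


if __name__ == "__main__":
    # Hypothetical package names, versions, and dates.
    dev = {
        "example-kernel": ("2.0", date(2013, 6, 3)),
        "example-panel": ("5.1", date(2013, 6, 9)),
    }
    prod = {"example-kernel": "1.9", "example-panel": "5.0"}
    print(soaked_updates(dev, prod, today=date(2013, 6, 10)))
```

Anything that landed on dev more recently than the soak period is simply left off the list, so production never gets an update that hasn't had its few days on dev.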

Now what do you do if there's a 0-day exploit and you don't have time to test the patch? Then you make time to test it of course. What's the point of having a dev environment if you don't use it every time?

So let's say you go through all of this trouble and find out that the patch for the critical 0-day exploit doesn't play nice with your servers. Really, how to respond is your call. Is the critical 0-day exploit something that can completely destroy your business? If so, and I wasn't able to patch it or configure a workaround, then I'd just turn off the software with the exploit (maybe that means turning off the node until it's patched). Remember that it's usually harder to fix something once it's broken than to apply a working update. For example, say the SolusVM update (just an example) wipes your database or has a bug where the "fix" is to manually edit every single config file for every single VPS. Wouldn't it be safer to just not update to the broken version and wait? Even if it means turning off SolusVM until they release a fix?
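Put together, the cases above come down to two questions: did the update pass testing on dev, and is it critical? Here is a minimal Python sketch of that decision. The names are made up for illustration, and the "non-critical update that failed testing" branch is an assumption, since the post doesn't spell that case out.

```python
from enum import Enum


class Action(Enum):
    UPDATE_NOW = "update production now"
    SOAK_THEN_UPDATE = "let dev sit for a few days, re-check, then update production"
    MITIGATE_OR_DISABLE = "do not update; apply a workaround or turn off the affected software"
    HOLD = "do not update; wait for a fixed release"


def decide(tests_pass: bool, critical: bool) -> Action:
    """Map the two questions from the post onto what to do with production."""
    if tests_pass:
        # Option 1 (critical) and option 2 (non-critical) from the post.
        return Action.UPDATE_NOW if critical else Action.SOAK_THEN_UPDATE
    if critical:
        # The patch breaks things, but the exploit is still live.
        return Action.MITIGATE_OR_DISABLE
    # Assumption: a broken non-critical update just waits for a fix.
    return Action.HOLD


if __name__ == "__main__":
    for tests_pass in (True, False):
        for critical in (True, False):
            action = decide(tests_pass, critical)
            print(f"tests_pass={tests_pass}, critical={critical}: {action.value}")
```

In no branch does production get updated without the dev tests passing first.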

I'm sure there are some things I'm leaving out so I'll try to post additional info as I remember it.
 

TruvisT

Server Management Specialist
Verified Provider
+1

On the other side, more Windows here: working in an Active Directory setting with a WSUS server set up, we follow the same standards. We do not push updates out to client machines until we have tested them first. If anyone here follows patch-management type e-mail lists (if not, you should), you learn quickly that patches do in fact break things.

As far as test beds go, you can grab HP Microservers for cheap off eBay if you want to build and test locally. I got one a while ago for fun as a test server for my lab, with 4 drives + RAID card + BluRay drive, all for around $200, and they were quality RE based drives with a RAID 10 Adaptec card.
 

GIANT_CRAB

New Member
I will always remember what Gabe Newell said to the entire Valve gaming and Steam community: "We don't always test our code, but when we do, it's always done in production."
 

k0nsl

Bad Goy
For the excitement and challenge that could possibly entail afterwards...but yeah, testing on a development platform prior to staging it in production is a better idea. Generally. No doubt  :D 

I usually go with the former, though.
 

Munzy

Active Member
It depends. Sometimes, for small personal or quick things, I just do it, since I know how to revert. For larger things, I generally test.

However, I generally do apt-get update on my stuff anyway, and if there are issues, I look into them.
 

rds100

New Member
Verified Provider
Usually you can test only so much in your dev environment. While it could catch the most obvious problems, there are many things that break only in a real world usage environment.
 

Geek

Technolojesus
Verified Provider
This has old school business practice written all over it and wins on many levels.  Wish more providers had foresight like this.  Saw one very recently, had to be nice, the whole time I was typing I looked like this - http://youtu.be/M8kP3vKaDRE?t=34s
 

BTW I just realized our signatures are so alike they could have been separated at birth.  Very big oops on my part - totally unintentional.  I'll fix it tomorrow.  :p
 