Some things I've always wondered about providers and how they run their operations.
My background is big IT: my employers have typically owned their own DCs, with local sysadmins at each site, big support contracts, etc. So I've done a lot of datacenter-oriented work, just not with public DCs. And obviously, no one who advertises here runs their own datacenter.
So let's say you want to stand up a VPS hosting company in, say, Kansas City (just picked that at random). You buy a server, ship it to your house, configure it how you like, and then ship it off to the DC in KC where you rent colo.
Let's assume:
- we're using rack-grade hardware here, not Best Buy gear
- the DC is remote, not something the provider can just drive over to
So my questions:
(1) A while later, let's say a hard drive goes bad. We're using RAID of some flavor, so the node stays online. Do providers typically keep a hot spare in the array so the rebuild starts automatically, and then replace the failed drive afterward?
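For what it's worth, on the Linux software-RAID side this is trivial to set up. A minimal sketch, assuming md RAID managed with mdadm; the device names are hypothetical, and hardware RAID controllers have their own equivalents:

```python
import subprocess

# Minimal sketch: attach a hot spare to a Linux md RAID array.
# Assumes Linux software RAID (mdadm) and root privileges;
# device names below are hypothetical.

ARRAY = "/dev/md0"    # hypothetical array device
SPARE = "/dev/sdc1"   # hypothetical spare partition

# Adding a device to a healthy array marks it as a spare; md pulls it
# in and starts rebuilding the moment a member drive fails.
subprocess.run(["mdadm", "--manage", ARRAY, "--add", SPARE], check=True)

# Verify the spare is registered (look for "spare" in the device list).
subprocess.run(["mdadm", "--detail", ARRAY], check=True)
```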
(2) Where does the replacement drive come from? The DC staff charge some "remote hands" fee to do the work, but do providers typically keep spare drives in some sort of holding area on site so one can be swapped in? Or do they ship one when needed?
(3) I suppose that with a hot spare or an on-site replacement, customers only suffer while the array rebuilds. Without either, they have to wait while you ship a drive, running the whole time on a degraded array: slow, and one more failure away from data loss.
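Related: the provider has to notice the degradation in the first place. A minimal monitoring sketch, assuming Linux md RAID and parsing /proc/mdstat (mdadm --monitor can do this natively; printing here stands in for a real pager or email alert):

```python
import re

# Minimal sketch: detect degraded Linux md arrays from /proc/mdstat.
# A healthy 2-disk mirror shows "[2/2] [UU]"; a degraded one shows
# "[2/1] [U_]", where the underscore is the missing member.

def degraded_arrays(mdstat_path="/proc/mdstat"):
    with open(mdstat_path) as f:
        text = f.read()
    bad = []
    # Each array stanza starts at a line like "md0 : active raid1 ..."
    for name, body in re.findall(r"^(md\d+) : (.*?)(?=^md\d+ : |\Z)",
                                 text, re.M | re.S):
        status = re.search(r"\[\d+/\d+\] \[([U_]+)\]", body)
        if status and "_" in status.group(1):
            bad.append(name)
    return bad

if __name__ == "__main__":
    for md in degraded_arrays():
        print(f"ALERT: {md} is degraded")  # real setups would page here
```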
(4) What if the whole node goes offline, say the motherboard dies? Do providers typically keep a spare node, e.g., 5 nodes in a DC, of which 4 serve customers and 1 is a hot standby? Otherwise I'm guessing the fastest resolution is to buy or warranty-replace a mobo and have DC remote hands install it (I'm guessing that isn't cheap), which could easily mean 24+ hours of downtime. Or overload the other nodes while customers get moved off... that could get ugly. Or do providers just play the odds and assume that over the life of a server, only drives are likely to fail?
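To put rough numbers on the play-the-odds question, here's a back-of-envelope sketch. Every MTBF and MTTR figure below is a made-up assumption for illustration, not industry data:

```python
# Back-of-envelope: expected node downtime per year from motherboard-class
# failures, under three repair strategies. All figures are illustrative
# assumptions, not industry data.

HOURS_PER_YEAR = 8766
MOBO_MTBF_HOURS = 500_000                    # assumed board MTBF

failures_per_year = HOURS_PER_YEAR / MOBO_MTBF_HOURS   # ~0.018/yr

strategies = {                               # hours to restore service
    "spare node on site (move customers)":  2,
    "remote hands + spare board on site":   4,
    "warranty RMA, ship, then remote hands": 48,
}

for name, mttr_hours in strategies.items():
    expected = failures_per_year * mttr_hours
    print(f"{name:40s} ~{expected * 60:.0f} min/yr expected downtime")
```

With numbers like these, even the slow path only costs on the order of an hour of *expected* downtime per node per year, which may explain why many providers gamble; the catch is that the customer who actually hits the failure eats the whole 48 hours.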
(5) Same question if the node just needs an upgrade: say you want to add RAM, drives, whatever. There's remote hands, but what happens to customers in the meantime?
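My assumption (not something I've confirmed with any particular provider) is that on KVM-type platforms the answer for planned work is to live-migrate guests to another node first. A minimal sketch via libvirt's virsh, with hypothetical guest and host names, assuming libvirt on both hosts and shared or mirrored storage:

```python
import subprocess

# Minimal sketch: live-migrate a KVM guest off a node before planned
# maintenance, using libvirt's virsh CLI. Guest and destination names
# are hypothetical.

GUEST = "customer-vps-42"                      # hypothetical guest name
DEST = "qemu+ssh://node2.example.com/system"   # hypothetical target host

# --live keeps the guest running while memory is copied; customers see
# a brief pause at cutover instead of a maintenance-window outage.
subprocess.run(["virsh", "migrate", "--live", GUEST, DEST], check=True)
```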
My suspicion is that kiddie hosts run everything "naked": no hot spare, no spare on site, no spare node. Bigger hosts run everything redundant: hot spares in the arrays, spare drives/RAM/boards on site, and a spare node.
Now, for smaller hosts who rent servers instead, the dedi provider owns a lot of this responsibility. If a mobo goes bad, the provider is on the hook to pay for the replacement, keep spare parts around, etc.
(6) But for those who rent servers, wouldn't they also need to rent a spare if they want to keep customers from suffering through an extended outage when the server fails?
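One way to reason about (6) is simple break-even arithmetic: the spare's rent versus the expected cost of an extended outage. A sketch with made-up numbers:

```python
# Break-even sketch for renting a warm spare dedi. All numbers are
# made-up assumptions for illustration.

spare_rent_per_month = 80.0    # assumed monthly rent on the idle spare
outage_prob_per_month = 0.01   # assumed chance of a multi-day failure
outage_cost = 5_000.0          # assumed churn/SLA-credit cost per outage

expected_outage_cost = outage_prob_per_month * outage_cost

print(f"spare rent:      ${spare_rent_per_month:.2f}/mo")
print(f"expected outage: ${expected_outage_cost:.2f}/mo")
print("rent the spare" if expected_outage_cost > spare_rent_per_month
      else "play the odds")
```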