
Site Downtime November 20, 2013

drmike

100% Tier-1 Gogent
Looks like the CNServers filtering for BuyVM in Vegas went down or is having issues.

What's the official word?
 

MannDude

Just a dude
vpsBoard Founder
Moderator
No idea. Waiting to hear back from BuyVM. The site was never down for -me-, so I wasn't impacted. Seems to be regional or networking related for their filtered IPs.

Anyhow, I switched filtering to x4b so it appears it's working.
 

Aldryic C'boas

The Pony
I'm still waiting to hear back myself.  Overnight tech contacted CNServers, they haven't replied yet.  I just put in my own ticket and unzipped.

The filtering itself seems to be working fine.  Routing is the issue.  From what I'm seeing/hearing on IRC, those coming in over nLayer are the ones affected.  I'll update again as soon as I know more.
 

drmike

100% Tier-1 Gogent
Some debugging showed OpenDNS was still showing old DNS provider and pushing queries there.

Yeah, the DNS provider changed with the outage this morning as well. So the records with the old provider (Rage4) have been updated too.
 

Amitz

New Member
The BuyVM.net website is completely down for me...


and: "Hey, it's not just you! The URL-address http://buyvm.net looks down from here."
 

Aldryic C'boas

The Pony
Well, that was fun.

Turns out, CNServers (and our tunnel setup) were just fine.  The issue was (of course) HE.

Fran noticed that when he shoved outbound traffic back through CNServers, everything started working fine again (though this couldn't be a permanent solution, as it put a ton of strain on CNS).  So we got ahold of FiberHub, and were informed of the following:

HE.net enabled RPF on our port last night due to a large attack originating from our network using spoofed IP's that I wasn't able to track down - I didn't realize it would impact you. If you can send me the prefixes that you are sending over CNServers, I'll have HE.net add exceptions for them while we sort out the rest of this mess.
So, tl;dr - HE screwed up our routing.  FiberHub contacted them directly with the ranges we need exempted from their BS, and at this point we're just waiting on HE to get that in place so we'll be back to normal again.
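
For anyone wondering why RPF breaks this kind of setup, here is a minimal sketch of a strict reverse-path check, not HE's actual configuration. It assumes the upstream's best route back to the filtered range points at the filtering provider, while the outbound packets arrive on the customer port. The port names, the `rpf_accepts` helper, and the documentation prefixes (198.51.100.0/24, 203.0.113.0/24) are all hypothetical.

```python
import ipaddress

def rpf_accepts(src, arrival_port, routes, exemptions=()):
    """Strict reverse-path check (illustrative model only).

    Accept a packet if its source falls in an exempted prefix, or if the
    longest-prefix route back to the source points out the same port the
    packet arrived on. Otherwise drop it.
    """
    addr = ipaddress.ip_address(src)

    # Exempted prefixes bypass the check entirely.
    if any(addr in ipaddress.ip_network(p) for p in exemptions):
        return True

    # Collect every route that covers the source address.
    matches = [
        (ipaddress.ip_network(prefix), port)
        for prefix, port in routes.items()
        if addr in ipaddress.ip_network(prefix)
    ]
    if not matches:
        return False  # no route back to the source at all

    # Longest prefix wins, as in a normal routing lookup.
    _, best_port = max(matches, key=lambda m: m[0].prefixlen)
    return best_port == arrival_port

# Hypothetical routing view from the upstream's side.
routes = {
    "198.51.100.0/24": "filter_provider",  # filtered range, announced via the filtering provider
    "203.0.113.0/24": "customer_port",     # unfiltered range, announced directly
}

# Outbound traffic sourced from the filtered range arrives on the customer port.
blocked = rpf_accepts("198.51.100.10", "customer_port", routes)
exempted = rpf_accepts("198.51.100.10", "customer_port", routes,
                       exemptions=["198.51.100.0/24"])
direct = rpf_accepts("203.0.113.5", "customer_port", routes)
```

In this model the filtered range is dropped until its prefix is added as an exception, which matches the request for the prefixes being sent over CNServers.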
 

Aldryic C'boas

The Pony
It's also worth mentioning that due to the filtering fiasco, Stallion is currently unable to contact the Jersey nodes.  So anyone with service at our Choopa deployment will be stuck at 'Getting Status...' on their VM page.  The nodes and VMs are fine, no worries there;  everything will return to normal once this gets sorted out.
 

Francisco

Company Lube
Verified Provider
The issue is HE related:

Hello,

HE.net enabled RPF on our port last night due to a large attack originating from our network using spoofed IP's that I wasn't able to track down - I didn't realize it would impact you. If you can send me the prefixes that you are sending over CNServers, I'll have HE.net add exceptions for them while we sort out the rest of this mess.

--

Rob Tyree

Fiberhub Colocation & Internet Services
And a quote from IRC to cut the tension:

[08:52] <DaIRC42327> welp, waiting on HE at this point

[08:52] <DaIRC42327> should be fast i hope

[08:52] <&Aldryic> HE? Fast?

[08:53] <&Aldryic> You're being optimistic again, boss.

[08:53] <lbft_> :(((((((((

[08:53] <DaIRC42327> i told Rob to offer them a pound of weed

[08:53] <DaIRC42327> in exchange for a fast turn around

[08:53] <lbft_> if we're relying on HE we're all doomed

[08:53] <DaIRC42327> being HE they'll hacky sack that shit into action

[08:53] <&Aldryic> hah

[08:53] <DaIRC42327> Aldryic you missed out man

[08:53] <DaIRC42327> every single HE worker is straight hippy

[08:53] <DaIRC42327> 'dude...like..ipv6 has so many addresses'

[08:53] <The_Hatta> how -- how would that not affect you >_>

[08:54] <DaIRC42327> 'like, 1 for every atom in the world'

[08:54] <lbft_> free love and free ipv6 tunnels

[08:54] <DaIRC42327> anyways this explains things

[08:54] <DaIRC42327> i set a source route

[08:54] <&Aldryic> Yeah... probably for the best that I never meet those folks <_<

[08:54] <&Aldryic> It would not end well.

[08:54] <DaIRC42327> forcing everything back over CN

[08:54] <DaIRC42327> but CN hates when we do that

[08:54] <&Aldryic> lol

[08:54] <DaIRC42327> that's why there was the big burst of working traffic

[08:54] <DaIRC42327> then it exploded into a big flaming ball of fran

[08:55] <The_Hatta> quote of the day
Francisco
 

drmike

100% Tier-1 Gogent
So, tl;dr - HE screwed up our routing.  FiberHub contacted them directly with the ranges we need exempted from their BS, and at this point we're just waiting on HE to get that in place so we'll be back to normal again.
So how doesn't something like this happen in other datacenters?

I'll raise my hand again for recommending BuyVM at least moves their website + other critical operations stuff outside of the Vegas facility.
 

Francisco

Company Lube
Verified Provider
So how doesn't something like this happen in other datacenters?

I'll raise my hand again for recommending BuyVM at least moves their website + other critical operations stuff outside of the Vegas facility.
That wouldn't change much. The problem is that we don't force outbound traffic over CN, so we effectively 'spoof' the traffic. It's really the only option given how much transit we push over the filtering ranges.

Francisco
 

Aldryic C'boas

The Pony
So how doesn't something like this happen in other datacenters?

I'll raise my hand again for recommending BuyVM at least moves their website + other critical operations stuff outside of the Vegas facility.
We would need filtering wherever we put it.  Which means this situation could have just as easily been replicated somewhere else as it was at FH.

There are also other points to consider... for starters, we would _never_ offload our panels to another host.  That simply won't happen.  We also learned the hard way (with CC) what happens when you cannot trust your own DC.  I cannot think of anyplace offhand I would trust our hardware in more than FiberHub;  and I sure as hell won't risk our clients’ info in someone else’s hands.
 

drmike

100% Tier-1 Gogent
We would need filtering wherever we put it.  Which means this situation could have just as easily been replicated somewhere else as it was at FH.

There are also other points to consider... for starters, we would _never_ offload our panels to another host.  That simply won't happen.  We also learned the hard way (with CC) what happens when you cannot trust your own DC.  I cannot think of anyplace offhand I would trust our hardware in more than FiberHub;  and I sure as hell won't risk our clients’ info in someone else’s hands.
I am sympathetic, truly.

If you don't want to move things outside of the Vegas network, then perhaps set up redundancy for it over in Jersey?

When these issues happen, regardless of cause, the panel goes offline, along with the website and the other reference resources people go to check (at least for those of us who haven't dutifully bookmarked all the shortcuts and saved all the info locally).
 

splitice

Just a little bit crazy...
Verified Provider
That wouldn't change much. The problem is that we don't force outbound traffic over CN, so we effectively 'spoof' the traffic. It's really the only option given how much transit we push over the filtering ranges.


Francisco
Sorry boss. :p
 