Closed Bug 1079885 Opened 11 years ago Closed 11 years ago

fw1.tier2.yvr1 is flapping

Categories

(Infrastructure & Operations Graveyard :: NetOps: Office Carrier, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Usul, Assigned: dcurado)

References

Details

No description provided.
Blocks: 1079775
Blocks: 1079878
Blocks: 1079881
working on this
Assignee: network-operations → dcurado
Status: NEW → ASSIGNED
Oh yeah, it's Terago, the wireless ISP... (sad clown face)

Customer Name: MZ Canada
Location Code: MZCA2
Customer Experience Coordinator at: 1-866-TeraGo2 (1-866-837-2462)

Last time this happened I called and they "reset their radio" and that fixed it. I opened ticket 154747. They asked for a log of all the up/down events so they could look at the time stamps, because they don't see any issues on their side. I have done a cut and paste from IRC, will edit out all the other stuff, and will mail it to them at noc@terago.ca with the ticket number in the subject line. They said they will call back within 2 hours. I'm going to hold them to that.
Blocks: 1080058
Terago did in fact call me back, and was able to prove out their entire network as not being lossy. So: a) I pointed the finger of blame at them without really looking at everything, and b) there is a problem somewhere between SCL3 and Terago. I'll see if I can figure out where.
This is starting to look like a problem within Global Crossing, aka Level3. (So I think James may have this correct already...) I did a traceroute from fw1.yvr1.mozilla.net towards admin1.scl3.mozilla.com, but using a source IP of fw1.tier2.mozilla.net, and got this:

dcurado@fw1.ops.yvr1.mozilla.net> traceroute 63.245.214.130 source 68.179.67.73
traceroute to 63.245.214.130 (63.245.214.130) from 68.179.67.73, 30 hops max, 40 byte packets
 1  64.213.70.193 (64.213.70.193)  61.652 ms  61.146 ms  65.845 ms
 2  159.63.48.157 (159.63.48.157)  62.187 ms  63.063 ms  72.150 ms
 3  * * *
 4  * * 208.178.58.82 (208.178.58.82)  66.437 ms
 5  213.155.131.77 (213.155.131.77)  63.486 ms  69.704 ms  68.693 ms
 6  62.115.138.195 (62.115.138.195)  66.892 ms
    213.155.135.187 (213.155.135.187)  69.347 ms
    213.155.134.103 (213.155.134.103)  66.842 ms
 7  62.115.8.162 (62.115.8.162)  65.816 ms  66.259 ms  87.889 ms
 8  63.245.219.162 (63.245.219.162)  73.442 ms  66.636 ms  64.511 ms
 9  63.245.214.45 (63.245.214.45)  76.333 ms  67.825 ms  79.370 ms

By hop 9, we're inside our network. The problem appears to start at hop 3, but who is that?
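For anyone repeating this kind of check, here is a minimal Python sketch that counts lost probes per hop in traceroute output like the one above. The function name probe_loss_by_hop is made up for illustration and is not part of any tool used in this bug. It assumes the common text layout where each hop line starts with the hop number and each unanswered probe is printed as "*". Keep in mind that a "*" at an intermediate hop can also mean the router rate-limits or drops ICMP replies, so this is a hint about where to look, not proof of loss.

```python
import re

# One probe result: either a timeout "*" or a latency such as "61.652 ms".
# The capture group is empty for "*" and holds the number for a latency.
PROBE = re.compile(r"\*|(\d+(?:\.\d+)?) ms")

# A hop line starts with the hop number, e.g. " 3  * * *".
HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*)$")


def probe_loss_by_hop(traceroute_text):
    """Return {hop_number: (lost_probes, total_probes)} for each hop line."""
    loss = {}
    for line in traceroute_text.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # header, prompt, or blank line
        hop = int(match.group(1))
        probes = PROBE.findall(match.group(2))
        lost = sum(1 for latency in probes if latency == "")
        loss[hop] = (lost, len(probes))
    return loss


if __name__ == "__main__":
    sample = """\
traceroute to 63.245.214.130 (63.245.214.130) from 68.179.67.73, 30 hops max, 40 byte packets
 1  64.213.70.193 (64.213.70.193)  61.652 ms  61.146 ms  65.845 ms
 2  159.63.48.157 (159.63.48.157)  62.187 ms  63.063 ms  72.150 ms
 3  * * *
 4  * * 208.178.58.82 (208.178.58.82)  66.437 ms
"""
    for hop, (lost, total) in sorted(probe_loss_by_hop(sample).items()):
        flag = "  <-- unanswered probes" if lost else ""
        print(f"hop {hop}: {lost}/{total} probes lost{flag}")
```

Running this over several traceroutes taken a few minutes apart would show whether the same hops keep dropping probes, which is more telling than a single sample.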
So I did a traceroute from admin1.yvr1.mozilla.com to admin1.scl3.mozilla.com, and got this:

traceroute to admin1.scl3.mozilla.com (63.245.214.130), 30 hops max, 60 byte packets
 1  fw1.private.yvr1.mozilla.net (10.244.75.1)  1.232 ms  0.833 ms  1.074 ms
 2  64.213.70.193 (64.213.70.193)  2.100 ms  3.444 ms  2.986 ms
 3  so-2-0-0.ar2.sea1.gblx.net (159.63.48.157)  30.971 ms  31.149 ms  31.001 ms
 4  ae5-120G.ar7.DAL2.gblx.net (67.16.166.41)  53.327 ms
    ae6-120G.ar7.DAL2.gblx.net (67.16.166.49)  47.071 ms  51.629 ms
 5  telia-2.csr1.DAL2.gblx.net (208.178.58.82)  53.610 ms  53.247 ms  53.644 ms
 6  las-b21-link.telia.net (213.248.80.13)  82.937 ms  83.167 ms  82.871 ms
 7  sjo-bb1-link.telia.net (62.115.138.191)  81.045 ms
    sjo-bb1-link.telia.net (62.115.138.193)  80.684 ms
    sjo-bb1-link.telia.net (62.115.138.191)  80.556 ms
 8  mozilla-ic-155747-sjo-bb1.c.telia.net (62.115.8.162)  76.872 ms  80.452 ms  79.591 ms
 9  xe-0-0-1.border2.scl3.mozilla.net (63.245.219.162)  120.943 ms  106.340 ms  101.749 ms
10  v-1127.core2.scl3.mozilla.net (63.245.214.45)  82.089 ms  81.972 ms  82.263 ms

So it appears that the issue is the path through Global Crossing getting back to Terago. Of course, we don't actually know that path. We'd have to be sitting inside of Level3, tracerouting back toward fw1.tier2.yvr1.mozilla.net. Let me see if I can find a looking glass within their network that allows pings to go out.
We switched the Internet access for YVR1 over to our backup link there. Traffic to the US west coast from YVR now does not go through Level3, but... there is still loss, now inside of above.net. My guess (and it's just that, a guess) is that there is a fiber cut in the US NW that is impacting multiple providers. =-(
We still saw packet loss today, going through our backup provider. However, after switching back to our primary provider, the problems appear to be resolved. I am closing this as resolved. Please re-open if this bug should not be put to bed yet.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard