<@nagios-scl3> (IRC) Thu 20:47:51 PDT  admin1b.private.tpe1.mozilla.com:atd procs is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds. (http://m.mozilla.org/atd+procs)

That's the first check. Other things that also alerted down (and then up and then down etc.):
endpoint1.indigo.av.tpe1.mozilla.com
admin1a.private.tpe1.mozilla.com
fw1.ops.tpe1.mozilla.net
conserver1.df501-1.ops.tpe1.mozilla.net
wap515.ops.tpe1.mozilla.net
21:31 <@justdave> looks like the route is bouncing around a lot
21:31 <@justdave> reply from the 9th hope from me is coming from any of about 7 different IPs
21:31 <@justdave> 9th hop*
21:32 < jedi> I'm seeing a fair bit of packet loss to r402-s2.tp.hinet.net
21:32 <@justdave> yeah, those are the ones that I'm seeing bouncing around in the trace
21:33 <@justdave> I'm not getting any packet loss tracing there from my house
21:34 <@justdave> it's the link between zayo and hinet
21:34 <@justdave> from home I'm going through level3 on the way there, no packet loss
21:35 <@justdave> from scl3 it's going through zayo, and dropping packets like crazy
21:35 < jedi> Should we re-weight to prioritize l3?
21:35 <@justdave> the loss is inside zayo
21:36 <@justdave> I'd say yes.
Looks like a recurrence of https://bugzilla.mozilla.org/show_bug.cgi?id=1289878
Confirmed no office outage.

22:22 -!- Irssi: Join to #mozilla-taiwan was synced in 1 secs
22:22 < jedi> Hello all! We (MOC) are seeing a bunch of alerts for various resources in tpe1. Can anyone confirm network access there?
22:22 < jedi> Like, is there internet in the office?
22:25 < heycam> jedi: internet seems to be working here...
22:25 < heycam> jedi: wired and wifi
22:26 < jedi> Thanks for confirming. Probably just that trip halfway round the planet.
Created attachment 8775886
admin1.tpe1.mozilla.com-traceroute.txt

Adding mtr output from two different non-Mozilla carriers to the public IP of admin1.tpe1.mozilla.com. ~50% packet loss at 118-163-10-188.HINET-IP.hinet.net
Note on the mtr output. Notice that at one hop, a router in hinet, there is ~50% packet loss, but that at the *next* hop, there is no packet loss. This is "normal". The router showing all that packet loss isn't necessarily dropping packets; it's just not making answers to your mtr probes a priority. It's busy doing other stuff. If, when doing an mtr, you see loss starting at hop X, and the same loss continues for each additional hop, then that suggests link congestion. (The same thing goes for traceroute.)
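
As a rough illustration of that rule of thumb, here is a minimal Python sketch. It assumes you have already pulled the per-hop loss percentages out of the mtr report into a list, one entry per hop in path order. The function name and the 5% threshold are made up for this example, not anything mtr provides, and the sketch only checks that loss stays above the threshold on every later hop, which is a looser test than "the same loss continues".

```python
from typing import Optional, Sequence


def first_persistent_loss_hop(
    loss_pct: Sequence[float], threshold: float = 5.0
) -> Optional[int]:
    """Return the 1-based hop where loss starts and then continues on every
    later hop through the last one, or None if there is no such hop.

    threshold is an assumed cutoff (in percent) below which loss is ignored.
    """
    start: Optional[int] = None
    for hop, loss in enumerate(loss_pct, start=1):
        if loss >= threshold:
            # Remember the first hop of the current run of lossy hops.
            if start is None:
                start = hop
        else:
            # A later hop with no loss means the earlier loss was most
            # likely just a router deprioritizing replies to the probes.
            start = None
    return start


# Loss at a single middle hop only: probably a busy router, not real loss.
isolated = first_persistent_loss_hop([0.0, 0.0, 50.0, 0.0, 0.0])

# Loss that begins at hop 3 and carries on to the end: suggests congestion.
persistent = first_persistent_loss_hop([0.0, 0.0, 10.0, 12.0, 11.0])
```

With the two sample lists above, `isolated` is `None` and `persistent` is `3`. A result of `None` does not prove the path is healthy; it only says the loss pattern does not look like congestion carried through to the destination.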
Assignee: network-operations → dcurado
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → WORKSFORME