Infrastructure & Operations
NetOps: Other
2 years ago
2 years ago


(Reporter: jedi, Assigned: dcurado)




(1 attachment)



2 years ago
<@nagios-scl3> (IRC) Thu 20:47:51 PDT [5465] procs is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.

That's the first check.  Other things also alerted, flapping down, then up, then down again, etc.:

Comment 1

2 years ago
21:31 <@justdave> looks like the route is bouncing around a lot
21:31 <@justdave> reply from the 9th hope from me is coming from any of about 7 different IPs
21:31 <@justdave> 9th hop*
21:32 < jedi> I'm seeing a fair bit of packet loss to r4[01]
21:32 <@justdave> yeah, those are the ones that I'm seeing bouncing around in the trace
21:33 <@justdave> I'm not getting any packet loss tracing there from my house
21:34 <@justdave> it's the link between zayo and hinet
21:34 <@justdave> from home I'm going through level3 on the way there, no packet loss
21:35 <@justdave> from scl3 it's going through zayo, and dropping packets like crazy
21:35 < jedi> Should we re-weight to prioritize l3?
21:35 <@justdave> the loss is inside zayo
21:36 <@justdave> I'd say yes.

Comment 2

2 years ago
Looks like recurrence of

Comment 3

2 years ago
Confirmed no office outage.

22:22 -!- Irssi: Join to #mozilla-taiwan was synced in 1 secs
22:22 < jedi> Hello all!  We (MOC) are seeing a bunch of alerts for various resources in tpe1.  Can anyone confirm network access there?
22:22 < jedi> Like, is there internet in the office?
22:25 < heycam> jedi: internet seems to be working here...
22:25 < heycam> jedi: wired and wifi
22:26 < jedi> Thanks for confirming.  Probably just that trip halfway round the planet.

Comment 4

Created attachment 8775886

Adding mtr output from two different non-mozilla carriers to public IP of ~50% packet loss at

Comment 5

2 years ago
Note on the mtr output.
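Notice that at one hop, a router in hinet, there is ~50% packet loss, but at the *next* hop there is no packet loss.
This is "normal".  The router showing all that packet loss isn't necessarily dropping packets; it just isn't prioritizing replies to your mtr probes.  Routers generate those replies in their control plane, which is commonly rate-limited and ranks below forwarding transit traffic, so probes that expire at that router can go unanswered while traffic passing through it is forwarded fine.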

If, when running mtr, you see loss starting at hop X and the same loss continuing at every hop after it, all the way to the destination, then that suggests link congestion at or just before hop X.
(The same goes for traceroute.)
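The rule of thumb above can be sketched in Python. This is a minimal, hypothetical helper, not part of mtr: it assumes you already have per-hop loss percentages (for example, copied from the Loss% column of an mtr report), and the 5% noise threshold is an arbitrary assumption. It also simplifies "the same loss continues" to "loss stays above the threshold" without checking that the percentages match.

```python
from typing import List, Optional

# Hypothetical cutoff (percent) below which per-hop loss is treated as noise.
NOISE_THRESHOLD = 5.0


def first_persistent_loss_hop(loss_pct: List[float],
                              threshold: float = NOISE_THRESHOLD) -> Optional[int]:
    """Return the 1-based hop where loss starts and persists to the last hop, else None.

    loss_pct[i] is the loss percentage reported for hop i + 1, in mtr order.
    """
    if not loss_pct or loss_pct[-1] < threshold:
        # The destination itself answers fine, so any loss at intermediate
        # hops is just those routers deprioritizing replies to the probes.
        return None
    start = len(loss_pct) - 1
    # Walk backwards from the final hop while every hop still shows loss.
    while start > 0 and loss_pct[start - 1] >= threshold:
        start -= 1
    return start + 1


def classify_hops(loss_pct: List[float],
                  threshold: float = NOISE_THRESHOLD) -> List[str]:
    """Label each hop as 'ok', 'probe deprioritized', or 'persistent loss'."""
    onset = first_persistent_loss_hop(loss_pct, threshold)
    labels = []
    for hop, loss in enumerate(loss_pct, start=1):
        if loss < threshold:
            labels.append("ok")
        elif onset is not None and hop >= onset:
            labels.append("persistent loss")
        else:
            # Loss here does not carry through to later hops.
            labels.append("probe deprioritized")
    return labels


if __name__ == "__main__":
    # Illustrative numbers only: ~50% loss at one hop, none after it.
    print(classify_hops([0, 0, 50, 0, 0]))
    # Illustrative numbers only: loss begins at hop 3 and persists.
    print(first_persistent_loss_hop([0, 0, 30, 30, 30]))
```

Walking backwards from the final hop keeps the check simple: if the destination shows no loss, nothing upstream can be a real forwarding problem on that path, which is exactly the hinet case in the attached mtr output.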
Assignee: network-operations → dcurado
Last Resolved: 2 years ago
Resolution: --- → WORKSFORME