Status

Infrastructure & Operations
NetOps: Other
RESOLVED WORKSFORME
2 years ago
2 years ago

People

(Reporter: jedi, Assigned: dcurado)

Attachments

(1 attachment)

(Reporter)

Description

2 years ago
<@nagios-scl3> (IRC) Thu 20:47:51 PDT [5465] admin1b.private.tpe1.mozilla.com:atd procs is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
                     (http://m.mozilla.org/atd+procs)

That's the first check.  Other things that also alerted down (and then up and then down etc.):
endpoint1.indigo.av.tpe1.mozilla.com
admin1a.private.tpe1.mozilla.com
fw1.ops.tpe1.mozilla.net
conserver1.df501-1.ops.tpe1.mozilla.net
wap515.ops.tpe1.mozilla.net
(Reporter)

Comment 1

2 years ago
21:31 <@justdave> looks like the route is bouncing around a lot
21:31 <@justdave> reply from the 9th hope from me is coming from any of about 7 different IPs
21:31 <@justdave> 9th hop*
21:32 < jedi> I'm seeing a fair bit of packet loss to r4[01]02-s2.tp.hinet.net
21:32 <@justdave> yeah, those are the ones that I'm seeing bouncing around in the trace
21:33 <@justdave> I'm not getting any packet loss tracing there from my house
21:34 <@justdave> it's the link between zayo and hinet
21:34 <@justdave> from home I'm going through level3 on the way there, no packet loss
21:35 <@justdave> from scl3 it's going through zayo, and dropping packets like crazy
21:35 < jedi> Should we re-weight to prioritize l3?
21:35 <@justdave> the loss is inside zayo
21:36 <@justdave> I'd say yes.
(Reporter)

Comment 2

2 years ago
Looks like recurrence of https://bugzilla.mozilla.org/show_bug.cgi?id=1289878
(Reporter)

Comment 3

2 years ago
Confirmed no office outage.

22:22 -!- Irssi: Join to #mozilla-taiwan was synced in 1 secs
22:22 < jedi> Hello all!  We (MOC) are seeing a bunch of alerts for various resources in tpe1.  Can anyone confirm network access there?
22:22 < jedi> Like, is there internet in the office?
22:25 < heycam> jedi: internet seems to be working here...
22:25 < heycam> jedi: wired and wifi
22:26 < jedi> Thanks for confirming.  Probably just that trip halfway round the planet.
Created attachment 8775886
admin1.tpe1.mozilla.com-traceroute.txt

Adding mtr output from two different non-mozilla carriers to public IP of admin1.tpe1.mozilla.com. ~50% packet loss at 118-163-10-188.HINET-IP.hinet.net
(Assignee)

Comment 5

2 years ago
Note on the mtr output.
Notice that at one hop, a router in hinet, there is ~50% packet loss, but that at the *next* hop there is no packet loss.
This is "normal".  The router showing all that packet loss isn't necessarily dropping packets; it's just not making answers to your mtr probes a priority.  It's busy doing other stuff.

If, when doing an mtr, you see loss starting at hop X, and the same loss continues at every hop after it, then that suggests link congestion.
(same thing goes for traceroute)
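
A rough way to apply that rule of thumb to per-hop loss figures, as a minimal Python sketch.  The function name, the 5% threshold, and the sample loss lists are all made up for illustration; they are not taken from the attached mtr output.

```python
from typing import List, Optional


def first_persistent_loss_hop(loss_pct: List[float],
                              threshold: float = 5.0) -> Optional[int]:
    """Return the 1-based hop where loss starts and continues through
    the final hop, or None if no such hop exists.

    Loss that shows up at one hop and is gone at a later hop is ignored:
    that pattern usually means the router is deprioritizing replies to
    probes, not dropping forwarded traffic.
    The threshold (in percent) is an arbitrary illustrative value.
    """
    start = None
    for hop, loss in enumerate(loss_pct, start=1):
        if loss >= threshold:
            if start is None:
                start = hop   # possible start of a persistent run
        else:
            start = None      # a later clean hop rules out the earlier loss
    return start


# Hypothetical per-hop loss percentages, one value per hop.
one_noisy_hop = [0.0, 0.0, 50.0, 0.0, 0.0]        # loss at hop 3 only
loss_from_hop_3 = [0.0, 0.0, 20.0, 22.0, 19.0]    # loss carries to the end

print(first_persistent_loss_hop(one_noisy_hop))    # None
print(first_persistent_loss_hop(loss_from_hop_3))  # 3
```

The check only looks at whether loss carries through to the last hop.  It does not compare the loss percentages between hops, so treat the result as a hint about where to look, not as a diagnosis.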
Assignee: network-operations → dcurado
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → WORKSFORME