Closed Bug 1290347 Opened 8 years ago Closed 8 years ago

tpe1 packet loss

Categories

(Infrastructure & Operations Graveyard :: NetOps: Other, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: mdevney, Assigned: dcurado)

Details

Attachments

(1 file)

<@nagios-scl3> (IRC) Thu 20:47:51 PDT [5465] admin1b.private.tpe1.mozilla.com:atd procs is CRITICAL: CHECK_NRPE: Socket timeout after 10 seconds.
                     (http://m.mozilla.org/atd+procs)

That was the first check to alert.  Other hosts that also alerted down (and then flapped up and down):
endpoint1.indigo.av.tpe1.mozilla.com
admin1a.private.tpe1.mozilla.com
fw1.ops.tpe1.mozilla.net
conserver1.df501-1.ops.tpe1.mozilla.net
wap515.ops.tpe1.mozilla.net
21:31 <@justdave> looks like the route is bouncing around a lot
21:31 <@justdave> reply from the 9th hope from me is coming from any of about 7 different IPs
21:31 <@justdave> 9th hop*
21:32 < jedi> I'm seeing a fair bit of packet loss to r4[01]02-s2.tp.hinet.net
21:32 <@justdave> yeah, those are the ones that I'm seeing bouncing around in the trace
21:33 <@justdave> I'm not getting any packet loss tracing there from my house
21:34 <@justdave> it's the link between zayo and hinet
21:34 <@justdave> from home I'm going through level3 on the way there, no packet loss
21:35 <@justdave> from scl3 it's going through zayo, and dropping packets like crazy
21:35 < jedi> Should we re-weight to prioritize l3?
21:35 <@justdave> the loss is inside zayo
21:36 <@justdave> I'd say yes.
Confirmed no office outage.

22:22 -!- Irssi: Join to #mozilla-taiwan was synced in 1 secs
22:22 < jedi> Hello all!  We (MOC) are seeing a bunch of alerts for various resources in tpe1.  Can anyone confirm network access there?
22:22 < jedi> Like, is there internet in the office?
22:25 < heycam> jedi: internet seems to be working here...
22:25 < heycam> jedi: wired and wifi
22:26 < jedi> Thanks for confirming.  Probably just that trip halfway round the planet.
Adding mtr output from two different non-Mozilla carriers to the public IP of admin1.tpe1.mozilla.com. It shows ~50% packet loss at 118-163-10-188.HINET-IP.hinet.net
A note on the mtr output.
Notice that at one hop, a router in HiNet, there is ~50% packet loss, but at the *next* hop there is no packet loss.
This is "normal".  The router showing all that packet loss isn't necessarily dropping packets; it's just not making answers to your mtr probes a priority.  It's busy doing other stuff.  If it really were dropping half the traffic passing through it, the hops behind it would show loss as well.

If, when running mtr, you see loss starting at hop X, and the same loss continues at every hop after it, then that suggests link congestion.
(The same goes for traceroute.)
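
The rule of thumb above can be sketched in a few lines of Python. This is only an illustration, not a tool used in this bug: the function name, the 5% threshold, and the sample hop lists are all assumptions, and the loss figures are made up apart from the ~50% at the HiNet hop mentioned earlier. It takes per-hop loss percentages as you would read them off an mtr report and returns the first hop from which loss carries through to the final hop, if any.

```python
from typing import List, Optional, Tuple

# (hostname, loss percentage as read off an mtr report)
Hop = Tuple[str, float]


def first_persistent_loss(hops: List[Hop], threshold: float = 5.0) -> Optional[int]:
    """Return the index of the first hop from which loss >= threshold
    continues at every later hop through the last one, or None.

    Loss that appears at one hop and vanishes at the next is treated as
    the router deprioritizing replies to probes, not as real loss.
    The 5% threshold is an arbitrary assumption.
    """
    start: Optional[int] = None
    for i, (_, loss) in enumerate(hops):
        if loss >= threshold:
            if start is None:
                start = i  # candidate start of a persistent run
        else:
            start = None  # loss did not carry forward; discard candidate
    return start


# Illustrative data only: loss at a single hop, clean afterwards.
isolated = [
    ("hop-1.example", 0.0),
    ("118-163-10-188.HINET-IP.hinet.net", 50.0),
    ("destination.example", 0.0),
]

# Illustrative data only: loss starts at the second hop and continues to the end.
congested = [
    ("hop-1.example", 0.0),
    ("hop-2.example", 30.0),
    ("hop-3.example", 31.0),
    ("destination.example", 29.0),
]

print(first_persistent_loss(isolated))
print(first_persistent_loss(congested))
```

A `None` result means no hop shows loss that carries through to the destination, which matches the "normal" case described above; an index points at the hop where the trouble most likely starts.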
Assignee: network-operations → dcurado
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard