Status

Infrastructure & Operations
DCOps
RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: Callek, Unassigned)

(Reporter)

Description

5 years ago
Wed 07:50:25 PST [494] panda-relay-060.p7.releng.scl1.mozilla.com is DOWN :PING CRITICAL - Packet loss = 100%

Not sure if this qualifies as DCOps or RelOps, filing here first.

Either way, this is important for panda health.
(Reporter)

Comment 1

5 years ago
...and it's back:
Wed 07:53:14 PST [496] panda-relay-060.p7.releng.scl1.mozilla.com is UP :PING OK - Packet loss = 0%, RTA = 6.53 ms

Leaving open for explanation/investigation
(Reporter)

Comment 2

5 years ago
And it's actually been flapping quite a bit today:

[02:26:19]	nagios-releng	Mon 23:26:27 PST [471] panda-relay-060.p7.releng.scl1.mozilla.com is DOWN :PING CRITICAL - Packet loss = 100%
[02:31:39]	nagios-releng	Mon 23:31:47 PST [472] panda-relay-060.p7.releng.scl1.mozilla.com is DOWN :PING CRITICAL - Packet loss = 100%
[02:36:49]	nagios-releng	Mon 23:36:57 PST [473] panda-relay-060.p7.releng.scl1.mozilla.com is UP :PING OK - Packet loss = 0%, RTA = 51.77 ms
[08:59:15]	nagios-releng	Wed 05:59:34 PST [478] panda-relay-060.p7.releng.scl1.mozilla.com is DOWN :PING CRITICAL - Packet loss = 100%
[09:04:34]	nagios-releng	Wed 06:04:54 PST [480] panda-relay-060.p7.releng.scl1.mozilla.com is DOWN :PING CRITICAL - Packet loss = 100%
[09:07:54]	nagios-releng	Wed 06:08:14 PST [482] panda-relay-060.p7.releng.scl1.mozilla.com is UP :PING OK - Packet loss = 0%, RTA = 3.26 ms
[10:34:04]	nagios-releng	Wed 07:34:25 PST [490] panda-relay-060.p7.releng.scl1.mozilla.com is DOWN :PING CRITICAL - Packet loss = 100%
[10:39:23]	nagios-releng	Wed 07:39:44 PST [492] panda-relay-060.p7.releng.scl1.mozilla.com is DOWN :PING CRITICAL - Packet loss = 100%
[10:44:44]	nagios-releng	Wed 07:45:04 PST [493] panda-relay-060.p7.releng.scl1.mozilla.com is DOWN :PING CRITICAL - Packet loss = 100%
[10:50:04]	nagios-releng	Wed 07:50:25 PST [494] panda-relay-060.p7.releng.scl1.mozilla.com is DOWN :PING CRITICAL - Packet loss = 100%
[10:52:53]	nagios-releng	Wed 07:53:14 PST [496] panda-relay-060.p7.releng.scl1.mozilla.com is UP :PING OK - Packet loss = 0%, RTA = 6.53 ms
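
To put numbers on the flapping, the notifications above can be reduced to outage windows. The following is only a minimal sketch, assuming every line carries the Nagios part in the format shown ("Day HH:MM:SS TZ [id] host is UP|DOWN"), that all lines refer to a single host, and that the log does not wrap past the end of a week. The names parse_events and outage_windows are hypothetical helpers, not part of Nagios or any releng tooling.

```python
import re
from datetime import datetime, timedelta

# Matches the Nagios part of a notification line, e.g.
# "Wed 07:50:25 PST [494] host is DOWN :PING CRITICAL - ..."
LINE_RE = re.compile(
    r"(?P<day>Mon|Tue|Wed|Thu|Fri|Sat|Sun) (?P<time>\d{2}:\d{2}:\d{2}) \w+ "
    r"\[\d+\] (?P<host>\S+) is (?P<state>UP|DOWN)"
)
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def parse_events(lines):
    """Return (offset_into_week, host, state) for each matching line."""
    events = []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # not a notification line
        t = datetime.strptime(m["time"], "%H:%M:%S")
        # The log has weekday names only, so times are offsets from Monday 00:00.
        offset = timedelta(
            days=DAYS.index(m["day"]),
            hours=t.hour,
            minutes=t.minute,
            seconds=t.second,
        )
        events.append((offset, m["host"], m["state"]))
    return events


def outage_windows(events):
    """Pair the first DOWN of each run with the next UP (single host assumed)."""
    windows = []
    down_since = None
    for offset, _host, state in events:
        if state == "DOWN" and down_since is None:
            down_since = offset
        elif state == "UP" and down_since is not None:
            windows.append((down_since, offset))
            down_since = None
    return windows
```

Each window runs from the first DOWN notification to the following UP notification, so it is a lower bound on the real outage: Nagios only notifies after its own retries, and the relay may have dropped earlier than the first alert.
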
I assume the attached Pandas are disabled for the time being, and we have lots of spare capacity for the moment.

dcops, hopefully this is something obvious from looking at the device - loose network cable, etc.
Severity: critical → normal

Updated

5 years ago
colo-trip: --- → scl1
I've rebooted the Digi Connect module on this relay since it was also running at 100% CPU (bug 834746). I'm going to r/f this for now, but please reopen if it goes down or begins to flap again.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
It's down again.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Comment 6

5 years ago
I've swapped out the ethernet cable for the relay.  Let's see if that makes a difference.

Comment 7

5 years ago
Relay is still up today. Please reopen if issues persist.

[vle@natasha ~]$ /usr/sbin/fping panda-relay-060.p7.releng.scl1.mozilla.com
panda-relay-060.p7.releng.scl1.mozilla.com is alive
Status: REOPENED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Product: mozilla.org → Infrastructure & Operations