need to fix host sea-mini-osx64-1.community.scl3.mozilla.com

RESOLVED FIXED

Status

Infrastructure & Operations
NetOps
2 years ago
2 years ago

People

(Reporter: dcurado, Assigned: dcurado)

(Assignee)

Description

2 years ago
Looking at the network events going by in Splunk, the switch interface connected to the above-mentioned host is flapping continuously.

This log message is going by at a steady rate:
Sep 30 01:07:09 switch1.r401-10.ops.scl3.mozilla.net eswd[1012]: %-ESWD_STP_STATE_CHANGE_INFO: STP state for interface ge-1/0/38.0 context id 0 changed from LEARNING to FORWARDING

So...
dcurado@switch1.r401-10.ops.scl3.mozilla.net> show ethernet-switching table | match 38
  community         c8:2a:14:20:98:4b Learn          0 ge-1/0/38.0

then:
dcurado@fw1.ops.scl3.mozilla.net> show arp no-resolve | match c8:2a:14:20:98:4b
c8:2a:14:20:98:4b 63.245.223.80   reth0.20             none

then:
host 63.245.223.80 
80.223.245.63.in-addr.arpa domain name pointer sea-mini-osx64-1.community.scl3.mozilla.com

Inventory just says this is allocated to "community" -- anyone know how we find out more?
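
The lookup chain above (switch port, then MAC address, then IP, then PTR name) can be sketched in Python. This is only an illustration that works on pasted command output; it does not talk to the switch or firewall, and the function names (mac_for_port, ip_for_mac, ptr_name) are made up for this example. It assumes the port is the last column of the switching table and that the MAC and IP are the first two columns of the ARP output, as in the pastes above.

```python
import ipaddress
import re
from typing import Optional

# Matches a colon-separated MAC address such as c8:2a:14:20:98:4b.
MAC_RE = re.compile(r"(?:[0-9a-f]{2}:){5}[0-9a-f]{2}", re.IGNORECASE)


def mac_for_port(switch_table: str, port: str) -> Optional[str]:
    """Return the first MAC learned on `port` in pasted switching-table output."""
    for line in switch_table.splitlines():
        fields = line.split()
        # Assumption: the interface name is the last column of each row.
        if fields and fields[-1] == port:
            match = MAC_RE.search(line)
            if match:
                return match.group(0).lower()
    return None


def ip_for_mac(arp_table: str, mac: str) -> Optional[str]:
    """Return the IP listed for `mac` in pasted 'show arp no-resolve' output."""
    for line in arp_table.splitlines():
        fields = line.split()
        # Assumption: column 1 is the MAC, column 2 is the IP address.
        if len(fields) >= 2 and fields[0].lower() == mac.lower():
            return fields[1]
    return None


def ptr_name(ip: str) -> str:
    """Build the reverse-DNS name that `host <ip>` would query."""
    return ipaddress.ip_address(ip).reverse_pointer


switch_out = "  community         c8:2a:14:20:98:4b Learn          0 ge-1/0/38.0"
arp_out = "c8:2a:14:20:98:4b 63.245.223.80   reth0.20             none"

mac = mac_for_port(switch_out, "ge-1/0/38.0")
ip = ip_for_mac(arp_out, mac)
print(mac, ip, ptr_name(ip))
```

The final step, resolving the PTR name to sea-mini-osx64-1.community.scl3.mozilla.com, still needs a real DNS query, which the sketch leaves out.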
(Assignee)

Comment 1

2 years ago
Some contact info:

[08:58:38]  <arr>	http://www.seamonkey-project.org/about
[08:58:50]  <dcurado>	OK, I'll start digging there.  Thank you for the pointer!
[08:58:54]  <arr>	kairo is probably your best bet
Assignee: network-operations → dcurado
Status: NEW → ASSIGNED
(Assignee)

Comment 2

2 years ago
From Kairo:

Hi Dave,

Callek and ewong (that's also their IRC nicks) are the people dealing with SeaMonkey infrastructure nowadays. I know there are issues with the minis (I think the project only has two left and that's really tight), but not more than that.

Cheers,
KaiRo
(Assignee)

Comment 3

2 years ago
[09:57:04]  <ewong>	dcurado: re: sea-mini-osx64-1  issue..
[09:57:07]  <dcurado>	yes
[09:57:10]  <ewong>	dcurado: can you clarify what you mean?
[09:57:16]  <dcurado>	yes
[09:57:37]  <dcurado>	the switch port that the mini is connected to is going up and down at a constant rate
[09:57:46]  <dcurado>	looks clearly broken to me
[09:58:08]  <dcurado>	However, I don't know if that's the mini, or the cable connecting it to the switch or what
[09:58:15]  <ewong>	dcurado: so is the switch busted or is the mini causing the switch busted?
[09:58:29] 	jib (jib@moz-j7m6lt.dyn.optonline.net) left IRC. (Connection closed)
[09:58:42]  <dcurado>	All the other ports on the switch appear fine, so I would *guess* the mini or the cable it uses to connect to the switch
[09:58:49]  <dcurado>	But I could be wrong.  It could be the switch port as well.
[09:58:51] 	jib (jib@moz-j7m6lt.dyn.optonline.net) joined the channel.
[09:59:07]  <dcurado>	I wanted to make sure that this server is actually doing something.
[09:59:17]  <ewong>	give me a sec.. lemme see if  can see what's going on with the mini..  
[09:59:26]  <dcurado>	Then, if so, ask if its OK if we dork with it to try to remedy the issue.
[09:59:28]  <dcurado>	OK, thanks!
[10:00:27]  <ewong>	dcurado: well, from what I see, it looks like it's building something..
[10:00:38]  <dcurado>	OK
[10:00:43]  <ewong>	dcurado: lemme check more
[10:00:50]  <dcurado>	Thanks again
[10:02:28]  <ewong>	dcurado: well so far, it looks ok.  
[10:02:38]  <ted>	catlee: log scraping is full of exciting ways to break shit
[10:02:44]  <dcurado>	Hrm.  OK.  Thanks for checking.
[10:02:47] 	armenzg is now known as armenzg_brb
[10:02:58]  <ewong>	but since I'm also viewing it via ssh as well as the master.  I'm probably missing something
[10:03:14]  <dcurado>	I was trying to pull up the total number of times it has flapped in the past 24 hours, and it seems to have quieted down over night.
[10:03:16]  <ewong>	dcurado: would switching a cable help testing it?
[10:03:23] 	AutomatedTester|AFK is now known as AutomatedTester
[10:03:41]  <dcurado>	So maybe it was a temp issue... (although, I know that doesn't happen often)
[10:03:59] 	JoeS1 (Thunderbird@moz-r0uivm.east.verizon.net) left IRC. (Client exited)
[10:04:07]  <dcurado>	For now, please ignore.  If it acts up again, I'll come back to you and we'll see if we can plan a time to test it, OK?
[10:04:18]  <ewong>	dcurado: well.. thanks for monitoring it..  if it does flap, please ping me or Callek
[10:04:30]  <dcurado>	Thank you very much for getting back to me on this.
(Assignee)

Comment 4

2 years ago
This port is still flapping quite a bit, but pinging the host shows no loss of connectivity.
I suspect the Ethernet hardware on the Mac mini is failing but resetting itself quickly enough
that no packets are lost.  That is only a best guess at the cause.
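
To put a number on "flapping quite a bit", the syslog lines can be counted per interface. The sketch below is a rough offline stand-in for a Splunk search, not the query actually used here. It assumes the log format matches the ESWD_STP_STATE_CHANGE_INFO line quoted in the description, and it treats each transition to FORWARDING as one flap, which is an assumption made for this example. The function name count_forwarding_transitions is hypothetical.

```python
import re
from collections import Counter

# Pattern modeled on the single log line quoted in the description.
STP_RE = re.compile(
    r"ESWD_STP_STATE_CHANGE_INFO: STP state for interface (\S+) "
    r"context id \d+ changed from (\w+) to (\w+)"
)


def count_forwarding_transitions(lines):
    """Count, per interface, how often STP reached FORWARDING.

    Assumption: each return to FORWARDING corresponds to one link flap.
    """
    counts = Counter()
    for line in lines:
        match = STP_RE.search(line)
        if match and match.group(3) == "FORWARDING":
            counts[match.group(1)] += 1
    return counts


sample = [
    "Sep 30 01:07:09 switch1.r401-10.ops.scl3.mozilla.net eswd[1012]: "
    "%-ESWD_STP_STATE_CHANGE_INFO: STP state for interface ge-1/0/38.0 "
    "context id 0 changed from LEARNING to FORWARDING",
]
flaps = count_forwarding_transitions(sample)
print(flaps.most_common())
```

Feeding it a day of exported switch logs would give a per-port count to compare before and after any hardware change.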
(Assignee)

Comment 5

2 years ago
Van -- next time you are in SCL3, can you check the cable that connects this Mac mini?
Maybe it's loose and causing this issue.  Kind of a long shot, but I would like to do
something about this.

Thanks, 
Dave
Flags: needinfo?(vle)

Comment 6

2 years ago
network cable replaced.
Flags: needinfo?(vle)
(Assignee)

Comment 7

2 years ago
This appears to have resolved the problem.
Thanks Van.
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED