Closed
Bug 1379881
Opened 7 years ago
Closed 7 years ago
sea-mini-osx64-1 is awol
Categories
(Infrastructure & Operations :: DCOps, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: ewong, Assigned: van)
Attachments
(1 file)
1.05 KB, patch
+++ This bug was initially created as a clone of Bug #1376192 +++

sea-mini-osx64-1 is back to being AWOL. Since this system has been AWOL far longer than it has worked, I'm starting to suspect some sort of imminent hardware failure. Can someone run a check disk (or whatever the equivalent of a check disk is on an OS X system)?

I have removed it from the build pool so it can be operated on without any concern of it getting jobs.

Thanks!
Assignee
Comment 1•7 years ago
running verify disk on the mini, but the estimated time to complete keeps fluctuating between 15 minutes and 5 hours. will check again in a few hours and hopefully it'll allow us to run repair disk.
Assignee: server-ops-dcops → vle
Assignee
Comment 2•7 years ago
verify disk completed, running repair disk. will check back on this tomorrow.
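For readers looking for a command-line equivalent of the verify-then-repair sequence described above, here is a minimal sketch. It is not what was run on this host. It assumes macOS's `diskutil` with its `verifyVolume` and `repairVolume` verbs, and it assumes `diskutil` exits non-zero when verification finds problems. `DISKUTIL` and `check_and_repair` are hypothetical names introduced only for illustration.

```shell
# Sketch only: verify a volume and attempt a repair only if verification fails.
# DISKUTIL is a hypothetical override (not part of macOS) so the control flow
# can be exercised on a machine that has no diskutil.
DISKUTIL=${DISKUTIL:-diskutil}

check_and_repair() {
  vol=${1:-/}
  # Assumption: diskutil exits non-zero when it finds problems.
  if "$DISKUTIL" verifyVolume "$vol" >/dev/null 2>&1; then
    echo "verified-clean"
  elif "$DISKUTIL" repairVolume "$vol" >/dev/null 2>&1; then
    echo "repaired"
  else
    echo "repair-failed"
    return 1
  fi
}
```

Calling `check_and_repair /` prints one of the three states. The sketch discards diskutil's own output, so drop the redirections to see the details. Repairing the boot volume generally requires booting from another volume first, so on a live host the repair branch may simply fail.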
Assignee
Comment 3•7 years ago
repair disk finished, it didn't report any issues and the drive is OK. i tried running diagnostics on this mini as well but it won't read any of our "Applications Install Disc 2" (tried 4 discs) in its DVD drive. the host is back online and i don't have a login.

is it a script or something locking up the host? are there any available logs to help you out? it didn't crash at all when i left it running overnight, running its repair.

[vle@jump1.community.scl3 ~]$ fping sea-mini-osx64-1.community.scl3.mozilla.com
sea-mini-osx64-1.community.scl3.mozilla.com is alive
[vle@jump1.community.scl3 ~]$ ssh sea-mini-osx64-1.community.scl3.mozilla.com
The authenticity of host 'sea-mini-osx64-1.community.scl3.mozilla.com (63.245.223.80)' can't be established.
RSA key fingerprint is e7:19:ba:03:b7:5b:02:8a:7a:0d:e5:7d:a6:8b:2a:35.
Are you sure you want to continue connecting (yes/no)?
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Reporter
Comment 4•7 years ago
(In reply to Van Le [:van] from comment #3)
> repair disk finished, it didn't report any issues and the drive is OK. i
> tried running diagnostics on this mini as well but it won't read any of our
> "Applications Install Disc 2" (tried 4 discs) in its DVD drive. the host is
> back online and i don't have a login.
>
> is it a script or something locking up the host? are there any available
> logs to help you out? it didn't crash at all when i left it running
> overnight, running its repair.

It seems to happen sporadically (if that's even the right word to use). It seems to happen during cloning, though I'm not sure whether cloning has anything to do with it. Right now, I'm still having some difficulty SSHing into it even though ping works.
Reporter
Comment 5•7 years ago
cannot ssh into this. Callek, any idea?
Status: RESOLVED → REOPENED
Flags: needinfo?(bugspam.Callek)
Resolution: FIXED → ---
Comment 6•7 years ago
Just a thought: did you run an HD surface scan and a full memory test?
I'm a Mac sysadmin at work and this would be the first thing I'd try with a system freezing or crashing randomly.
However, I'm not aware of any free tools that perform hard disk surface scans; I usually use Micromat's TechTool Pro, which checks RAM and internal sensors, too. Please tell me if I can help in any way.
Reporter
Comment 7•7 years ago
(In reply to Andrea Govoni from comment #6)
> Just a thought: did you run an HD surface scan and a full memory test?
> I'm a Mac sysadmin at work and this would be the first thing I'd try with a
> system freezing or crashing randomly.
> However, I'm not aware of any free tools that perform hard disk surface
> scans; I usually use Micromat's TechTool Pro, which checks RAM and internal
> sensors, too. Please tell me if I can help in any way.

I'm not sure, as I don't have physical access to this system. But judging from comment #3, it seems to be ok.

:van, can you reboot this system again? Thanks
Reporter
Comment 8•7 years ago
Assignee
Comment 9•7 years ago
yah we did a surface scan, no issues. i also did the apple diagnostics which tests sensors/memory and it didn't come up with any issues. there could be underlying problems that the diagnostics can't pick up though.

host's been kicked and is back online.

[vle@jump1.community.scl3 ~]$ ping sea-mini-osx64-1.community.scl3.mozilla.com
PING sea-mini-osx64-1.community.scl3.mozilla.com (63.245.223.80) 56(84) bytes of data.
64 bytes from sea-mini-osx64-1.community.scl3.mozilla.com (63.245.223.80): icmp_seq=1 ttl=64 time=0.752 ms
^C
--- sea-mini-osx64-1.community.scl3.mozilla.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.752/0.752/0.752/0.000 ms
[vle@jump1.community.scl3 ~]$ ssh !$
ssh sea-mini-osx64-1.community.scl3.mozilla.com
The authenticity of host 'sea-mini-osx64-1.community.scl3.mozilla.com (63.245.223.80)' can't be established.
RSA key fingerprint is e7:19:ba:03:b7:5b:02:8a:7a:0d:e5:7d:a6:8b:2a:35.
Are you sure you want to continue connecting (yes/no)?
Reporter
Comment 10•7 years ago
Thanks Van! It's back up!
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Flags: needinfo?(bugspam.Callek)
Resolution: --- → FIXED
Reporter
Comment 11•7 years ago
and it's back to being frozen.. You know, I think it's something to do with either memory, hg, or the hard disk. Right now, it's been at the hg update step for the past 5 hrs... and I'm guessing it's stuck on this line:

/usr/local/bin/hg clone --verbose --noupdate https://hg.mozilla.org/releases/comm-esr52 build

I know for a fact that it shouldn't take 4+ hrs to clone comm-esr52, so it's pretty much hung during cloning: hd failure, a memory screw-up, or something else. So I'm gonna take it back off the builder list.

Callek: my original question still stands.. what should we do?
Status: RESOLVED → REOPENED
Flags: needinfo?(bugspam.Callek)
Resolution: FIXED → ---
Reporter
Comment 12•7 years ago
and now.. I can't even ssh into it but it does respond to pings.
Reporter
Comment 13•7 years ago
(In reply to Edmund Wong (:ewong) from comment #12)
> and now.. I can't even ssh into it but it does respond to pings.

or it's taking an inordinate amount of time to allow me to log in.. going to leave this ssh seabld@sea-mini-osx64-1 session running and see if it connects.
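The "pings but won't let me log in" state described in this thread can be checked with a bounded probe instead of an open-ended ssh session. The sketch below is illustrative only. `classify_host` and `probe_host` are hypothetical helpers, the 10-second limit is an assumption, and `ConnectTimeout` may not bound a login that stalls after the connection is established.

```shell
# Classify a host from two exit statuses: ping's and ssh's.
classify_host() {
  ping_status=$1
  ssh_status=$2
  if [ "$ping_status" -ne 0 ]; then
    echo "down"
  elif [ "$ssh_status" -eq 0 ]; then
    echo "healthy"
  else
    echo "pingable-but-no-ssh"
  fi
}

# Probe a host once with ping and once with a non-interactive ssh login.
probe_host() {
  host=$1
  ping_status=0
  ping -c 1 "$host" >/dev/null 2>&1 || ping_status=$?
  ssh_status=0
  # BatchMode avoids hanging on prompts; ConnectTimeout bounds the connect.
  ssh -o BatchMode=yes -o ConnectTimeout=10 "$host" true >/dev/null 2>&1 || ssh_status=$?
  classify_host "$ping_status" "$ssh_status"
}
```

Running `probe_host sea-mini-osx64-1.community.scl3.mozilla.com` from the jump host would print one of the three states. Because `BatchMode` also refuses an unknown host key, the key has to be accepted once beforehand, otherwise a healthy host is reported as "pingable-but-no-ssh".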
Comment 14•7 years ago
(In reply to Edmund Wong (:ewong) from comment #11)
> and it's back to being frozen..
>
> Callek: my original question still stands.. what should we do?

If this were a MoCo machine I'd say decomm (after hardware diags don't turn up anything). I suspect either the disk is literally dying, the memory is dying, or both (it's also not unheard of for the entire motherboard to go in this hardware after this long).

I don't have good advice on "what to do" -- I might sometimes recommend a reimage, but iirc we don't have a good way to do so here, nor a good way to bring it back up from a fresh OS install...
Flags: needinfo?(bugspam.Callek)
Reporter
Comment 15•7 years ago
(In reply to Justin Wood (:Callek) from comment #14)
> (In reply to Edmund Wong (:ewong) from comment #11)
> > and it's back to being frozen..
> >
> > Callek: my original question still stands.. what should we do?
>
> If this were a MoCo machine I'd say decomm (after hardware diags don't turn
> up anything). I suspect either the disk is literally dying, the memory is
> dying, or both (it's also not unheard of for the entire motherboard to go in
> this hardware after this long).
>
> I don't have good advice on "what to do" -- I might sometimes recommend a
> reimage, but iirc we don't have a good way to do so here, nor a good way to
> bring it back up from a fresh OS install...

Thanks for the advice.. since -1 isn't helping at all, there's little point in keeping it around. Unfortunately, we don't have any replacements and we're one osx64 machine away from losing the platform completely, since -2 and -4 are, in essence, decommissioned (though not physically, I think. right, :van? sea-mini-osx64-2 and -4 are still in production, right?)
Flags: needinfo?(vle)
Assignee
Comment 16•7 years ago
they are still in the rack and have not been unplugged/removed.
Flags: needinfo?(vle)
Assignee
Comment 17•7 years ago
:ewong, what are the plans for these minis after SCL3? do you plan to buy/install new ones into the new data centers or are you guys using a 3rd party like MacStadium?
Reporter
Comment 18•7 years ago
(In reply to Van Le [:van] from comment #17)
> :ewong, what are the plans for these minis after SCL3? do you plan to
> buy/install new ones into the new data centers or are you guys using a 3rd
> party like MacStadium?

We're still considering both options, though we're not sure whether the higher-ups want to have 'community'-based stuff in the new data centre.

Thanks!
Assignee
Updated•7 years ago
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED