Closed Bug 1577813 Opened 6 years ago Closed 6 years ago

mac minis to physically check

Categories

(Infrastructure & Operations :: RelOps: Posix OS, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dhouse, Assigned: dividehex)

References

Details

User Story

host serial asset rack shelf notes/bug?

signing(notarization) minis:
mac-v3-signing8	C07TQ095G1J2	35245	IT42	15.20 (https://bugzilla.mozilla.org/show_bug.cgi?id=1575615#c11)
 * Firmware not upgraded.  Currently: MM71.022.B14  Needs to be: MM71.0232.B00

mac-v3-signing9	C07T610MG1J2	44214	IT41	16.10 (https://bugzilla.mozilla.org/show_bug.cgi?id=1570187#c8)
 * Firmware not upgraded.  Currently: MM71.022.B08  Needs to be: MM71.0232.B00

testers:
257	C07RJ11CG1J2	03962	IT44	14.10
* still running yosemite even though the hostname was changed to mojave.  reimaged to mojave
262	C07RJ0X9G1J2	03967	IT44	16.2
* hung.  rebooted via power button
272	C07RJ0YPG1J2	03977	IT44	26.2
* dead. no power up

336	C07RJ10UG1J2	02984	IT40	5.2
* powered off
339	C07RJ0ZSG1J2	02987	IT40	7.1
* powered off; would not enter recovery, forced to create user in order to bless and reimage. must press power from the rear
340	C07RJ11GG1J2	02988	IT40	7.2
* powered off; would not enter recovery, forced to create user in order to bless and reimage. must press power from the rear
356	C07RJ133G1J2	18463	IT40	17.2
* powered off; would not enter recovery, forced to create user in order to bless and reimage. 
364	C07RJ10CG1J2	18471	IT40	26.20
* powered off, no video. needs more troubleshooting
369	C07RJ11EG1J2	18476	IT40	30.1	

315	C07RJ13AG1J2	02963	IT41	19.1
* reimaged (took longer than normal)

461	C07SQ0PCG1J2	15745	IT41	4.10  blessed to reimage as staging but didn't come back
 * bad switch port?  The port seemed dead, with no light, and the OS reported the cable as not connected.  I moved the cable to another port and it seemed to connect, but ssh was really spotty.

~~462	C07SQ0QHG1J2	15746	IT41	4.20~~  good staging; for comparison
463	C07SQ0RAG1J2	15747	IT41	5.10  went offline after multiuser test (softwareupdate in log)
464	C07SQ0NUG1J2	15748	IT41	5.20  went offline after multiuser test (softwareupdate in log)
465	C07SQ0RNG1J2	15749	IT41	6.10  went offline after multiuser test (softwareupdate in log)
* All 3 multiuser were in a reboot loop.  I entered recovery and reimaged.

468	C07SQ0QMG1J2	15752	IT41	7.2
* completely dead; no power up

mac minis to physically inspect and troubleshoot.

User Story: (updated)
Assignee: nobody → jwatkins
Blocks: 1570187

I checked over all of the mdc1 minis again. I was able to bring up 264, kicked off a reimage of 257, and added 257 and 364 to the list. I'll remove 257 if its reimage succeeds.

I kicked off a reimage of #256 to make sure deploystudio is working on install2 (since it was power-cycled for the power maintenance on Wednesday):
256 C07RJ134G1J2 03961 mdc1 IT44 12.20

I haven't seen a deploystudio email yet. bsdpy was not affected by the power maintenance (uptime is around 1 year on there).

(In reply to Dave House [:dhouse] from comment #2)

The reimages are failing with the last log entries:

2019-08-30 19:09:08.133 DeployStudio Runtime.bin[360:18919] Network address: 10.49.56.196 (t-mojave-r7-256.test.releng.mdc1.mozilla.com)
2019-08-30 19:09:08.155 DeployStudio Runtime.bin[360:18919] Network interface speed: AUTOSELECT (1000BASET <FULL-DUPLEX,FLOW-CONTROL>)
2019-08-30 19:09:08.176 DeployStudio Runtime.bin[360:18919] Operating System: Mac OS X Version 10.13.6 (Build 17G65)
2019-08-30 19:09:08.177 DeployStudio Runtime.bin[360:18919] Date: 19/08/30 12:09:08
2019-08-30 19:09:08.177 DeployStudio Runtime.bin[360:18919] ====================================================================================================
2019-08-30 19:09:09.047 DeployStudio Runtime.bin[360:18919] 24 plugins were successfully loaded!
2019-08-30 19:09:12.174 DeployStudio Runtime.bin[360:51587] The user 'dsadmin' was successfully authenticated.
2019-08-30 19:09:12.283 DeployStudio Runtime.bin[360:18919] Connected to server install2.test.releng.mdc1.mozilla.com (1.7.8)
2019-08-30 19:09:12.424 DeployStudio Runtime.bin[360:52105] Checking server reachability (server=install2.test.releng.mdc1.mozilla.com port=445) ...
2019-08-30 19:09:13.589 DeployStudio Runtime.bin[360:52105] Checking server reachability (server=10.49.56.17 port=445) ...

So something isn't responding or connecting (on port 445?). I started deploystudio after the power maintenance, but I must not have started everything (or not started it correctly).

I restarted the file server on install2 mdc1. I think that cleared up the issue. I've been able to re-image since.
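The log above stalls at the reachability check on port 445 (SMB), so a quick TCP probe from another host can confirm whether the file server is listening again. The following is a minimal sketch, not the check DeployStudio itself performs; the function names `is_port_open` and `report` are made up for illustration, and the hostname and IP in the usage note are simply the ones quoted in the log.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def report(hosts, port: int = 445, timeout: float = 3.0) -> dict:
    """Map each host to True/False depending on whether the port accepts connections."""
    return {host: is_port_open(host, port, timeout) for host in hosts}
```

For example, `report(["install2.test.releng.mdc1.mozilla.com", "10.49.56.17"])` would return a dict with one boolean per host. A successful TCP connect only shows that something is listening on the port; it does not prove that the share mounts correctly.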

We received repeated deploystudio success mails over the weekend, like:
The workflow 'Update Firmware' was launched on the computer C07TQ095G1J2 (name: mac-v3-signing8, ip: 10.49.48.23, mac: a8:60:b6:39:b7:78) with a SUCCESSFUL termination status. This mail was generated automatically by DeployStudio Server. -- The DeployStudio Team.

So I'll check it later today. I think I can set the next workflow for this machine (in the deploystudio database) as the mojave reimage and then it will run that instead of repeating the firmware update.
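A machine stuck repeating a workflow could be spotted by parsing the notification mails and counting successes per host and workflow. This is a rough sketch that assumes every mail body follows the single example quoted above; the regular expression and the helper names `parse_mail` and `repeated_workflows` are illustrative and not part of DeployStudio.

```python
import re
from collections import Counter

# Pattern derived from the one example mail in this bug; other mails may differ.
MAIL_RE = re.compile(
    r"The workflow '(?P<workflow>[^']+)' was launched on the computer "
    r"(?P<serial>\S+) \(name: (?P<name>[^,]+), ip: (?P<ip>[^,]+), "
    r"mac: (?P<mac>[0-9a-fA-F:]+)\) with a (?P<status>\w+) termination status"
)


def parse_mail(body):
    """Return the fields of a notification mail as a dict, or None if it does not match."""
    match = MAIL_RE.search(body)
    return match.groupdict() if match else None


def repeated_workflows(bodies, threshold=2):
    """Count SUCCESSFUL runs per (host name, workflow) and keep those seen at least threshold times."""
    counts = Counter()
    for body in bodies:
        info = parse_mail(body)
        if info and info["status"] == "SUCCESSFUL":
            counts[(info["name"], info["workflow"])] += 1
    return {key: n for key, n in counts.items() if n >= threshold}
```

Feeding it the weekend's mail bodies would return the hosts whose workflow ran more than once, which is the signal that the next workflow still needs to be set for that machine.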

Flags: needinfo?(dhouse)

I moved #8 over to the signing workflow and that completed. I am doing the same for #9, but I may need to ask QTS to reboot and bless it (deploystudio said it was waiting at the workflow selector prompt).

I asked QTS to check the remaining problem minis in MDC1:

Rack Shelf Asset Host
IT40 5.2 02984 336
IT40 26.2 18471 364
IT40 30.1 18476 369

IT41 4.1 15745 461
IT41 7.2 15752 468
IT41 16.10 44214 mac-v3-s9

IT44 26.2 03977 272
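For a physical walk-through it can help to group the list above by rack. The sketch below is only an illustration: the function name `group_by_rack` is made up, and the rows are copied verbatim from the list. Shelf and asset values are kept as strings on purpose, since a shelf such as 16.10 is not the number 16.1 and asset tags carry leading zeros.

```python
from collections import defaultdict

# Rows copied from the list above, in the order: rack, shelf, asset, host.
QTS_LIST = """\
Rack Shelf Asset Host
IT40 5.2 02984 336
IT40 26.2 18471 364
IT40 30.1 18476 369

IT41 4.1 15745 461
IT41 7.2 15752 468
IT41 16.10 44214 mac-v3-s9

IT44 26.2 03977 272
"""


def group_by_rack(text):
    """Group whitespace-separated 'rack shelf asset host' rows by rack."""
    racks = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4 or fields[0] == "Rack":
            continue  # skip blank lines, malformed rows, and the header
        rack, shelf, asset, host = fields
        # Keep shelf and asset as strings: "16.10" != "16.1", and "02984" has a leading zero.
        racks[rack].append({"shelf": shelf, "asset": asset, "host": host})
    return dict(racks)
```

Calling `group_by_rack(QTS_LIST)` returns one entry per rack, each holding its machines in the order they were listed.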

Flags: needinfo?(dhouse)

:dhouse, what is the status of these minis? Did QTS finish investigating them?

Flags: needinfo?(dhouse)

yes, this is out of date. One had dead video and was replaced with an r8.
Current state is in the spreadsheet. I'll double-check over them and make sure I don't have other abandoned bugs for them.

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Flags: needinfo?(dhouse)