Closed Bug 1570187 Opened 5 years ago Closed 4 years ago

please reimage mac-v2-signing8-13 as notarization boxes

Categories

(Infrastructure & Operations :: RelOps: Hardware, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mozilla, Unassigned)

Old hosts (from bug 1567235):

mac-v2-signing8.srv.releng.mdc1
mac-v2-signing9.srv.releng.mdc1
mac-v2-signing10.srv.releng.mdc1
mac-v2-signing11.srv.releng.mdc2
mac-v2-signing12.srv.releng.mdc2
mac-v2-signing13.srv.releng.mdc2

I've killed the signing servers on these hosts and performed a secrets wipe. We should reimage these for ronin-puppet as:

mac-v3-signing7.srv.releng.mdc1.mozilla.com
mac-v3-signing8.srv.releng.mdc1.mozilla.com
mac-v3-signing9.srv.releng.mdc1.mozilla.com
mac-v3-signing10.srv.releng.mdc2.mozilla.com
mac-v3-signing11.srv.releng.mdc2.mozilla.com
mac-v3-signing12.srv.releng.mdc2.mozilla.com

(Looks like we already have backfill for mac-v3-signing4.mdc2 in bug 1561117).

We also need to add these to the NATs in bug 1552305, and make sure they have outbound connectivity to Apple.

These hosts have been renamed, re-imaged and puppetized to the proper puppet role, including the backfill in bug 1561117. I've also filed bug 1570504 to have their IPs added to the autograph NAT policy.

All releng networks should allow full outbound connectivity to Apple since the firewall deny rule was removed a while back.

Thank you!

mac-v3-signing9.srv.releng.mdc1.mozilla.com looks unreachable atm.
mac-v3-signing8.srv.releng.mdc1.mozilla.com may not have finished its puppet run. I killed puppet, configured it manually, but a) I had to create ~cltbld manually, and b) sudoers keeps getting overwritten.

7, 10, 11, 12, and 13 all look good and are live, thank you!

We're in much better shape with the 5 extra hosts, but it would be good to close this out.

mac-v3-signing8.srv.releng.mdc1 seems to still have SIP enabled, so I can't attempt to reimage it:

# csrutil status
System Integrity Protection status: enabled.
# /usr/sbin/bless --netboot --nextonly --server [redacted] && reboot
Could not set boot device property: 0xe00002e2
Can't set EFI
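
A minimal sketch of guarding the reimage on the SIP state, assuming the failure above is caused by SIP being enabled as this thread suggests. The function name `sip_allows_netboot` is hypothetical, and the status text is a canned sample from the transcript rather than a live `csrutil status` call. Any output that does not say "disabled" is treated as enabled.

```shell
#!/bin/sh
# Hypothetical helper (not an existing tool): decide whether a netboot
# reimage is worth attempting, based on the text printed by `csrutil status`.
sip_allows_netboot() {
    # $1: full output of `csrutil status`
    case "$1" in
        *"status: disabled"*) return 0 ;;
        *) return 1 ;;   # enabled, or any output we do not recognise
    esac
}

# Canned sample from the transcript; on a real host this would be
# status_output=$(csrutil status)
status_output='System Integrity Protection status: enabled.'

if sip_allows_netboot "$status_output"; then
    echo "SIP disabled: bless --netboot can be attempted"
else
    echo "SIP enabled: disable it from the recovery console before reimaging"
fi
```

With the sample value the script takes the second branch, which matches the situation on mac-v3-signing8.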

I can't reach mac-v3-signing9.srv.releng.mdc1.mozilla.com either. It'll need some onsite hands to investigate ?

Flags: needinfo?(jwatkins)

dhouse, could you do your magic on mac-v3-signing8.srv.releng.mdc1 to disable SIP ?

Flags: needinfo?(jwatkins)

(In reply to Nick Thomas [:nthomas] (UTC+12) from comment #4)

Since mac-v3-signing8.srv.releng.mdc1 got imaged to Mojave with SIP enabled, SIP will need to be disabled and the host reimaged from the recovery console. We'll need to file a ticket with QTS.

Same for mac-v3-signing9.srv.releng.mdc1 since it seems to be offline. My guess is it's asleep and might also have SIP enabled. I've already tried power-cycling it.

I've filed a ticket with QTS to disable SIP and reimage the 2 minis.

"""
Short description
Disable SIP and reimage 2 mac minis from recovery console
Description
Please disable SIP and initiate a reimage from the recovery console for the 2 following minis:

  • Asset Tag: 35245, Rack: IT42 - 15.2
  • Asset Tag: 44214, Rack: IT41 - 16.1

  1. Reboot into recovery mode with: Option-⌘-R (option + command + r)
  2. Open Terminal session: Utilities->Terminal
  3. Disable SIP: csrutil disable
  4. Trigger a reimage: /usr/sbin/bless --netboot --server bsdp://10.51.56.233
  5. Reboot from terminal: reboot
"""

QTS reports one mini was successful and the other won't boot. Either way I can't reach either of them. At this point, I'll investigate these 2 minis when I go to MDC1.

"""
Please pardon the slight delay on our part for not providing this update to you sooner. We have completed 1 of 2 mac minis. We have successfully disabled SIP and initiated a reimage for Asset Tag 35245. We have made a total of 3 attempts to do the same with Asset Tag 44214; however, we continue to receive no video output from the machine. We have verified the HDMI cable is firmly seated for Asset Tag 44214, at both ends of the machines. Please advise us if you would like for us to swap the cable on the machine or instead how you wish for us to proceed with the task. We are prepared to swap the cable for you and have staged another cable we can use which we have verified is a good known working cable. Thank you for allowing us to support your needs.
"""

Jake, during the power swap (https://bugzilla.mozilla.org/show_bug.cgi?id=1575615) I was not able to get #8 (asset 35245) recovered with QTS. We tried different power and all other cables, connecting the crash cart display directly, and holding the power button for 10s with the power cable disconnected to clear the SMC (power savings). Could you include #8 in your review of #9?

Flags: needinfo?(jwatkins)
Depends on: 1577813

Jake recovered #8 and #9 when he visited MDC1 last Friday. He found that they needed firmware upgrades. I'm seeing DeployStudio success mails for #8 repeating the firmware upgrade successfully; I'm switching it to the signing reimage workflow.

(In reply to Dave House [:dhouse] from comment #10)

#8 completed the reimage successfully: "The workflow 'Deploy Mojave Signing v3' was launched on the computer C07TQ095G1J2 (name: mac-v3-signing8, ip: 10.49.48.23, mac: a8:60:b6:39:b7:78) with a SUCCESSFUL termination status."

[dhouse@mac-v3-signing8.srv.releng.mdc1.mozilla.com ~]$ w
19:12  up 2 mins, 2 users, load averages: 4.04 3.34 1.50
USER     TTY      FROM              LOGIN@  IDLE WHAT
cltbld   console  -                19:10       2 -
dhouse   s000     10.49.48.101     19:12       - w
[dhouse@mac-v3-signing8.srv.releng.mdc1.mozilla.com ~]$ sw_vers
ProductName:    Mac OS X
ProductVersion: 10.14.5
BuildVersion:   18F132
[dhouse@mac-v3-signing8.srv.releng.mdc1.mozilla.com ~]$ csrutil status
System Integrity Protection status: disabled.
[dhouse@mac-v3-signing8.srv.releng.mdc1.mozilla.com code]$ cat /etc/puppet_role 
mac_v3_signing
[dhouse@mac-v3-signing8.srv.releng.mdc1.mozilla.com ~]$ git -C /etc/puppet/environments/production/code rev-parse HEAD
5cc237276154c064e1592f93fbea269d3efc6a51
Flags: needinfo?(jwatkins)
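
The manual checks in the transcript above could be bundled roughly as follows. This is a sketch, not an official runbook: `verify_signing_host` is a hypothetical name, and the expected values (a 10.14.x product version, SIP disabled, role `mac_v3_signing`) are simply what the transcript shows for #8. The inputs are passed as strings so the logic can be exercised off-host.

```shell
#!/bin/sh
# Hypothetical post-reimage check. Expected values are copied from the
# transcript for mac-v3-signing8; they are assumptions, not a spec.
verify_signing_host() {
    # $1: macOS product version (e.g. from `sw_vers -productVersion`)
    # $2: output of `csrutil status`
    # $3: contents of /etc/puppet_role
    rc=0
    case "$1" in
        10.14.*) ;;
        *) echo "unexpected macOS version: $1"; rc=1 ;;
    esac
    case "$2" in
        *"status: disabled"*) ;;
        *) echo "SIP is not disabled: $2"; rc=1 ;;
    esac
    if [ "$3" != "mac_v3_signing" ]; then
        echo "unexpected puppet role: $3"
        rc=1
    fi
    return "$rc"
}

# Sample values taken from the transcript; on a host these would come
# from running the commands named in the comments above.
if verify_signing_host "10.14.5" \
    "System Integrity Protection status: disabled." \
    "mac_v3_signing"; then
    echo "host looks ready"
fi
```

Each failed check prints its own message, so a single run reports every mismatch rather than stopping at the first one.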

I'm doing the same for #9:

  1. confirmed firmware update success:
    The workflow 'Update Firmware' was launched on the computer C07T610MG1J2 (name: mac-v3-signing9, ip: 10.49.48.24, mac: a8:60:b6:24:f7:8e) with a SUCCESSFUL termination status.

  2. pinning to mojave signing workflow and rebooting

(In reply to Dave House [:dhouse] from comment #12)

I cycled the PDU power for #9 (I think it was waiting for a workflow section to be manually chosen on-screen).
https://inventory1.corpdmz.mdc1.mozilla.com/systems/show/38507/

Thanks for fixing up mac-v3-signing8.srv.releng.mdc1 - I've set that up and turned it on in prod.

How's mac-v3-signing9.srv.releng.mdc1 going ? Not responding to pings right now.

Thanks Nick for getting #8 up and checking on this!

I've asked the QTS remote hands to check and reimage mac-v3-signing9. When Jake was last there, he fixed the firmware, but it didn't come back after the reimages (https://bugzilla.mozilla.org/show_bug.cgi?id=1577813).

(In reply to Dave House [:dhouse] from comment #15)

QTS was not able to recover #9: there is still no video after reseating all cables and multiple power cycles. I'll ask them to try a new video cable.

(In reply to Dave House [:dhouse] from comment #16)

I'm re-purposing a test worker to replace #9 as it could not be recovered with QTS's help.

Depends on: 1587592

Aki/Nick, with replacing mac-v3-signing9, we'll de-commission the previous machine (bug 1588852). For the old mini, what is the security process to de-comm it? I'm guessing DCOps has a process they've followed before, but I don't know it and want to confirm we're covered.

Depends on: 1588852
Flags: needinfo?(nthomas)
Flags: needinfo?(aki)

We need to wipe

  • /builds/scriptworker/{nightly,release}-signing.keychain
  • /builds/scriptworker/{scriptworker,script_config}.yaml
  • /Users/cltbld/ed25519_privkey

most likely with rm -P or some other command that overwrites the data multiple times before removing.
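
A dry-run sketch of that wipe, assuming the three bullet items above are the complete list (the brace patterns are expanded by hand into five paths). The function names are hypothetical. It only prints the `rm -P` commands and deletes nothing; any ronin-puppet secret locations are left out because their paths are not given in this bug.

```shell
#!/bin/sh
# Sketch only: prints the wipe commands instead of running them.
secret_paths() {
    # The brace patterns from the list above, expanded by hand.
    cat <<'EOF'
/builds/scriptworker/nightly-signing.keychain
/builds/scriptworker/release-signing.keychain
/builds/scriptworker/scriptworker.yaml
/builds/scriptworker/script_config.yaml
/Users/cltbld/ed25519_privkey
EOF
}

print_wipe_plan() {
    secret_paths | while IFS= read -r path; do
        # rm -P is the overwrite-before-delete option named in this comment.
        echo "rm -P $path"
    done
}

print_wipe_plan
```

Printing the plan first gives whoever runs the wipe a chance to review the exact paths before anything is removed.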

Are there ronin-puppet secrets populated on the macs? If so, we may need to wipe those as well.

Flags: needinfo?(aki)

(In reply to Aki Sasaki [:aki] (he/him) (UTC-7) from comment #19)

Thanks! There are ronin-puppet secrets on there also.

I'll ask DCOps if they can wipe those locations or the full disk.

Flags: needinfo?(nthomas)

A secure erase of the whole disk would be best, given the SSD disk used in the mini and the snapshots that APFS makes.

Nick/Aki, the new mac-v3-signing9.srv.releng.mdc1.mozilla.com is reimaged and available. This is a re-purposed tester. Please let me know if you have any trouble with it.

Thanks Dave. Simon is just finishing up the ronin-puppet changes so we'll use that to re-image mac-v3-signing9 and use it as a canary in prod.

I think we may be done here?

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED