Closed Bug 1541565 Opened 7 years ago Closed 4 years ago

Set up mac minis for notarization

Categories

(Infrastructure & Operations :: RelOps: Posix OS, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dhouse, Assigned: dhouse)

References

Details

User Story

- [x] [Netops] Have netops change the corresponding network ports to the srv vlan in both mdc1 and mdc2
- [x] [Relops] Change the dns/dhcp in infoblox to new mac-v3-signing{1..5}.srv.releng.mdc[12].mozilla.com fqdns
- [x] [Relops] Setup a dedicated deploystudio workflow for mac-v3-signing on both install2 ds hosts
- [x] [Relops] Add a new mac-v3-signing puppet role in ronin puppet
- [x] [Relops] Bless a single host in mdc1 to make sure it completes a full deployment and doesn't go offline
- [x] [Relops] Bless the rest of the new mac-v3-signing hosts (MDC1)
- [x] [Relops] Bless the MDC2 hosts (fix imaging)

Attachments

(1 file)

We need 5 mac minis allocated in each datacenter (mdc1, mdc2) and set up for Nick and Aki to use for notarization.

Aki/Nick can these be running yosemite, or do they need to be on mojave? Also, can these be in the test vlan and what hostnames would you like? (or I can keep them with t-yosemite-r7-NNN.test.releng.mdcN.mozilla.com)

Flags: needinfo?(nthomas)
Flags: needinfo?(aki)
Assignee: nobody → dhouse

These need to be running mojave. I know masterless puppet isn't ready yet; we upgraded partner-repack-1.mdc1 via vnc and system update =\

We probably want these in the signing vlan or otherwise in a restricted area isolated from the testers.

Naming... we're thinking maybe mac-v3-signingXX or something?

Flags: needinfo?(aki)

Jake, would you like for me to manually upgrade these or use a deploystudio image? (does install2 have a mojave image?)

I'm thinking to take the 10 at our tester dc numbering break (last 5 in mdc2, first 5 in mdc1):

C07RJ12LG1J2 MM71.0232.B00 2.24f32 t-yosemite-r7-237.test.releng.mdc1.mozilla.com 10.49.56.177
C07RJ0Z9G1J2 MM71.0232.B00 2.24f32 t-yosemite-r7-238.test.releng.mdc1.mozilla.com 10.49.56.178
C07RJ110G1J2 MM71.0232.B00 2.24f32 t-yosemite-r7-239.test.releng.mdc1.mozilla.com 10.49.56.179
C07RJ10MG1J2 MM71.0232.B00 2.24f32 t-yosemite-r7-240.test.releng.mdc1.mozilla.com 10.49.56.180
C07RJ0ZKG1J2 MM71.0232.B00 2.24f32 t-yosemite-r7-241.test.releng.mdc1.mozilla.com 10.49.56.181

C07RJ0Z8G1J2 MM71.0232.B00 2.24f32 t-yosemite-r7-232.test.releng.mdc2.mozilla.com 10.51.56.48
C07RJ13EG1J2 MM71.0232.B00 2.24f32 t-yosemite-r7-233.test.releng.mdc2.mozilla.com 10.51.56.49
C07RJ0YXG1J2 MM71.0232.B00 2.24f32 t-yosemite-r7-234.test.releng.mdc2.mozilla.com 10.51.56.50
C07RJ0YTG1J2 MM71.0232.B00 2.24f32 t-yosemite-r7-235.test.releng.mdc2.mozilla.com 10.51.56.51
C07RJ11TG1J2 MM71.0232.B00 2.24f32 t-yosemite-r7-236.test.releng.mdc2.mozilla.com 10.51.56.52

Infoblox shows we have ips available in the vlan with the signing servers (srv):
10.49.48.0/24 srv.releng.mdc1.mozilla.com 43.4% (20/46)
10.51.48.0/24 MDC2 Releng srv.releng.mdc2.mozilla.com 54.5% (18/33)

So I'll plan to move these 10 over there and change hostnames to mac-v3-signing{1..5}.srv.releng.mdc{1,2}.mozilla.com
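For reference, the brace pattern above expands to ten names. A minimal Python sketch of that expansion follows; the function name and defaults are only illustrative and are not part of any relops tooling.

```python
from itertools import product


def signing_fqdns(hosts=range(1, 6), datacenters=("mdc1", "mdc2")):
    """Expand mac-v3-signing{1..5}.srv.releng.mdc{1,2}.mozilla.com into FQDNs."""
    return [
        f"mac-v3-signing{n}.srv.releng.{dc}.mozilla.com"
        for dc, n in product(datacenters, hosts)
    ]


# Print the planned names, mdc1 first, then mdc2.
for name in signing_fqdns():
    print(name)
```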

Flags: needinfo?(jwatkins)

(In reply to Dave House [:dhouse] from comment #3)

I'm thinking to take the 10 at our tester dc numbering break (last 5 in mdc2, first 5 in mdc1):

We already took the last 2 in mdc2 for pgo builds (235, 236; https://bugzilla.mozilla.org/show_bug.cgi?id=1530732). So I'll pick a different range.

(In reply to Dave House [:dhouse] from comment #4)

(In reply to Dave House [:dhouse] from comment #3)

I'm thinking to take the 10 at our tester dc numbering break (last 5 in mdc2, first 5 in mdc1):

We already took the last 2 in mdc2 for pgo builds (235, 236; https://bugzilla.mozilla.org/show_bug.cgi?id=1530732). So I'll pick a different range.

I'll take the first 5 in each dc instead:

C07RJ12LG1J2 MM71.0232.B00 2.24f32 t-yosemite-r7-237.test.releng.mdc1.mozilla.com 10.49.56.177
C07RJ0Z9G1J2 MM71.0232.B00 2.24f32 t-yosemite-r7-238.test.releng.mdc1.mozilla.com 10.49.56.178
C07RJ110G1J2 MM71.0232.B00 2.24f32 t-yosemite-r7-239.test.releng.mdc1.mozilla.com 10.49.56.179
C07RJ10MG1J2 MM71.0232.B00 2.24f32 t-yosemite-r7-240.test.releng.mdc1.mozilla.com 10.49.56.180
C07RJ0ZKG1J2 MM71.0232.B00 2.24f32 t-yosemite-r7-241.test.releng.mdc1.mozilla.com 10.49.56.181

C07Q73HFG1J2 MM71.0232.B00 2.24f32 t-yosemite-r7-001.test.releng.mdc2.mozilla.com 10.51.56.234
C07Q73J6G1J2 MM71.0232.B00 2.24f32 t-yosemite-r7-002.test.releng.mdc2.mozilla.com 10.51.56.235
C07Q73N7G1J2 MM71.0232.B00 2.24f32 t-yosemite-r7-003.test.releng.mdc2.mozilla.com 10.51.56.236
C07Q73MVG1J2 MM71.0232.B00 2.24f32 t-yosemite-r7-004.test.releng.mdc2.mozilla.com 10.51.56.237
C07Q73G6G1J2 MM71.0232.B00 2.24f32 t-yosemite-r7-005.test.releng.mdc2.mozilla.com 10.51.56.238

CC'ing CIDuty: I marked these 10 mac workers as ignore in the tracking spreadsheet. I have not taken them yet, but that is my plan in the next day or two.
t-yosemite-r7-{001..005}
t-yosemite-r7-{237..241}

Flags: needinfo?(nthomas)

(In reply to Dave House [:dhouse] from comment #3)

Jake, would you like for me to manually upgrade these or use a deploystudio image? (does install2 have a mojave image?)

Let's use deploystudio and masterless puppet (at least to get the hosts to a bare configuration state). I'll need to manually copy over files from install2 in mdc1 to the install2 in mdc2 and then set up a DS workflow for these hosts. We'll also need to assemble a new puppet role and reuse any profiles that have already been written for mojave gw testers.

Flags: needinfo?(jwatkins)

dhouse, can you follow up on having netops move these hosts to the srv vlan networks? I think that is the blocker atm.

The sequence I see here (with owners):

  • [Netops] Have netops change the corresponding network ports to the srv vlan in both mdc1 and mdc2
  • [Relops] Change the dns/dhcp in infoblox to new mac-v3-signing.srv.releng.mdc[12].mozilla.com fqdns
  • [Jake] Setup a dedicated deploystudio workflow for mac-v3-signing on both install2 ds hosts
    • This is mostly hand crafted at the moment since install2 hosts are not running high sierra and aren't puppet managed
    • We might need to have netops add the bsdpy and install2 hosts to the ds firewall rules in order to get cross vlan netboot/ds to work (this has always been a PITA)
  • [Relops] Add a new mac-v3-signing puppet role in ronin puppet
    • Role will contain the basic profiles (ntp, users, etc...)
    • Write a 'signing' admins profile to include aki, nthomas, jwatkins... (aka the shortlist). This will grant ssh and sudo access.
  • [Jake] Bless a single host in mdc1 to make sure it completes a full deployment and doesn't go offline
  • [Relops] Bless the rest of the new mac-v3-signing hosts

This should give Nick and Aki a base to start setting up the notarization/signing pool.

Flags: needinfo?(dhouse)
Depends on: 1542379

I submitted the request to NetOps to move the 10 macs.
I'll change them in infoblox when that is completed (and pick the next ips available in the srv ranges)

The first two are completed. I think #3 will verify they are working.

(In reply to Jake Watkins [:dividehex] from comment #8)

dhouse, can you follow up on having netops move these hosts to the srv vlan networks? I think that is the blocker atm.

The sequence I see here (with owners):

  • [x] [Netops] Have netops change the corresponding network ports to the srv vlan in both mdc1 and mdc2
  • [x] [Relops] Change the dns/dhcp in infoblox to new mac-v3-signing.srv.releng.mdc[12].mozilla.com fqdns
  • [ ] [Jake] Setup a dedicated deploystudio workflow for mac-v3-signing on both install2 ds hosts
  • [ ] [Relops] Add a new mac-v3-signing puppet role in ronin puppet
  • [ ] [Jake] Bless a single host in mdc1 to make sure it completes a full deployment and doesn't go offline
  • [ ] [Relops] Bless the rest of the new mac-v3-signing hosts
Flags: needinfo?(dhouse)

earlier today, Jake walked me through the deploystudio configuration and scripts he has written for mojave. We've created a new workflow for the mac-v3-signing.

I quarantined t-yosemite-r7-241 (renamed to mac-v3-signing5.srv.releng.mdc1.mozilla.com, 10.49.48.181), blessed it, and rebooted it.

bsdpy host shows the request:

server_1  | 04/10/2019 09:42:39 PM - DEBUG: Got BSDP INFORM[LIST] packet: 
server_1  | 04/10/2019 09:42:39 PM - DEBUG: Determining image list for system ID Macmini7,1
server_1  | 04/10/2019 09:42:39 PM - DEBUG: Image "DSR-10136" has no restrictions, adding to list
server_1  | 04/10/2019 09:42:39 PM - DEBUG: Found default image ID 1136
server_1  | 04/10/2019 09:42:39 PM - DEBUG: -=========================================-
server_1  | 04/10/2019 09:42:39 PM - DEBUG: Return ACK[LIST] to 10.49.48.181 on 68
server_1  | 04/10/2019 09:42:39 PM - DEBUG: Default boot image ID: [129, 0, 4, 112]
server_1  | 04/10/2019 09:42:42 PM - DEBUG: -=========================================-
server_1  | 04/10/2019 09:42:42 PM - DEBUG: Got BSDP INFORM[LIST] packet: 
server_1  | 04/10/2019 09:42:42 PM - DEBUG: Determining image list for system ID Macmini7,1
server_1  | 04/10/2019 09:42:42 PM - DEBUG: Image "DSR-10136" has no restrictions, adding to list
server_1  | 04/10/2019 09:42:42 PM - DEBUG: Found default image ID 1136
server_1  | 04/10/2019 09:42:42 PM - DEBUG: -=========================================-
server_1  | 04/10/2019 09:42:42 PM - DEBUG: Return ACK[LIST] to 10.49.48.181 on 68
server_1  | 04/10/2019 09:42:42 PM - DEBUG: Default boot image ID: [129, 0, 4, 112]
User Story: (updated)

I don't see the test machine hitting the deploystudio server/logs. So I think something is failing in bsdpy or with the network.

I've confirmed the connection for initial netboot (log in previous entry).
And I've confirmed tftp:69 works by manually trying to download a non-existing file and seeing the failure finding the file on the server:

tftpd_1   | Apr 10 22:24:52 da0bf25e87f0 atftpd[1.140652426065664]: Serving tmpfile2 to 10.49.48.181:58858
tftpd_1   | Apr 10 22:24:52 da0bf25e87f0 atftpd[1.140652426065664]: File /nbi/tmpfile2 not found
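The manual check above (request a file that does not exist and look for the "not found" failure) can be sketched in Python as below. This is an assumption-laden illustration, not the tool that was actually used: the `probe` helper and its `server` argument are hypothetical, while the packet layout and error code follow the TFTP spec (RFC 1350).

```python
import socket
import struct

TFTP_PORT = 69
OP_RRQ = 1               # read request
OP_ERROR = 5             # error reply
ERR_FILE_NOT_FOUND = 1   # RFC 1350 error code


def build_rrq(filename, mode="octet"):
    """Build a TFTP read request: opcode, filename, NUL, mode, NUL."""
    return (struct.pack("!H", OP_RRQ)
            + filename.encode("ascii") + b"\0"
            + mode.encode("ascii") + b"\0")


def parse_error(packet):
    """Return (code, message) for a TFTP ERROR packet, or None otherwise."""
    if len(packet) < 4:
        return None
    opcode, code = struct.unpack("!HH", packet[:4])
    if opcode != OP_ERROR:
        return None
    message = packet[4:].split(b"\0", 1)[0].decode("ascii", "replace")
    return code, message


def probe(server, filename="tmpfile2", timeout=3.0):
    """Send one RRQ (hypothetical helper); returns None if nothing comes back.

    A "file not found" error still shows that udp/69 and the reply path work.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (server, TFTP_PORT))
        try:
            reply, _ = sock.recvfrom(516)
        except socket.timeout:
            return None
    return parse_error(reply)
```

Note that a TFTP server answers from a new ephemeral port rather than from 69, so a reply of any kind also says something about the return path through the firewall.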

I've also verified the http connection works for downloading the netboot image:

httpd_1   | 2019/04/10 22:28:00 [error] 12#12: *21770 open() "/nbi/tmptest" failed (2: No such file or directory), client: 10.49.48.181, server: localhost, request: "GET /tmptest HTTP/1.1", host: "10.49.56.200"
httpd_1   | 10.49.48.181 - - [10/Apr/2019:22:28:00 +0000] "GET /tmptest HTTP/1.1" 404 169 "-" "curl/7.43.0"

The initial dhcp netboot request appears twice in the logs. Maybe that is showing the problem since tftp/http appear to not be blocked.

I tested against install2 from a mac in the test vlan and it was provided a tftp path on the second ack instead of repeating the default image:

server_1  | 04/10/2019 10:47:01 PM - DEBUG: -=========================================-
server_1  | 04/10/2019 10:47:01 PM - DEBUG: Got BSDP INFORM[LIST] packet: 
server_1  | 04/10/2019 10:47:01 PM - DEBUG: Determining image list for system ID Macmini7,1
server_1  | 04/10/2019 10:47:01 PM - DEBUG: Image "DSR-10136" has no restrictions, adding to list
server_1  | 04/10/2019 10:47:01 PM - DEBUG: Found default image ID 1136
server_1  | 04/10/2019 10:47:01 PM - DEBUG: -=========================================-
server_1  | 04/10/2019 10:47:01 PM - DEBUG: Return ACK[LIST] to 10.49.56.25 on 68
server_1  | 04/10/2019 10:47:01 PM - DEBUG: Default boot image ID: [129, 0, 4, 112]
server_1  | 04/10/2019 10:47:02 PM - DEBUG: -=========================================-
server_1  | 04/10/2019 10:47:02 PM - DEBUG: Got BSDP INFORM[SELECT] packet: 
server_1  | 04/10/2019 10:47:02 PM - DEBUG: Determining image list for system ID Macmini7,1
server_1  | 04/10/2019 10:47:02 PM - DEBUG: Image "DSR-10136" has no restrictions, adding to list
server_1  | 04/10/2019 10:47:02 PM - DEBUG: Found default image ID 1136
server_1  | 04/10/2019 10:47:02 PM - DEBUG: -=========================================-
server_1  | 04/10/2019 10:47:02 PM - DEBUG: Return ACK[SELECT] to 10.49.56.25 on 68
server_1  | 04/10/2019 10:47:02 PM - DEBUG: --> TFTP path: /nbi/DSR-10136.nbi/i386/booter
server_1  | -->Boot image URI: http://10.49.56.200/DSR-10136.nbi/NetInstall.dmg
tftpd_1   | Apr 10 22:47:04 da0bf25e87f0 atftpd[1.140652426065664]: socket may listen on any address, including broadcast
tftpd_1   | Apr 10 22:47:04 da0bf25e87f0 atftpd[1.140652426065664]: Serving /nbi/DSR-10136.nbi/i386/booter to 10.49.56.25:1977
[...]
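To compare the two cases, here is a small Python sketch that scans bsdpy log lines and lists clients that only ever received ACK[LIST] and never ACK[SELECT]. The regex is written against the log lines pasted in this bug and is an assumption about the format in general; the function names are made up for illustration.

```python
import re
from collections import defaultdict

# Matches e.g. "Return ACK[LIST] to 10.49.48.181 on 68"
ACK_RE = re.compile(r"Return ACK\[(?P<kind>\w+)\] to (?P<client>\S+) on (?P<port>\d+)")

# Lines copied from the logs in this bug.
SAMPLE = [
    "server_1  | 04/10/2019 09:42:39 PM - DEBUG: Return ACK[LIST] to 10.49.48.181 on 68",
    "server_1  | 04/10/2019 09:42:42 PM - DEBUG: Return ACK[LIST] to 10.49.48.181 on 68",
    "server_1  | 04/10/2019 10:47:01 PM - DEBUG: Return ACK[LIST] to 10.49.56.25 on 68",
    "server_1  | 04/10/2019 10:47:02 PM - DEBUG: Return ACK[SELECT] to 10.49.56.25 on 68",
]


def acks_by_client(lines):
    """Map each client IP to the ACK kinds it was sent, in log order."""
    seen = defaultdict(list)
    for line in lines:
        match = ACK_RE.search(line)
        if match:
            seen[match.group("client")].append(match.group("kind"))
    return dict(seen)


def stuck_in_list(lines):
    """Clients that were sent ACK[LIST] but never ACK[SELECT]."""
    return sorted(
        client
        for client, kinds in acks_by_client(lines).items()
        if "SELECT" not in kinds
    )


print(stuck_in_list(SAMPLE))
```

With the sample lines, only the srv vlan host (10.49.48.181) is reported, which matches what the logs show: it repeats the LIST step and is never offered a TFTP path.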

(In reply to Dave House [:dhouse] from comment #14)

I tested against install2 from a mac in the test vlan and it was provided a tftp path on the second ack instead of repeating the default image:

This test machine also showed up in logs on the deploystudio host, install2.test.releng.mdc1 (went through noop; doesn't show in the deploystudio ui yet, but the log file is on disk).

the dhcp request is on port 69, and reply is on port 68. So maybe 68 is blocked to the .srv.releng vlan.

(In reply to Dave House [:dhouse] from comment #16)

the dhcp request is on port 69, and reply is on port 68. So maybe 68 is blocked to the .srv.releng vlan.

correction, tftp is port 69. the dhcp request is on port 67 I think and then reply on 68

The machines are able to netboot now against install2 deploystudio. I've installed mojave by doing that on the last two in mdc1, and now I'm getting puppet working. I tried with the not-yet-created role and with the gecko_t_osx_1014 role (both come up as failure):

Error: Evaluation Error: Error while evaluating a Function Call, Could not find class ::roles_profiles::roles::gecko_t_osx_1014 for mac-v3-signing5.srv.releng.mdc1.mozilla.com (file: /private/tmp/puppet_working.tb2t38/manifests/nodes/nodes.pp, line: 2, column: 5) on node mac-v3-signing5.srv.releng.mdc1.mozilla.com
User Story: (updated)

Aki/Nick, I am finishing the base puppet setup for these mac-v3-signing hosts for notarization. I plan to have the 5 in mdc1 ready for you today with mojave running ronin-puppet(masterless), with popups and sleep disabled, and vnc available. Is there additional setup you'll need me to do?

I'm manually adding your ssh public keys to the ronin_puppet users (https://github.com/davehouse/ronin_puppet/blob/notarization/data/common.yaml, we do not have it pulling those from ldap yet).

You'll be able to login as your ldap ids (aki, nthomas) to mac-v3-signing{1..5}.srv.releng.mdc1.mozilla.com

I'm removing a debug user and I'll test if vnc is working (we may need firewall changes still for vnc).

Flags: needinfo?(nthomas)
Flags: needinfo?(aki)

I've set up the users, keys, and sudo access for aki and nthomas.
I've connected with vnc successfully to mac-v3-signing5 with no additional configuration/changes.
I'm next removing the debug user "relops" and then I'll reimage and verify everything is ready to set up on the other 4 machines.

Thank you! I'll poke around today. Nick's on PTO, back on the 29th.
Should we try to add configs to ronin_puppet for additional setup on these boxes?

Flags: needinfo?(nthomas)
Flags: needinfo?(aki)

(In reply to Aki Sasaki [:aki] from comment #21)

Thank you! I'll poke around today. Nick's on PTO, back on the 29th.
Should we try to add configs to ronin_puppet for additional setup on these boxes?

Would it be helpful if I copy the generic-worker talos config? Jake has that working and Joel is doing validation/testing of it. So we could then modify from that.

mac-v3-signing{1,2,4,5} are reimaged and puppetized.

#3 was not responding to ping. I asked QTS to check it and kick off a reimage manually: REQ0280341

(In reply to Dave House [:dhouse] from comment #23)

mac-v3-signing{1,2,4,5} are reimaged and puppetized.

#3 was not responding to ping. I asked QTS to check it and kick off a reimage manually: REQ0280341

I sent the request to the wrong DC. I resubmitted it to Sacramento: REQ0280562

(In reply to Dave House [:dhouse] from comment #22)

(In reply to Aki Sasaki [:aki] from comment #21)

Thank you! I'll poke around today. Nick's on PTO, back on the 29th.
Should we try to add configs to ronin_puppet for additional setup on these boxes?

Would it be helpful if I copy the generic-worker talos config? Jake has that working and Joel is doing validation/testing of it. So we could then modify from that.

Aki explained that these run scriptworker, but said it would be helpful if I could set up python3. I've copied the worker setup from the talos role, commented out everything but the python3 setup, and verified that it works with a new image (tested on mac-v3-signing4).

(In reply to Dave House [:dhouse] from comment #24)

(In reply to Dave House [:dhouse] from comment #23)

mac-v3-signing{1,2,4,5} are reimaged and puppetized.

#3 was not responding to ping. I asked QTS to check it and kick off a reimage manually: REQ0280341

I sent the request to the wrong DC. I resubmitted it to Sacramento: REQ0280562

QTS found the #3 mac was powered off and netboot failed. I've asked them to try swapping network cables with #2 or #4 (nearby in the rack). I continue to get no response to ping/ssh. I think it may have powered off again, and I may have some network flow/etc wrong for this one host.

(In reply to Dave House [:dhouse] from comment #26)

(In reply to Dave House [:dhouse] from comment #24)

(In reply to Dave House [:dhouse] from comment #23)

mac-v3-signing{1,2,4,5} are reimaged and puppetized.

#3 was not responding to ping. I asked QTS to check it and kick off a reimage manually: REQ0280341

I sent the request to the wrong DC. I resubmitted it to Sacramento: REQ0280562

QTS found the #3 mac was powered off and netboot failed. I've asked them to try swapping network cables with #2 or #4 (nearby in the rack). I continue to get no response to ping/ssh. I think it may have powered off again, and I may have some network flow/etc wrong for this one host.

I've continued working with QTS to recover #3. We've had mixed results reaching the recovery console and netboot (always enters new-mac setup, without sshd running and no terminal available(?)), and it appears to have power-savings still turned on as it powers itself down after 20-25 minutes idle. The technician is now trying to connect directly to the mini to see if there is a problem with the adapter on this mini (for the crash cart, since at-boot keyboard combinations are not always recognized).

(In reply to Dave House [:dhouse] from comment #27)

I've continued working with QTS to recover #3. [...]

QTS walked through new OS setup with me and set up a user with admin privileges. I was able to login and found that the sleep times were still set and SIP was turned on. So, I'm guessing that SIP was on for this machine and so the sleep changes/other did not take when applied by deploystudio. I'm not sure how SIP came on however as it must have been off to allow my bless of the machine to initially reimage it.

I asked QTS to disable SIP, and they told me over the phone that they had done that, but when I check it shows as still on. So I'm not certain what happened. I will either re-request SIP be turned off on Monday, or leave the machine for Jake to fix later next week when he visits the datacenter.

(In reply to Dave House [:dhouse] from comment #28)

I asked QTS to disable SIP, and they told me over the phone that they had done that, but when I check it shows as still on. So I'm not certain what happened. I will either re-request SIP be turned off on Monday, or leave the machine for Jake to fix later next week when he visits the datacenter.

I created a new case and worked with QTS again yesterday and we found the command was being entered incorrectly. Once that was fixed, SIP was disabled and I successfully imaged #3 with mojave and matching the other 4 mdc1 mac minis for notarization.

User Story: (updated)

With new imaging of #3, I found two problems in the setup with puppet:

  1. virtualenv install fails because pip3 is not found (dependency ordering; we may need to make it more explicit with "->"'s)
  2. network connectivity is not always available when run-puppet.sh executes. (maybe need a ping check (or does git have a config option for retry/timeout?), or to set the launchd config to depend on the network service)

(In reply to Dave House [:dhouse] from comment #30)

  1. network connectivity is not always available when run-puppet.sh executes. (maybe need a ping check (or does git have a config option for retry/timeout?), or to set the launchd config to depend on the network service)

run-puppet.sh "should" loop until success. In theory, that should handle the network not being available. Also, launchd does not have the ability to "wait for network" anymore. Apple insists that be handled by the application itself.
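As an illustration of handling this in the application itself, here is a minimal Python sketch of a retry loop with a capped, doubling delay. It is not the contents of run-puppet.sh (which is not shown in this bug); `can_resolve`, `wait_until`, and the idea of using a DNS lookup as the readiness check are all assumptions.

```python
import socket
import time


def can_resolve(hostname):
    """Treat a successful DNS lookup as a rough 'network is up' signal."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def wait_until(check, attempts=10, delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call check() until it returns True.

    Returns the attempt number that succeeded, or None if every attempt failed.
    The delay doubles after each failure, up to max_delay.
    """
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            sleep(delay)
            delay = min(delay * 2, max_delay)
    return None
```

Usage would look something like `wait_until(lambda: can_resolve("github.com"))` before the git fetch; the hostname here is only an example.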

(In reply to Dave House [:dhouse] from comment #19)

I'm manually adding your ssh public keys to the ronin_puppet users (https://github.com/davehouse/ronin_puppet/blob/notarization/data/common.yaml, we do not have it pulling those from ldap yet).

You'll be able to login as your ldap ids (aki, nthomas) to mac-v3-signing{1..5}.srv.releng.mdc1.mozilla.com

Would it also be possible to add my key to those nodes?

(In reply to Jake Watkins [:dividehex] from comment #31)

(In reply to Dave House [:dhouse] from comment #30)

  1. network connectivity is not always available when run-puppet.sh executes. (maybe need a ping check (or does git have a config option for retry/timeout?), or to set the launchd config to depend on the network service)

run-puppet.sh "should" loop until success. In theory, that should handle the network not being available. Also, launchd does not have the ability to "wait for network" anymore. Apple insists that be handled by the application itself.

I created bug 1546448 for improving the retry/test.

(In reply to Simon Fraser [:sfraser] ⌚️GMT from comment #32)

(In reply to Dave House [:dhouse] from comment #19)

I'm manually adding your ssh public keys to the ronin_puppet users (https://github.com/davehouse/ronin_puppet/blob/notarization/data/common.yaml, we do not have it pulling those from ldap yet).

You'll be able to login as your ldap ids (aki, nthomas) to mac-v3-signing{1..5}.srv.releng.mdc1.mozilla.com

Would it also be possible to add my key to those nodes?

Simon, I've added your user+key in https://github.com/mozilla-platform-ops/ronin_puppet/pull/23/files and applied that on mac-v3-signing4.srv.releng.mdc1. When that PR is merged, I'll un-pin the 5 mdc1 hosts from my "notarization" branch so that they are running off of master.

(In reply to Dave House [:dhouse] from comment #34)

(In reply to Simon Fraser [:sfraser] ⌚️GMT from comment #32)

(In reply to Dave House [:dhouse] from comment #19)

I'm manually adding your ssh public keys to the ronin_puppet users (https://github.com/davehouse/ronin_puppet/blob/notarization/data/common.yaml, we do not have it pulling those from ldap yet).

You'll be able to login as your ldap ids (aki, nthomas) to mac-v3-signing{1..5}.srv.releng.mdc1.mozilla.com

Would it also be possible to add my key to those nodes?

Simon, I've added your user+key in https://github.com/mozilla-platform-ops/ronin_puppet/pull/23/files and applied that on mac-v3-signing4.srv.releng.mdc1. When that PR is merged, I'll un-pin the 5 mdc1 hosts from my "notarization" branch so that they are running off of master.

I re-ran puppet on the other 4 in mdc1 also to make sure the sfraser user is on all 5.

(In reply to Dave House [:dhouse] from comment #34)

Simon, I've added your user+key in https://github.com/mozilla-platform-ops/ronin_puppet/pull/23/files and applied that on mac-v3-signing4.srv.releng.mdc1. When that PR is merged, I'll un-pin the 5 mdc1 hosts from my "notarization" branch so that they are running off of master.

Thank you! Can confirm here that I can log in.

Nick/Simon/Aki, the base configuration for the notarization macos mojave machines in mdc1 is merged. Let me know if you have any blockers or need my assistance with the setup or issues with the new puppet. We'll work on getting the mdc2 machines ready also.

(NI just as a hand-off confirmation)

User Story: (updated)
Flags: needinfo?(sfraser)
Flags: needinfo?(nthomas)

Thank you! We'll be using these shortly.

Flags: needinfo?(nthomas)
Flags: needinfo?(sfraser)

Dave, I can reach the mdc1 hosts fine but have some problems with mdc2. Should the latter be ready at this point ?

  • mac-v3-signing1.srv.releng.mdc2 appears to be down, doesn't respond to ping or port 22
[nthomas@rejh1.srv.releng.mdc2.mozilla.com ~]$ nc -vz mac-v3-signing1.srv.releng.mdc2.mozilla.com 22
nc: connect to mac-v3-signing1.srv.releng.mdc2.mozilla.com port 22 (tcp) failed: No route to host
  • get a password prompt when sshing into 2-4, my account not provisioned ?
[laptop] $ ssh mac-v3-signing2.srv.releng.mdc2.mozilla.com
nthomas@mac-v3-signing2.srv.releng.mdc2.mozilla.com's password:
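For checking several hosts at once, the `nc -vz` test above can be approximated in Python as below. This is a rough sketch; `port_status` is a made-up helper and the strings it returns are not nc's exact output.

```python
import socket


def port_status(host, port, timeout=3.0):
    """Try a TCP connect and report 'open' or a short failure reason."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        return "timed out"
    except OSError as exc:
        # Covers refused connections, unreachable hosts, and DNS failures.
        return f"failed: {exc.strerror or exc}"
```

For example, looping `port_status(f"mac-v3-signing{n}.srv.releng.mdc2.mozilla.com", 22)` over `range(1, 6)` from a host inside the network would show which of the five answer on ssh.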

(In reply to Nick Thomas [:nthomas] (UTC+12) from comment #40)

Dave, I can reach the mdc1 hosts fine but have some problems with mdc2. Should the latter be ready at this point ?

  • mac-v3-signing1.srv.releng.mdc2 appears to be down, doesn't respond to ping or port 22
[nthomas@rejh1.srv.releng.mdc2.mozilla.com ~]$ nc -vz mac-v3-signing1.srv.releng.mdc2.mozilla.com 22
nc: connect to mac-v3-signing1.srv.releng.mdc2.mozilla.com port 22 (tcp) failed: No route to host
  • get a password prompt when sshing into 2-4, my account not provisioned ?
[laptop] $ ssh mac-v3-signing2.srv.releng.mdc2.mozilla.com
nthomas@mac-v3-signing2.srv.releng.mdc2.mozilla.com's password:

Hi Nick, the mdc2 ones aren't set up yet. I'll work on them today (first to see if the imaging in mdc2 works now, or if I need to work on that).

User Story: (updated)

I've set up mac-v3-signing{2..5}.srv.releng.mdc2.mozilla.com
I'm having trouble reaching the machine for #1. I'll re-purpose a different machine and rename it as #1

(In reply to Dave House [:dhouse] from comment #42)

I've set up mac-v3-signing{2..5}.srv.releng.mdc2.mozilla.com
I'm having trouble reaching the machine for #1. I'll re-purpose a different machine and rename it as #1

I am not able to login after the reimage+puppet on the 4, and so I've asked QTS to check all 5 and bless+reboot them again.

(In reply to Dave House [:dhouse] from comment #43)

(In reply to Dave House [:dhouse] from comment #42)

I've set up mac-v3-signing{2..5}.srv.releng.mdc2.mozilla.com
I'm having trouble reaching the machine for #1. I'll re-purpose a different machine and rename it as #1

I am not able to login after the reimage+puppet on the 4, and so I've asked QTS to check all 5 and bless+reboot them again.

Working with QTS yesterday, I still have not gotten the mdc2 notarization machines set up. We hit SIP and recovery console issues, and I am writing better instructions with less risk of failure (remote hands has had trouble reaching the recovery console, and the instructions for how to disable SIP and bless/reboot were not clear enough).

I worked with the QTS tech lead on fixing the 5 in mdc2: 1. to turn off SIP. 2. to prevent repeated system-setup. However, the machines do not all correctly load the apple recovery console (none load from the local recovery partition; half fail to load from apple's servers; one now has a blank screen).

I'll resume troubleshooting with their tech lead on Tuesday morning.

Depends on: 1555100

I created bug 1555100: We need to connect these mac minis to Apple's internet recovery so that we can turn off SIP.

Blocks: 1471004

QTS closed the bug to disable SIP and reimage the minis in MDC2. I've opened a new bug with QTS (RITM0311227) to have them follow up on these minis.

On another note, I noticed the host names on the mac-v3-signing minis conflict between mdc1 and mdc2. We will need to change these.
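One way to spot this kind of conflict is to group the FQDNs by their first label, sketched below in Python. This assumes the conflict is that the short host name (e.g. mac-v3-signing1) is the same in both datacenters; the comment above does not spell that out, and the function name is illustrative.

```python
from collections import defaultdict


def short_name_conflicts(fqdns):
    """Return {short_name: [fqdns...]} for short names used by more than one FQDN."""
    by_short = defaultdict(list)
    for fqdn in fqdns:
        short = fqdn.split(".", 1)[0]
        by_short[short].append(fqdn)
    return {short: names for short, names in by_short.items() if len(names) > 1}


hosts = [
    f"mac-v3-signing{n}.srv.releng.{dc}.mozilla.com"
    for dc in ("mdc1", "mdc2")
    for n in range(1, 6)
]
for short, names in short_name_conflicts(hosts).items():
    print(short, "->", ", ".join(names))
```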

Depends on: 1561023
Depends on: 1561098
Depends on: 1561117
User Story: (updated)
Depends on: 1570187
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED