Closed Bug 1463970 Opened 7 years ago Closed 6 years ago

ciduty can not connect into any moonshot machine via SSH

Categories

(Infrastructure & Operations :: RelOps: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: riman, Assigned: dhouse)

Attachments

(2 files)

While trying to loan a t-linux64-ms-xxx machine, I noticed that I cannot connect to it via SSH. I tried other moonshot workers in mdc1 and mdc2 and hit the same issue. Should CIDuty be in authorized_keys on the moonshot machines?
Blocks: 1457510
I checked many of the mac workers and other moonshot linux workers. Some have the buildduty keys in the root authorized_keys, and some do not. For example: t-linux64-ms-001 and 002 do not have the keys; t-linux64-ms-003..005 have the keys.
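A per-host check like the one described above could be sketched as follows. This is only an illustration: the helper name has_key and the marker string "ciduty" (assumed to appear in the key comment) are assumptions, not something taken from the puppet configuration.

```shell
#!/usr/bin/env bash
# Sketch: succeed if an authorized_keys file contains a line with the given marker.
# has_key is a hypothetical helper; the marker is assumed to be part of the key comment.
has_key() {
    local file="$1" marker="$2"
    # -q: no output, -F: treat the marker as a fixed string, --: end of options
    grep -qF -- "$marker" "$file"
}

# Hypothetical usage on a worker, as root:
# if has_key /root/.ssh/authorized_keys "ciduty"; then
#     echo "keys present"
# else
#     echo "keys missing"
# fi
```

Running the same check over a list of hosts would show which workers puppet has updated and which it has not.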
I ran puppet manually on t-linux64-ms-001, and it added the missing user keys into the root authorized_keys. So I'm guessing there was some network/other problem that caused puppet to not find these keys and remove them earlier. That would explain why some of the hosts have the keys, and others do not.
Assignee: relops → dhouse
Strangely, the authorized_keys files missing the ciduty keys are dated back to January 25 (between 9 and 14 pacific time). So, why were they not updated since then? Puppet was running, right? Where the ciduty keys exist, I see authorized_keys files with dates from Feb 6 and newer. However, the files vary in size. Why do they not have similar dates and sizes?
(In reply to Dave House [:dhouse] from comment #3)
> Strangely, the authorized_keys files missing the ciduty keys are dated back
> to January 25 (between 9 and 14 pacific time). So, why were they not updated
> since then? Puppet was running, right?
The workers have rebooted. The oldest I found of the linux moonshots is 3 days since last boot. The /var/lib/puppet/state/last_run_summary.yaml timestamps are from Jan 25 for these. So it looks like puppet has not run at the reboots.
> Where the ciduty keys exist, I see authorized_keys files with dates from Feb
> 6 and newer.
> However, the files vary in size. Why do they not have similar dates and
> sizes?
If puppet is not running, or not updating these files, then that is why they are different sizes and dates.
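The staleness check described here (comparing the age of last_run_summary.yaml against how long the host has been up) could be sketched like this. The helper name is_stale and the three-day threshold are assumptions for illustration; stat -c %Y is the GNU coreutils form, which is what an Ubuntu worker would have.

```shell
#!/usr/bin/env bash
# Sketch: succeed if a file's modification time is older than max_age seconds.
# is_stale is a hypothetical helper. Returns 2 if the file cannot be read.
is_stale() {
    local file="$1" max_age="$2"
    local now mtime
    now="$(date +%s)"
    mtime="$(stat -c %Y "$file")" || return 2
    [ $(( now - mtime )) -gt "$max_age" ]
}

# Hypothetical usage on a worker (path taken from the comment above):
# if is_stale /var/lib/puppet/state/last_run_summary.yaml $(( 3 * 24 * 3600 )); then
#     echo "puppet has not completed a run in the last 3 days"
# fi
```

A summary file that is older than the last boot suggests puppet did not run at that boot.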
Looking at some t-linux64-ms-* hosts, it looks like the puppet service was created, but was not enabled and not started:
[root@t-linux64-ms-003 ~]# systemctl status puppet
● puppet.service - Puppet agent
   Loaded: loaded (/lib/systemd/system/puppet.service; disabled; vendor preset: enabled)
   Active: inactive (dead)
The file puppet.service exists:
[root@t-linux64-ms-003 ~]# /lib/systemd/system/puppet.service
-bash: /lib/systemd/system/puppet.service: Permission denied
[root@t-linux64-ms-003 ~]# cat /lib/systemd/system/puppet.service
[Unit]
Description=Puppet agent
[Service]
Type=forking
EnvironmentFile=-/etc/default/puppet
PIDFile=/run/puppet/agent.pid
ExecStart=/usr/bin/puppet agent $DAEMON_OPTS
[Install]
WantedBy=multi-user.target
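For checking many hosts, the same information can be condensed to one line per unit. The sketch below uses the systemctl is-enabled and is-active subcommands; the helper name service_state and its output format are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: print a one-line summary of a unit's enablement and activity state.
# service_state is a hypothetical helper. Both subcommands exit non-zero for
# "disabled" or "inactive", so their exit status is deliberately ignored.
service_state() {
    local unit="$1" enabled active
    enabled="$(systemctl is-enabled "$unit" 2>/dev/null)" || true
    active="$(systemctl is-active "$unit" 2>/dev/null)" || true
    echo "$unit: enabled=${enabled:-unknown} active=${active:-unknown}"
}

# Hypothetical usage on a worker:
# service_state puppet
```

On the host shown above this would be expected to report the unit as disabled and inactive.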
Adding an entry to puppet::atboot for Ubuntu 16.04. On Ubuntu 16.04, puppet is run by a systemd service, not by upstart. Created a puppet service for Ubuntu 16.04, ensuring the service is enabled and running.
Attachment #8980222 - Flags: review?(jwatkins)
Attachment #8980222 - Flags: review?(dhouse)
for windows moonshots, the pub key can be added here: https://github.com/mozilla-releng/OpenCloudConfig/blob/master/userdata/Configuration/ssh/authorized_keys either submit a pr or let me know what pub key(s) to add. cheers!
(In reply to Rob Thijssen (:grenade UTC+2) from comment #7)
> for windows moonshots, the pub key can be added here:
> https://github.com/mozilla-releng/OpenCloudConfig/blob/master/userdata/Configuration/ssh/authorized_keys
> either submit a pr or let me know what pub key(s) to add. cheers!
Thank you! :riman, can you work with your team to add your keys for windows?
Flags: needinfo?(riman)
:dhouse, we will work on it. Thx!
Comment on attachment 8980222
Bug_1463970_adding_puppet_service.patch
Review of attachment 8980222:
-----------------------------------------------------------------
The point of atboot is to make sure puppet only runs "at boot" and runs in a loop until it completes a full puppet run successfully. This patch enables the puppet systemd service to run puppet like a daemon, and that isn't what we want. That is also why the service is disabled in systemd by default.
Instead, you will need to change the puppet service (/lib/systemd/system/puppet.service) to source puppet_atboot_common and call the run_until_success function. You also need to configure the service so that it is treated as a run-once. Also, don't forget to exec the systemd reload after changing the service file.
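The "run in a loop until a full run succeeds" behavior can be illustrated with a small retry loop. This is not the run_until_success function from puppet_atboot_common, whose contents are not shown in this bug; retry_until_success, its arguments, and the delay are hypothetical stand-ins.

```shell
#!/usr/bin/env bash
# Sketch: re-run a command until it exits 0, sleeping between attempts.
# retry_until_success is a hypothetical stand-in, NOT the real run_until_success.
retry_until_success() {
    local delay="$1"
    shift
    until "$@"; do
        echo "command failed; retrying in ${delay}s" >&2
        sleep "$delay"
    done
}

# Hypothetical usage from a run-once boot script:
# retry_until_success 60 puppet agent --onetime --no-daemonize --verbose
```

After editing a unit file, systemctl daemon-reload makes systemd pick up the change, which is the reload step the review asks for.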
Attachment #8980222 - Flags: review?(jwatkins) → review-
Comment on attachment 8980222
Bug_1463970_adding_puppet_service.patch
I tried running this on a moonshot, and it gave me an error about not finding the service in upstart. I think it may have problems removing the old config. Maybe we'll need to do a manual repair across the machines to prepare them for this patch?
When you've updated the patch, could you test it on one of the existing production t-linux64-ms machines? (Maybe we just take one out of active service for testing this?)
Attachment #8980222 - Flags: review?(dhouse) → review-
(In reply to Dave House [:dhouse] from comment #11)
> Comment on attachment 8980222
> Bug_1463970_adding_puppet_service.patch
>
> I tried running this on a moonshot, and it gave me an error about not
> finding the service in upstart. I think it may have problems removing the
> old config. Maybe we'll need to do a manual repair across the machines to
> prepare them for this patch?
>
> When you've updated the patch, could you test it on one of the existing
> production t-linux64-ms machines? (maybe we just take one out of active
> service for testing this?)
This is the error I saw from puppet:
```
Thu May 24 10:10:43 -0700 2018 /Stage[main]/Puppet::Atboot/Service[puppet] (err): Could not evaluate: Execution of '/sbin/status puppet' returned 1: status: Unable to connect to Upstart: Failed to connect to socket /com/ubuntu/upstart: Connection refused
```
I've run puppet on all the t-linux64-ms machines, so the CIDuty ssh keys, current as of today, are included in the root authorized_keys on these machines. One machine, t-linux64-ms-488, needs to be reimaged. It has an invalid cert. I tried running puppetize.sh again since it looks like it may never have completed, but that fails (some error about an apt-get update failing came through the puppet email).
I've done the reimage on t-linux-ms-488. Seems like it appears in TC and it's taking tasks. We still cannot SSH into it.
Flags: needinfo?(dhouse)
Dave, rmutter has re-imaged t-linux64-ms-488 and it is now running jobs successfully. However, we cannot ssh in to this server. I believe ssh port 22 is closed: when I run telnet IP 22, I just get "Trying IP ..." and then it times out.
It looks like SSH is closed on all the t-linux-ms machines. None of them accept connections, except the ones with issues in Bug 1464064.
Solved, updated local ssh configs. Thanks to dcrisan ;). SSH connection works, CI team notified to update configs.
Added to .ssh/config:
Host *.releng.mdc2.mozilla.com !rejh?.srv.releng.mdc2.mozilla.com !.private.releng.????.mozilla.com
    ProxyCommand ssh -W %h:%p rejh1.srv.releng.mdc2.mozilla.com
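For a one-off connection without editing .ssh/config, the same proxying can be passed on the command line with -o ProxyCommand. The sketch below only builds and prints that command; the helper name jump_ssh_cmd and the worker's fully qualified name in the usage line are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: print the one-off ssh command equivalent to the ProxyCommand entry above.
# jump_ssh_cmd is a hypothetical helper; it does not open a connection.
jump_ssh_cmd() {
    local jump="$1" target="$2"
    # %%h and %%p are printf escapes for the literal %h and %p that ssh expands
    printf 'ssh -o ProxyCommand="ssh -W %%h:%%p %s" %s\n' "$jump" "$target"
}

# Hypothetical usage (the worker's FQDN here is a placeholder):
# jump_ssh_cmd rejh1.srv.releng.mdc2.mozilla.com root@t-linux64-ms-049.example.releng.mdc2.mozilla.com
```

The config-file entry is the better long-term choice, since it applies to every matching host automatically.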
(In reply to Roland Mutter Michael (:rmutter) from comment #14)
> I've done the reimage on t-linux-ms-488. Seems like it appears in TC and
> it's taking tasks. We still cannot SSH into it.
Thank you for reimaging it! I see :arny's note that :dragrom helped with the ssh config.
Flags: needinfo?(dhouse)
Flags: needinfo?(riman)
Reading all of the above, I tried to connect to t-linux64-ms-049 with my LDAP account. I entered the Duo key, and it then seems to require the password for a user created on the machine. It works when I connect as root.
Flags: needinfo?(dhouse)
Thanks for helping ciduty get access to moonshots. Sounds like we can connect to linux se machines now? As to Windows, @ciduty please append your ssh pub key to the file Rob linked to and then submit that as a PR: https://github.com/mozilla-releng/OpenCloudConfig/blob/master/userdata/Configuration/ssh/authorized_keys Even better, if one of you could gather all the public ssh keys from the team and submit one PR, this would save Rob some churn.
Flags: needinfo?(ciduty)
(In reply to Roland Mutter Michael (:rmutter) from comment #20)
> Reading all of the above , I've tried to connect to t-linux64-ms-049 with my
> ldap . I've entered the duo key and seems like it requires my password for
> the user created on the machine . Works when I try to connect as root.
I'm sorry, that is the expected behavior. Your key was added by puppet to the .ssh/authorized_keys for the root user. We don't have user accounts on the workers, but our ssh keys are allowed to log us in as root.
Flags: needinfo?(dhouse)
Depends on: 1465309
Dragos, I made a separate bug, 1465309, for the puppet at-boot setup since this bug was focused on the ssh keys setup for ciduty.
I was the blocker for this because I was on PTO. I have now made a pull request which contains all of CIDuty's SSH public keys. Clearing NI? on ciduty@m.c as our part is now done. Waiting for the PR to be accepted and merged into master.
Flags: needinfo?(ciduty)
@grenade - hi, what's the status of https://github.com/mozilla-releng/OpenCloudConfig/pull/149 ? Need anything else from CIDuty?
For the moonshots, I have manually run puppet today since we are waiting on bug 1465309 for the atboot execution of puppet. Once that bug's change is applied, we will need to manually run puppet on all of the moonshot linux instances to apply the change (set puppet to run at boot).
The puppet atboot change was applied and all of the linux moonshots are getting updated ssh keys now from puppet.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
I've reopened the bug to track the access request until it is fulfilled. For security reasons, Rob asked each of us to submit an individual PR containing our own public key. This should be done at the path userdata/Configuration/ssh/authorized_keys in this repository: https://github.com/mozilla-releng/OpenCloudConfig
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached file GitHub Pull Request
i think the kts ssh server may be interfering with the openssh server on hw win 10 because of contention over port 22.
Attachment #9006520 - Flags: review?(mcornmesser)
Attachment #9006520 - Flags: review?(mcornmesser) → review+
Closing the bug since the whole team has access to the moonshot machines via SSH and the change to disable the kts ssh server was merged 26 days ago.
Status: REOPENED → RESOLVED
Closed: 7 years ago → 6 years ago
Resolution: --- → FIXED