Closed Bug 1472404 Opened 6 years ago Closed 6 years ago

puppet version UNKNOWN for mac-depsigning2.srv.releng.mdc1.mozilla.com

Categories

(Infrastructure & Operations :: RelOps: Puppet, task)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: apop, Assigned: dhouse)

Attachments

(2 files)

While monitoring #platform-ops-alerts, I saw the following alert:

Sat 02:12:18 UTC [8034] [Unknown] mac-depsigning2.srv.releng.mdc1.mozilla.com:Puppet freshness is UNKNOWN: Sorry, puppet version UNKNOWN is unsupported by /usr/local/libexec/check_puppet_freshness

Can you please help with this issue or point me to someone who can?
Attached image version.png
I tried to check the puppet version, but the command printed "paths" instead. Please see the attached screenshot.
Infrastructure & Operations::Puppet is for IT's puppet infra; puppet issues for RelEng/RelOps' systems should be filed in Infra&Ops::RelOps::Puppet
Assignee: infra → relops
Component: Infrastructure: Puppet → RelOps: Puppet
QA Contact: cshields → mcornmesser
Blocks: 1472860
Puppet is 3.7.0. It appears to work apart from this warning.
We're running facter 2.2.0. It appears upgrading to 2.3.0 may fix this. However, other systems appear to be using facter 2.2.0; not sure why mac-depsigning2 is the only system I've seen with this behavior.
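One plausible reason a freshness check reports the version as UNKNOWN even though puppet 3.7.0 is installed is that warning text is mixed into the version output. The sketch below illustrates that idea only; how `/usr/local/libexec/check_puppet_freshness` actually parses the version is not shown in this bug, and the helper names and the sample warning line are invented for illustration.

```python
import re

# Hypothetical helpers: the real check_puppet_freshness logic is not shown
# in this bug, so this only illustrates strict vs. tolerant version parsing.
VERSION_RE = re.compile(r"\d+\.\d+\.\d+")


def parse_version_strict(output: str) -> str:
    """Return the version only if the whole output is a bare version string."""
    text = output.strip()
    return text if VERSION_RE.fullmatch(text) else "UNKNOWN"


def parse_version_tolerant(output: str) -> str:
    """Return the last line that looks like a version, skipping other lines."""
    for line in reversed(output.splitlines()):
        candidate = line.strip()
        if VERSION_RE.fullmatch(candidate):
            return candidate
    return "UNKNOWN"


clean = "3.7.0\n"
# Invented warning text, standing in for whatever the tool printed.
noisy = "Warning: example warning printed before the version\n3.7.0\n"

print(parse_version_strict(clean), parse_version_strict(noisy))
print(parse_version_tolerant(clean), parse_version_tolerant(noisy))
```

A strict parser treats any extra output as an unknown version, which would match the alert; a tolerant parser would hide the warning but not its cause.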
Sorry for all the comments; I think this is the last one. mac-depsigning2 was failing to install mercurial; `export SDKROOT=macosx10.10` before puppetizing fixed that. It didn't fix these warnings. We may want to try reimaging this box? https://bugzilla.mozilla.org/show_bug.cgi?id=1472860#c11
+1 I'll reimage this machine to make sure it is in a good state.
Assignee: relops → dhouse
Flags: needinfo?(dhouse)
I logged in and issued the bless and reboot on mac-depsigning2. I'll make sure the deploystudio process completes the rebuild and a new puppetize completes:
```
[dhouse@rejh2.srv.releng.mdc1.mozilla.com ~]$ ssh mac-depsigning2.srv.releng.mdc1.mozilla.com
This host is set to follow security level "maximum"
Unauthorized access prohibited
ONLY START SIGNING SERVERS AS cltsign
This signing server hosts the following instances:
* port 9110 in /builds/signing/dep-signing-server
[dhouse@mac-depsigning2.srv.releng.mdc1.mozilla.com ~]$ uptime
19:31 up 311 days, 2:52, 2 users, load averages: 1.35 1.40 1.31
[dhouse@mac-depsigning2.srv.releng.mdc1.mozilla.com ~]$ sudo su -
[root@mac-depsigning2.srv.releng.mdc1.mozilla.com ~]# /usr/sbin/bless --netboot --server bsdp://10.49.56.16; reboot
[root@mac-depsigning2.srv.releng.mdc1.mozilla.com ~]# Connection to mac-depsigning2.srv.releng.mdc1.mozilla.com closed by remote host.
Connection to mac-depsigning2.srv.releng.mdc1.mozilla.com closed.
```
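If the netboot step were scripted rather than typed by hand, the command could be assembled and validated before running it. This is a minimal sketch that only builds the argument list; the `bless` flags are the ones used in the session above, while the function name and the assumption that the server is given as an IP address are mine.

```python
import ipaddress
import shlex
from typing import List

BLESS = "/usr/sbin/bless"  # path as used in the session above


def build_netboot_command(bsdp_server: str) -> List[str]:
    """Build the bless argv for a one-off netboot from a BSDP server.

    Assumes the server is given as an IP address; ip_address() raises
    ValueError for anything else, so a typo fails before anything runs.
    """
    ipaddress.ip_address(bsdp_server)
    return [BLESS, "--netboot", "--server", f"bsdp://{bsdp_server}"]


cmd = build_netboot_command("10.49.56.16")
# Print only; running it (and the reboot that follows) needs root on the host.
print(shlex.join(cmd))
```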
Flags: needinfo?(dhouse)
Deploystudio rebuild successful:
```
deploystudio@install.test.releng.mdc1.mozilla.com
1:46 PM (4 minutes ago)
to deploystudio-alerts

The workflow 'Restore r5 signing server' was launched on the computer C07HT05CDKDJ (name: mac-depsigning2.srv.releng.mdc1.mozilla.com, ip: 10.49.48.18, mac: 40:6c:8f:0d:2c:27) with a SUCCESSFUL termination status.

This mail was generated automatically by DeployStudio Server.
--
The DeployStudio Team.
```
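For anyone who wants to track these notifications automatically, the fields in the message can be pulled out with a regular expression. This is a rough sketch based only on the wording of the one email quoted above; the function and group names are hypothetical, and other DeployStudio messages may be worded differently.

```python
import re

# Pattern derived from the single notification quoted in this bug.
STATUS_RE = re.compile(
    r"The workflow '(?P<workflow>[^']+)' was launched on the computer "
    r"(?P<computer>\S+) \(name: (?P<name>[^,]+), ip: (?P<ip>[^,]+), "
    r"mac: (?P<mac>[^)]+)\) with an? (?P<status>[A-Z]+) termination status"
)


def parse_notification(body: str) -> dict:
    """Extract workflow, host details and status; return {} if no match."""
    normalized = " ".join(body.split())  # tolerate wrapped lines in the mail
    match = STATUS_RE.search(normalized)
    return match.groupdict() if match else {}


body = (
    "The workflow 'Restore r5 signing server' was launched on the computer "
    "C07HT05CDKDJ (name: mac-depsigning2.srv.releng.mdc1.mozilla.com, "
    "ip: 10.49.48.18, mac: 40:6c:8f:0d:2c:27) with a SUCCESSFUL "
    "termination status."
)
info = parse_notification(body)
print(info)
```

Note that the status only describes the workflow as DeployStudio saw it; it does not prove that every expected step ran on the host.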
I didn't see a puppet cert email, so I logged into the machine. It still has the image's hostname and appears to be re-running the deploystudio restore. I'll check again in 30 minutes or so to see whether it re-ran the deploystudio reimage.
Seeing timeouts/failures. I'll leave it for now to see if it handles the timeouts and recovers, or if I need to reboot it, or redo the full reimage.
```
Jul 26 14:41:11 t-yosemite-r7-0001.test.releng.scl3.mozilla.com apsd[66]: Got connection error Error Domain=NSURLErrorDomain Code=-1001 "The request timed out." UserInfo=0x7f953b232e50 {NSUnderlyingError=0x7f9538d15d20 "The request timed out.", NSErrorFailingURLStringKey=https://albert.apple.com/deviceservices/deviceActivation?device=MacOS, NSErrorFailingURLKey=https://albert.apple.com/deviceservices/deviceActivation?device=MacOS, NSLocalizedDescription=The request timed out.}
Jul 26 14:41:11 t-yosemite-r7-0001.test.releng.scl3.mozilla.com apsd[66]: <APSCertificateManager: 0x7f953b210730>: Failed to get client cert on attempt 12, will retry in 900 seconds
Jul 26 14:41:30 t-yosemite-r7-0001.test.releng.scl3.mozilla.com nsurlsessiond[181]: tcp_connection_attempt_timer_fired 8 connection took too long to connect, cleaning up current attempt
Jul 26 14:41:30 t-yosemite-r7-0001.test.releng.scl3.mozilla.com nsurlsessiond[181]: tcp_connection_attempt_timer_fired 10 connection took too long to connect, cleaning up current attempt
Jul 26 14:41:30 t-yosemite-r7-0001.test.releng.scl3.mozilla.com nsurlsessiond[181]: tcp_connection_attempt_timer_fired 9 connection took too long to connect, cleaning up current attempt
```
attached log showing reimage (deploystudio) and timeout/failures afterward
I checked with Jake. He said deploystudio changes the hostname as part of the restore, so if the name did not change, the restore did not complete.
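Since the hostname change is the signal that the restore finished, a quick post-reimage check could compare the machine's current name with the expected one. A minimal sketch, assuming the expected name is known in advance; the function name is hypothetical, and the image hostname in the example is the one that appears in the logs quoted earlier.

```python
import socket


def reimage_completed(current: str, expected: str) -> bool:
    """True if the host already carries its final name (case-insensitive)."""
    def normalize(name: str) -> str:
        return name.strip().lower().rstrip(".")
    return normalize(current) == normalize(expected)


EXPECTED = "mac-depsigning2.srv.releng.mdc1.mozilla.com"
IMAGE_NAME = "t-yosemite-r7-0001.test.releng.scl3.mozilla.com"  # from the logs

# On the machine itself, the current name would come from the OS:
print(reimage_completed(socket.gethostname(), EXPECTED))
```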
(In reply to Dave House [:dhouse] from comment #12)
> Seeing timeouts/failures. I'll leave it for now to see if it handles the timeouts and recovers, or if I need to reboot it, or redo the full reimage.
> ```
> Jul 26 14:41:11 t-yosemite-r7-0001.test.releng.scl3.mozilla.com apsd[66]: Got connection error Error Domain=NSURLErrorDomain Code=-1001 "The request timed out." UserInfo=0x7f953b232e50 {NSUnderlyingError=0x7f9538d15d20 "The request timed out.", NSErrorFailingURLStringKey=https://albert.apple.com/deviceservices/deviceActivation?device=MacOS, NSErrorFailingURLKey=https://albert.apple.com/deviceservices/deviceActivation?device=MacOS, NSLocalizedDescription=The request timed out.}
> Jul 26 14:41:11 t-yosemite-r7-0001.test.releng.scl3.mozilla.com apsd[66]: <APSCertificateManager: 0x7f953b210730>: Failed to get client cert on attempt 12, will retry in 900 seconds
> Jul 26 14:41:30 t-yosemite-r7-0001.test.releng.scl3.mozilla.com nsurlsessiond[181]: tcp_connection_attempt_timer_fired 8 connection took too long to connect, cleaning up current attempt
> Jul 26 14:41:30 t-yosemite-r7-0001.test.releng.scl3.mozilla.com nsurlsessiond[181]: tcp_connection_attempt_timer_fired 10 connection took too long to connect, cleaning up current attempt
> Jul 26 14:41:30 t-yosemite-r7-0001.test.releng.scl3.mozilla.com nsurlsessiond[181]: tcp_connection_attempt_timer_fired 9 connection took too long to connect, cleaning up current attempt
> ```

I'm sure these errors are unrelated; they occur because many outgoing connections back to Apple are blocked. macOS does a lot of phoning home that we don't want.
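To separate this phone-home noise from errors that matter, the log can be grouped by the process that emitted each line. The sketch below parses the syslog-style lines quoted above and counts them per process; the regular expression and function name are my own, and the sample lines are copied from the log in this bug.

```python
import re
from collections import Counter

# Matches lines like: "Jul 26 14:41:11 host proc[pid]: message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d\d:\d\d:\d\d) (?P<host>\S+) "
    r"(?P<proc>[\w.-]+)\[(?P<pid>\d+)\]: (?P<msg>.*)$"
)


def count_by_process(lines) -> Counter:
    """Count log lines per emitting process, skipping unparseable lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            counts[match.group("proc")] += 1
    return counts


HOST = "t-yosemite-r7-0001.test.releng.scl3.mozilla.com"
sample = [
    f"Jul 26 14:41:11 {HOST} apsd[66]: <APSCertificateManager: 0x7f953b210730>: "
    "Failed to get client cert on attempt 12, will retry in 900 seconds",
    f"Jul 26 14:41:30 {HOST} nsurlsessiond[181]: tcp_connection_attempt_timer_fired 8 "
    "connection took too long to connect, cleaning up current attempt",
    f"Jul 26 14:41:30 {HOST} nsurlsessiond[181]: tcp_connection_attempt_timer_fired 10 "
    "connection took too long to connect, cleaning up current attempt",
]
counts = count_by_process(sample)
print(counts)
```

If every failing line comes from `apsd` or `nsurlsessiond`, as in this excerpt, that supports reading the timeouts as blocked Apple connections rather than a reimage failure.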
I looked at the workflow associated with the r5 depsigning minis. It was missing several steps, which left the machines reimaged but without things like the hostname change, setting up puppet to run, etc. I ended up copying the yosemite-r7 workflow (renamed to yosemite-r5) and added an extra step for partitioning/formatting the second drive. r7s have a single drive, while r5s have 2 disks. After that, I hit another issue where the post-install scripts failed because the first disk was not mounted. From what I can tell, this was caused by the second disk's partitioning task force-unmounting all disk mount points and not restoring them. I simply moved both partitioning tasks to the front of the workflow so they execute before the image restore process. This seemed to fix it. We should reimage the other 2 osx depsigners also. And I'm sure this is broken in mdc2 also, so I'll need to attend to that.
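The ordering fix described above can be expressed as a simple invariant: every partitioning task must run before the image restore. The following sketch models a workflow as a list of steps and checks or enforces that invariant. The step names and the "broken" order are hypothetical illustrations, not an export of the real DeployStudio workflow.

```python
def partitions_precede_restore(steps) -> bool:
    """True if every partition step comes before the first restore step."""
    kinds = [step["kind"] for step in steps]
    if "restore" not in kinds:
        return True
    restore_index = kinds.index("restore")
    return all(i < restore_index for i, kind in enumerate(kinds) if kind == "partition")


def move_partitions_first(steps):
    """Return a new step list with partition steps moved to the front.

    Relative order inside each group is preserved.
    """
    partitions = [step for step in steps if step["kind"] == "partition"]
    others = [step for step in steps if step["kind"] != "partition"]
    return partitions + others


# Hypothetical order resembling the failure: the second disk is partitioned
# after the restore, which unmounts the first disk before post-install runs.
broken = [
    {"name": "partition disk 1", "kind": "partition"},
    {"name": "restore image", "kind": "restore"},
    {"name": "partition disk 2", "kind": "partition"},
    {"name": "post-install scripts", "kind": "postinstall"},
]
fixed = move_partitions_first(broken)
print([step["name"] for step in fixed])
```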
Blocks: 1374787
This host was re-imaged after fixing the deploystudio workflows so this should be fixed. https://bugzilla.mozilla.org/show_bug.cgi?id=1480512#c3
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED