Closed
Bug 1472404
Opened 6 years ago
Closed 6 years ago
puppet version UNKNOWN for mac-depsigning2.srv.releng.mdc1.mozilla.com
Categories
(Infrastructure & Operations :: RelOps: Puppet, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: apop, Assigned: dhouse)
Attachments
(2 files)
While monitoring #platform-ops-alerts, I saw the following alert:
Sat 02:12:18 UTC [8034] [Unknown] mac-depsigning2.srv.releng.mdc1.mozilla.com:Puppet freshness is UNKNOWN: Sorry, puppet version UNKNOWN is unsupported by /usr/local/libexec/check_puppet_freshness
Can you please help with this issue or point me to someone who can?
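For anyone triaging similar alerts, here is a minimal sketch of splitting the alert line above into fields. The regular expression and the field names (`host`, `service`, `status`, and so on) are assumptions based only on this one alert; they are not a documented format of the alerting bot.

```python
import re

# Pattern inferred from the single alert quoted in this bug (assumption);
# the group names are my own labels, not anything the bot defines.
ALERT_RE = re.compile(
    r"^(?P<day>\w{3}) (?P<time>\d{2}:\d{2}:\d{2}) UTC "
    r"\[(?P<id>\d+)\] \[(?P<state>\w+)\] "
    r"(?P<host>[^:]+):(?P<service>.+?) is (?P<status>[A-Z]+): (?P<message>.*)$"
)

def parse_alert(line):
    """Return a dict of alert fields, or None if the line does not match."""
    match = ALERT_RE.match(line.strip())
    return match.groupdict() if match else None

alert = parse_alert(
    "Sat 02:12:18 UTC [8034] [Unknown] "
    "mac-depsigning2.srv.releng.mdc1.mozilla.com:Puppet freshness is UNKNOWN: "
    "Sorry, puppet version UNKNOWN is unsupported by "
    "/usr/local/libexec/check_puppet_freshness"
)
```

With the alert above, `alert["host"]` is the affected machine and `alert["status"]` is `UNKNOWN`, which is enough to group repeated alerts by host.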
Reporter
Comment 1•6 years ago
Reporter
Comment 2•6 years ago
I tried to check the puppet version, but it gave me "paths" instead. Please see the screenshot that I've attached.
Comment 3•6 years ago
Infrastructure & Operations::Puppet is for IT's puppet infra; puppet issues for RelEng/RelOps' systems should be filed in Infra&Ops::RelOps::Puppet
Assignee: infra → relops
Component: Infrastructure: Puppet → RelOps: Puppet
QA Contact: cshields → mcornmesser
Comment 4•6 years ago
This looks like https://tickets.puppetlabs.com/browse/FACT-724 ?
Comment 5•6 years ago
Puppet is 3.7.0. It appears to work apart from this warning.
Comment 6•6 years ago
We're running facter 2.2.0. It appears upgrading to 2.3.0 may fix this. However, other systems appear to be using facter 2.2.0; not sure why mac-depsigning2 is the only system I've seen with this behavior.
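As a rough illustration of the version check implied here, a minimal sketch that compares dotted version strings. The helper names are hypothetical, and treating 2.3.0 as the version that resolves the issue is an assumption, since the upgrade only "may" fix it.

```python
def parse_version(text):
    """Turn a dotted version string such as '2.2.0' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))

def is_older_than(installed, candidate):
    """True if the installed version sorts before the candidate version."""
    return parse_version(installed) < parse_version(candidate)

# Versions mentioned in this bug; 2.3.0 as the fixing release is an assumption.
installed_facter = "2.2.0"
candidate_facter = "2.3.0"
upgrade_worth_trying = is_older_than(installed_facter, candidate_facter)
```

Comparing tuples of integers avoids the string-comparison trap where "2.10.0" would sort before "2.2.0".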
Comment 7•6 years ago
Sorry for all the comments; I think this is the last one.
mac-depsigning2 was failing to install mercurial; `export SDKROOT=macosx10.10` before puppetizing fixed that. It didn't fix these warnings. We may want to try reimaging this box?
https://bugzilla.mozilla.org/show_bug.cgi?id=1472860#c11
+1 I'll reimage this machine to make sure it is in a good state.
Assignee: relops → dhouse
Flags: needinfo?(dhouse)
I logged in and issued the bless and reboot on mac-depsigning2. I'll make sure the deploystudio process completes the rebuild and a new puppetize completes:
```
[dhouse@rejh2.srv.releng.mdc1.mozilla.com ~]$ ssh mac-depsigning2.srv.releng.mdc1.mozilla.com
This host is set to follow security level "maximum"
Unauthorized access prohibited
ONLY START SIGNING SERVERS AS cltsign
This signing server hosts the following instances:
* port 9110 in /builds/signing/dep-signing-server
[dhouse@mac-depsigning2.srv.releng.mdc1.mozilla.com ~]$ uptime
19:31 up 311 days, 2:52, 2 users, load averages: 1.35 1.40 1.31
[dhouse@mac-depsigning2.srv.releng.mdc1.mozilla.com ~]$ sudo su -
[root@mac-depsigning2.srv.releng.mdc1.mozilla.com ~]# /usr/sbin/bless --netboot --server bsdp://10.49.56.16; reboot
[root@mac-depsigning2.srv.releng.mdc1.mozilla.com ~]# Connection to mac-depsigning2.srv.releng.mdc1.mozilla.com closed by remote host.
Connection to mac-depsigning2.srv.releng.mdc1.mozilla.com closed.
```
Flags: needinfo?(dhouse)
Assignee
Comment 10•6 years ago
Deploystudio rebuild successful:
```
deploystudio@install.test.releng.mdc1.mozilla.com
Attachments1:46 PM (4 minutes ago)
to deploystudio-alerts
The workflow 'Restore r5 signing server' was launched on the computer C07HT05CDKDJ (name: mac-depsigning2.srv.releng.mdc1.mozilla.com, ip: 10.49.48.18, mac: 40:6c:8f:0d:2c:27) with a SUCCESSFUL termination status. This mail was generated automatically by DeployStudio Server. -- The DeployStudio Team.
```
Assignee
Comment 11•6 years ago
I didn't see a puppet cert email, so I logged into the machine. It still has the image's hostname and appears to be re-running the deploystudio restore.
I'll check it again in 30min or so to see if it re-ran the deploystudio reimage.
Assignee
Comment 12•6 years ago
Seeing timeouts/failures. I'll leave it for now to see if it handles the timeouts and recovers, or if I need to reboot it, or redo the full reimage.
```
Jul 26 14:41:11 t-yosemite-r7-0001.test.releng.scl3.mozilla.com apsd[66]: Got connection error Error Domain=NSURLErrorDomain Code=-1001 "The request timed out." UserInfo=0x7f953b232e50 {NSUnderlyingError=0x7f9538d15d20 "The request timed out.", NSErrorFailingURLStringKey=https://albert.apple.com/deviceservices/deviceActivation?device=MacOS, NSErrorFailingURLKey=https://albert.apple.com/deviceservices/deviceActivation?device=MacOS, NSLocalizedDescription=The request timed out.}
Jul 26 14:41:11 t-yosemite-r7-0001.test.releng.scl3.mozilla.com apsd[66]: <APSCertificateManager: 0x7f953b210730>: Failed to get client cert on attempt 12, will retry in 900 seconds
Jul 26 14:41:30 t-yosemite-r7-0001.test.releng.scl3.mozilla.com nsurlsessiond[181]: tcp_connection_attempt_timer_fired 8 connection took too long to connect, cleaning up current attempt
Jul 26 14:41:30 t-yosemite-r7-0001.test.releng.scl3.mozilla.com nsurlsessiond[181]: tcp_connection_attempt_timer_fired 10 connection took too long to connect, cleaning up current attempt
Jul 26 14:41:30 t-yosemite-r7-0001.test.releng.scl3.mozilla.com nsurlsessiond[181]: tcp_connection_attempt_timer_fired 9 connection took too long to connect, cleaning up current attempt
```
Assignee
Comment 13•6 years ago
Attached a log showing the reimage (deploystudio) and the timeouts/failures afterward.
Assignee
Comment 14•6 years ago
I checked with Jake. He said deploystudio changes the hostname, so if the name did not change, the reimage did not complete.
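A minimal sketch of that check, assuming the expected name is the machine's production FQDN and that a completed workflow always renames the host. The function name is hypothetical; the two hostnames are the ones that appear in this bug.

```python
import socket

def reimage_completed(expected_fqdn, current_fqdn=None):
    """Return True if the host carries its expected name rather than the image's.

    If current_fqdn is not given, it is read from the local machine.
    """
    if current_fqdn is None:
        current_fqdn = socket.getfqdn()
    return current_fqdn.strip().lower() == expected_fqdn.strip().lower()

# Hostnames taken from this bug: the production name and the name seen in the logs.
expected = "mac-depsigning2.srv.releng.mdc1.mozilla.com"
seen_in_logs = "t-yosemite-r7-0001.test.releng.scl3.mozilla.com"
completed = reimage_completed(expected, seen_in_logs)
```

Here `completed` is False, which matches what was observed: the machine still had the image's hostname, so the restore had not finished its post-install steps.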
Comment 15•6 years ago
(In reply to Dave House [:dhouse] from comment #12)
> Seeing timeouts/failures. I'll leave it for now to see if it handles the
> timeouts and recovers, or if I need to reboot it, or redo the full reimage.
> ```
> Jul 26 14:41:11 t-yosemite-r7-0001.test.releng.scl3.mozilla.com apsd[66]:
> Got connection error Error Domain=NSURLErrorDomain Code=-1001 "The request
> timed out." UserInfo=0x7f953b232e50 {NSUnderlyingError=0x7f9538d15d20 "The
> request timed out.",
> NSErrorFailingURLStringKey=https://albert.apple.com/deviceservices/
> deviceActivation?device=MacOS,
> NSErrorFailingURLKey=https://albert.apple.com/deviceservices/
> deviceActivation?device=MacOS, NSLocalizedDescription=The request timed out.}
> Jul 26 14:41:11 t-yosemite-r7-0001.test.releng.scl3.mozilla.com apsd[66]:
> <APSCertificateManager: 0x7f953b210730>: Failed to get client cert on
> attempt 12, will retry in 900 seconds
> Jul 26 14:41:30 t-yosemite-r7-0001.test.releng.scl3.mozilla.com
> nsurlsessiond[181]: tcp_connection_attempt_timer_fired 8 connection took too
> long to connect, cleaning up current attempt
> Jul 26 14:41:30 t-yosemite-r7-0001.test.releng.scl3.mozilla.com
> nsurlsessiond[181]: tcp_connection_attempt_timer_fired 10 connection took
> too long to connect, cleaning up current attempt
> Jul 26 14:41:30 t-yosemite-r7-0001.test.releng.scl3.mozilla.com
> nsurlsessiond[181]: tcp_connection_attempt_timer_fired 9 connection took too
> long to connect, cleaning up current attempt
> ```
I'm sure these errors are unrelated; they occur because many outgoing connections back to Apple are blocked. macOS does a lot of phoning home that we don't want.
Comment 16•6 years ago
I looked at the workflow associated with the r5 depsigning minis and found it was missing several steps, which left the machines reimaged but without things like hostname changes, puppet setup, etc. I ended up copying the yosemite-r7 workflow (renamed to yosemite-r5) and adding an extra step for partitioning/formatting the second drive. r7s have a single drive, while r5s have two disks.
After that, I hit another issue where the post-install scripts failed because the first disk was not mounted. From what I can tell, this was caused by the second disk's partitioning task force-unmounting all disk mount points and not restoring them. I simply moved both partitioning tasks to the front of the workflow so they execute before the image restore process. This seemed to fix it.
We should also reimage the other 2 OS X depsigners. I'm sure this is broken in mdc2 as well, so I'll need to attend to that.
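The reordering described above can be sketched as follows. The step names are hypothetical, since the actual deploystudio task list is not recorded in this bug; the sketch only illustrates moving every partitioning task ahead of the image restore while keeping the remaining order intact.

```python
# Hypothetical step names (assumption); the real workflow tasks are not listed in this bug.
broken_workflow = [
    "partition_first_disk",
    "restore_image",
    "partition_second_disk",   # force-unmounts all disks, including the restored one
    "run_post_install_scripts",
]

def move_partitioning_first(steps):
    """Return the steps with all partitioning tasks first, other order preserved."""
    partitioning = [step for step in steps if step.startswith("partition_")]
    others = [step for step in steps if not step.startswith("partition_")]
    return partitioning + others

fixed_workflow = move_partitioning_first(broken_workflow)
```

With both partitioning tasks at the front, nothing unmounts the first disk between the image restore and the post-install scripts.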
Comment 17•6 years ago
This host was re-imaged after fixing the deploystudio workflows so this should be fixed.
https://bugzilla.mozilla.org/show_bug.cgi?id=1480512#c3
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED