Closed
Bug 1393524
Opened 7 years ago
Closed 7 years ago
Change mac-mini slaves to not always run puppet at boot.
Categories
(Infrastructure & Operations :: RelOps: General, task)
Infrastructure & Operations
RelOps: General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: dhouse, Assigned: dhouse)
References
Details
Attachments
(5 files, 9 obsolete files)
2.58 KB,
text/plain
|
Details | |
6.63 KB,
patch
|
dividehex
:
review+
dhouse
:
checked-in+
|
Details | Diff | Splinter Review |
648 bytes,
patch
|
dividehex
:
review+
dhouse
:
checked-in+
|
Details | Diff | Splinter Review |
772 bytes,
patch
|
dividehex
:
review+
dhouse
:
checked-in+
|
Details | Diff | Splinter Review |
3.57 KB,
patch
|
dividehex
:
review+
dragrom
:
review+
dhouse
:
checked-in+
|
Details | Diff | Splinter Review |
Re: "firefox-ci > rebooting OS X 10.10 testers between jobs" (https://groups.google.com/a/mozilla.com/forum/?utm_medium=email&utm_source=footer#!topic/firefox-ci/mMEt8YmdsuA)
On mac test/try slaves, we will stop running puppet at every boot to reduce the time spent for each reboot to get a slave back to running the next job.
details:
Puppet runs at boot. Mac-mini slaves are rebooting after each test. Puppet can take a few minutes to run. The reboot+puppetrun time slows down our testing.
Because there may be residue left after try/tests and it would require unknown work to clean that up.
Joel suggested "Personally I think rebooting every few jobs would be reasonable (maybe every 5th?)"
Dustin suggested "The run-puppet script could add a timestamp on
success and skip any runs within N seconds of that timestamp (say, 30
minutes)."
Amy raised the concern that OSX screen resolution is set, or fixed, by puppet. So if we do not run puppet at every boot, there are some settings (only expecting screen resolution for now) that need to be set despite not running puppet (to put put into the at-boot script).
https://hg.mozilla.org/build/puppet/file/tip/modules/screenresolution/manifests/init.pp
Through the gui module,
https://hg.mozilla.org/build/puppet/file/tip/modules/gui/manifests/init.pp#l23
The screenresolution module is applied to test slave in:
https://hg.mozilla.org/build/puppet/file/tip/modules/toplevel/manifests/slave/releng/test/headless.pp#l7
https://hg.mozilla.org/build/puppet/file/tip/modules/toplevel/manifests/slave/releng/test/gpu.pp#l7
# Noting:https://hg.mozilla.org/build/puppet/file/tip/modules/gui/manifests/init.pp#l24
# $on_gpu is irrelevant on Darwin - everything's onscreen, and
gui also modifies appearance:
https://hg.mozilla.org/build/puppet/file/tip/modules/gui/manifests/appearance.pp#l9
AppleShowScrollBars
set-background-image
restart-Dock
So I'll want to test if these also need set (do I dare to search for other settings).
Puppet is set up to run at boot for macs in:
https://hg.mozilla.org/build/puppet/file/tip/modules/puppet/manifests/atboot.pp#l100
Which places a script built from:
https://hg.mozilla.org/build/puppet/file/tip/modules/puppet/templates/puppet-darwin-run-puppet.sh.erb
which includes the generic puppet vars and run functions from:
https://hg.mozilla.org/build/puppet/file/tip/modules/puppet/templates/puppet-atboot-common.erb
So we can put the logic and always-settings into the darwin-only template shell script to only affect darwin boots.
To restrict the logic to test slaves, the atboot.pp can include a stanza for tester class that tells the script what to do (flag, or other template or settings).
For the screenresolution setting, to start with, we can clip from the module and assume the binary is installed:
modules/screenresolution/manifests/init.pp
13 10.7: {¬
14 $resolution = "${width}x${height}x${depth}"¬
15 }¬
16 10.8,10.9: {¬
17 $resolution = "${width}x${height}x${depth}@${refresh}"¬
18 }¬
26 command => "/usr/local/bin/screenresolution set ${resolution}",¬
27 unless => "/usr/local/bin/screenresolution get 2>&1 | /usr/bin/grep 'Display 0: ${resolution}'",
I haven't tested this yet.
THis is a first pass to apply a time check and screenresolution for puppet:atboot on mac test slaves. When I have time to figure out the puppet details, the resolution seting would be better done by using the gui+screenresolution modules to create the resolution command instead of the copy-paste into atboot.
It may be better to use a boot count, but the elapsed time seems good for puppet changes to be applied at next boot greater than 30min, and to have some consistency in the time added by puppet runs (only add the puppet run time every 30 minutes).
Using a boot count could be better if we are accumulating changes at boot that puppet would have fixed, but I don't think we have anything like that.
Attachment #8900883 -
Flags: review?(jwatkins)
I need to do something more to get puppet to re-run the puppet::atboot module, or I'm not matching the node definition:
Testing on t-yosemite-r7-471.test.releng.mdc1.mozilla.com (:arr is on 472, :dragos is on 394):
```
[root@t-yosemite-r7-471.test.releng.mdc1.mozilla.com ~]# puppet agent --test --noop --environment=dhouse --server=releng-puppet2.srv.releng.scl3.mozilla.com
[...]
[root@t-yosemite-r7-471.test.releng.mdc1.mozilla.com ~]# puppet agent --test --environment=dhouse --server=releng-puppet2.srv.releng.scl3.mozilla.com
[...]
[root@t-yosemite-r7-471.test.releng.mdc1.mozilla.com ~]# ls -la /usr/local/bin/run-puppet.sh
-rwxr-xr-x 1 root wheel 4778 May 17 12:44 /usr/local/bin/run-puppet.sh
```
My node regex was still from default, with only scl3 for t-.test
-node /t.*-\d+\.test\.releng\.scl3\.mozilla\.com/ {
+node /t.*-\d+\.test\.releng\.(scl3|mdc1)\.mozilla\.com/ {
Still needs some cleanup, but it works (tested t-yosemite-r7-471.mdc1) so I'd like to apply it Friday morning if it looks good enough for now.
Attachment #8900883 -
Attachment is obsolete: true
Attachment #8900883 -
Flags: review?(jwatkins)
Attachment #8900939 -
Flags: review?(jwatkins)
I accidentally left a date format arg in the stat command in the script template.
Attachment #8900939 -
Attachment is obsolete: true
Attachment #8900939 -
Flags: review?(jwatkins)
Attachment #8900948 -
Flags: review?(jwatkins)
Comment 8•7 years ago
|
||
Comment on attachment 8900948 [details] [diff] [review]
tested config for 30min before puppet on reboot for macs
We're looking to base the decision on whether or not puppet run on the number of taskcluster jobs that have been executed (number of time the machine has rebooted), not absolute wall clock time.
Attachment #8900948 -
Flags: review?(jwatkins) → review-
Comment 9•7 years ago
|
||
Actually, thinking about this some more, we should probably check for either condition. We want to look for 5 reboots OR time > 1h (or 30m, whatever) to determine whether we should run puppet.
We want to make sure that we aren't going too many jobs between puppet runs (something might be busted), but we also want to make sure that a machine doesn't sit idle for days at a time and not run puppet on the next reboot.
My other concern is... what if we push out a busted puppet patch? This is going to increase the time between pushing a fix and having a machine pick it up. Can we build in a failsafe that will tell it to run puppet on the next reboot?
Assignee | ||
Comment 10•7 years ago
|
||
Attachment #8900948 -
Attachment is obsolete: true
Attachment #8901249 -
Flags: review?(jwatkins)
Attachment #8901249 -
Flags: review?(arich)
Assignee | ||
Comment 11•7 years ago
|
||
I cleaned up the script. after review, I'll plan to apply it to a subset of mdc1 testers early next week.
Attachment #8901385 -
Flags: review?(jwatkins)
Attachment #8901249 -
Attachment is obsolete: true
Attachment #8901249 -
Flags: review?(jwatkins)
Attachment #8901249 -
Flags: review?(arich)
Comment 12•7 years ago
|
||
Comment on attachment 8901385 [details] [diff] [review]
macs run_puppet if N runs skipped, if 30min since last run, or if hg hash changed. cleaned for testing next week
Review of attachment 8901385 [details] [diff] [review]:
-----------------------------------------------------------------
I really like the three-pronged approach here especially comparing the current repo hash. This is definitely a good comprehensive solution to this problem and should work well. Although do test it, I tend to miss things when it comes to shell scripts. ;-)
The one BIG issue I see is this breaks the reboot semaphore function. If a reboot flag is set anytime during a puppet run, we want to make sure the host is rebooted and puppet is run again immediately at boot. This should cycle as many times as needed until puppet completes successfully and no reboot semaphore exists after. I think you can simply insert a check for the semaphore to ensure the limiting logic is overridden and puppet is run anyway.
Also, let's move the 2 node scope vars to moco-config. Set them undef in config::base and set them to real values in moco-config (using a $::fqdn selector if needed)
::: manifests/moco-nodes.pp
@@ +17,5 @@
> $slave_trustlevel = 'try'
> +
> + # Easy faster Darwin boot for bug 1393524
> + $puppet_run_if_more_than_seconds = 7200
> + $puppet_run_if_more_than_n_reboots = 5
These node scope variables should definitely be moved to moco-config. Helps keep node defs clean and put configurations in there proper place.
@@ +67,5 @@
> $slave_trustlevel = 'try'
> +
> + # Easy faster boot for bug 1393524
> + $puppet_run_if_more_than_seconds = 7200
> + $puppet_run_if_more_than_n_reboots = 5
Same as above. Move vars to moco-config. Don't forget to add defaults to config::base
@@ +1102,5 @@
> + include toplevel::worker::releng::generic_worker::test::gpu
> +}
> +
> +# Dsiabled Linux and OS X taskcluster workers in mdc1
> +#node /t.*-\d+\.test\.releng\.mdc1\.mozilla\.com/ {
Did you also comment out the rest of the node def here? This also might be bit rot since :arr enabled the osx tc workers in mdc1 last week
Attachment #8901385 -
Flags: review?(jwatkins) → review-
Assignee | ||
Comment 13•7 years ago
|
||
From test runs on t-yosemite-r7-471, this saves about a minute by skipping the puppet run and the wait for dns to resolve.
A few times from reboot requested to next task started (from attached log clips):
0:2:37 with puppet
0:2:33 with puppet
0:1:56
0:3:52 with puppet
0:1:37
0:2:38 with puppet
0:1:43
0:1:33
Assignee | ||
Comment 14•7 years ago
|
||
This moves 2 node scope vars to moco-config and adds them as explicitly undefined in config::base. Does this look right?
This adds a check for the boot flag file and removed the deletion of it from the puppet-run function (when the reboot is triggered, it was deleting the flag file. it also deletes it before puppet is run).
This also adds a condition to the start of the checks to, by default, bypass performing any checks. That seems better than expecting the tests to fail because the limits are not defined.
Attachment #8901385 -
Attachment is obsolete: true
Attachment #8903582 -
Flags: review?(jwatkins)
Comment 15•7 years ago
|
||
Comment on attachment 8903582 [details] [diff] [review]
include boot flag file check in osx run-puppet atboot conditions
Review of attachment 8903582 [details] [diff] [review]:
-----------------------------------------------------------------
I like it! Just fix the conditional stuff noted inline and r+. I
::: manifests/moco-config.pp
@@ +135,5 @@
> + # first test: limit to 393-399 in mdc1
> + /t-yosemite-r7-3\d+\.test\.releng\.mdc1\.mozilla\.com/: {
> + $puppet_run_atboot_if_more_than_n_reboots = 5
> + $puppet_run_atboot_if_more_than_seconds = 3600
> + }
You might need an empty default case here since the future lint config might complain about it
::: modules/config/manifests/base.pp
@@ +147,4 @@
> # a flag that controls which nodes should install and run MIG Agent
> $enable_mig_agent = false
>
> + # Conditional puppet run atboot for macs
Perfect!
::: modules/puppet/manifests/atboot.pp
@@ +13,5 @@
> $puppet_servers = $::puppet::settings::puppet_servers
>
> case ($::operatingsystem) {
> + Darwin: {
> + if ( $::config::puppet_run_atboot_if_more_than_n_reboots != undef or
First, I would add a comment about this conditional stating the reasoning. You should also be explicit in setting the true or false with an if/else since $puppet_run_conditions_set is a "true" or "false" string that is being inserted into a shell script. If it is UNDEF, the erb generator will set it as an empty string in the template. May work but that is not what you want in a shell script. :-) From my understanding, we want to check that BOTH variables ARE set and if either are undef, then the first conditional in the shell script will be true therefore short circuit the evaluation and puppet will always run. I don't think we would have a situation where only one var would be set. If this is an incorrect assumption, then let's talk.
I would suggest removing the negatives. Negatives tend to muddy up logic and make it hard to interpret.
eg.
if $puppet_run_atboot_if_more_than_n_reboots and $puppet_run_atboot_if_more_than_seconds {
$conditional_missing = false # both conditionals are set, let's continue evaluating
} else {
$conditional_missing = true # oh noes, conditional missing, always run puppet!
}
::: modules/puppet/templates/puppet-atboot-common.erb
@@ +21,4 @@
> # when the puppet exit status is incorrect.
> tmp=`mktemp /tmp/puppet-outputXXXXXX`
> . /usr/local/bin/proxy_reset_env.sh
> + /usr/bin/puppet agent --detailed-exitcodes $PUPPET_OPTIONS --server ${puppet_server} > $tmp 2>&1
Good catch!
::: modules/puppet/templates/puppet-darwin-run-puppet.sh.erb
@@ +94,5 @@
> +# boot semaphore is still set
> +# every N reboots
> +# or if N seconds elapsed since last puppet run
> +# or if the configuration(repo hash) has changed
> +if [ ! <%= @puppet_run_conditions_set %> ] || [ -f $REBOOT_FLAG_FILE ] || puppet_counter_exceeded || puppet_elapsed_time_exceeded || ! puppet_config_version_is_latest; then
If you explicitly set the first var either true or false in the origin logic, you can remove the confusing ! negative
Attachment #8903582 -
Flags: review?(jwatkins) → review+
Assignee | ||
Comment 16•7 years ago
|
||
Updated patch based on review comments: removed negatives in condition statements. always run unless both conditions are set (added comment explaining). clarified condition check by naming the variable for "always run"
Attachment #8903582 -
Attachment is obsolete: true
Attachment #8904853 -
Flags: review?(jwatkins)
Comment 17•7 years ago
|
||
Comment on attachment 8904853 [details] [diff] [review]
osx run-puppet atboot conditions
Review of attachment 8904853 [details] [diff] [review]:
-----------------------------------------------------------------
r+ with changes mentioned. You're on the right track! Much cleaner with all the inverse logic. Ping me with you want a quick r? on the changes. Let's get this landed
::: modules/puppet/manifests/atboot.pp
@@ +17,5 @@
> + # Always run puppet at boot unless maximums are set for both
> + # n_reboots and seconds since last puppet run.
> + if ( $::config::puppet_run_atboot_if_more_than_n_reboots and
> + $::config::puppet_run_atboot_if_more_than_seconds ) {
> + $puppet_run_atboot_always = false
treat $puppet_run_atboot_always as a string and single quote it for both "true" and "false"
::: modules/puppet/templates/puppet-darwin-run-puppet.sh.erb
@@ +90,5 @@
> + fi
> +}
> +
> +puppet_run_atboot_always=<%= @puppet_run_atboot_always %>
> +if [[ $puppet_run_atboot_always == true ]] \
No need to set this as a variable and no need to evaluate it will ==
Just treat is like a sting being inject with erb.
Should look like this:
if [[ $puppet_run_atboot_always ]] \
The result will look like:
if [[ true ]] \
or
if [[ false ]] \
@@ +94,5 @@
> +if [[ $puppet_run_atboot_always == true ]] \
> + || [ -f $REBOOT_FLAG_FILE ] \
> + || puppet_counter_exceeded \
> + || puppet_elapsed_time_exceeded \
> + || puppet_config_version_is_not_latest; then
Nice clean up! I love easy to read.
Attachment #8904853 -
Flags: review?(jwatkins) → review+
Comment 18•7 years ago
|
||
That should read, "Much cleaner withOUT all the inverse logic."
Assignee | ||
Comment 19•7 years ago
|
||
Updated based on comments
Attachment #8904853 -
Attachment is obsolete: true
Attachment #8905147 -
Flags: review?(jwatkins)
Assignee | ||
Comment 20•7 years ago
|
||
changed to single quotes for the always-run flag
Attachment #8905147 -
Attachment is obsolete: true
Attachment #8905147 -
Flags: review?(jwatkins)
Attachment #8905149 -
Flags: review?(jwatkins)
Comment 21•7 years ago
|
||
Comment on attachment 8905149 [details] [diff] [review]
osx run-puppet atboot conditions
Review of attachment 8905149 [details] [diff] [review]:
-----------------------------------------------------------------
LGTM. Ship IT!
Attachment #8905149 -
Flags: review?(jwatkins) → review+
Assignee | ||
Comment 22•7 years ago
|
||
Applied and pushed to default:
remote: https://hg.mozilla.org/build/puppet/rev/f69996fdd2d3c136a5818fabb99cd36da919839a
Travis failed lint. Fixed and pushed again:
remote: https://hg.mozilla.org/build/puppet/rev/ff8d6638a86c1199ce12b0c2e038848e6e1e9009
Travis passed. Pushed to production:
remote: https://hg.mozilla.org/build/puppet/rev/8631dd657139e9228a29b3f3f0f70fa462d58881
Attachment #8905149 -
Flags: checked-in+
Comment 23•7 years ago
|
||
See http://logs.glob.uno/?c=releng#c312233 - this might have possibly caused a linux talos puppet issue.
aobreja|buildduty is currently testing.
Assignee | ||
Comment 24•7 years ago
|
||
Postmortem summary doc: https://docs.google.com/document/d/1R78eciOMYbLpQsCnN1Nb7VywaBXHSnnmiG10CbUzM6A/edit
Assignee | ||
Comment 25•7 years ago
|
||
Expand mac puppet atboot conditional runs to all "t-yosemite-r7" instances.
Attachment #8908174 -
Flags: review?(jwatkins)
Attachment #8908174 -
Flags: review?(dcrisan)
Attachment #8908174 -
Flags: feedback?
Updated•7 years ago
|
Attachment #8908174 -
Flags: review?(dcrisan) → review+
Assignee | ||
Comment 26•7 years ago
|
||
We need to not check for the counter file. It will never exist unless we remove this check.
Attachment #8908252 -
Flags: review?(jwatkins)
Updated•7 years ago
|
Attachment #8908252 -
Flags: review?(jwatkins) → review+
Assignee | ||
Comment 27•7 years ago
|
||
Comment on attachment 8908252 [details] [diff] [review]
was checking for run count file too soon
Applied and pushed to default:
remote: https://hg.mozilla.org/build/puppet/rev/f7cfa8646cbde8421177b268f579ce6b80ecae50
Travis passed. pushed to production:
remote: https://hg.mozilla.org/build/puppet/rev/5ecfc7bf1c93e9906f9693a9bb663a2df66346d4
Attachment #8908252 -
Flags: checked-in+
Assignee | ||
Comment 28•7 years ago
|
||
Comment on attachment 8908174 [details] [diff] [review]
osx run-puppet atboot conditions
Applied and pushed to default:
remote: https://hg.mozilla.org/build/puppet/rev/f5cfa441c978f73bdf53cafa9125f1775fcdab4b
Travis passed. pushed to production:
remote: https://hg.mozilla.org/build/puppet/rev/5ecfc7bf1c93e9906f9693a9bb663a2df66346d4
Attachment #8908174 -
Flags: feedback? → checked-in+
Assignee | ||
Comment 29•7 years ago
|
||
often network/dns is not ready and so we cannot get the repo tip hash. This prevents us from skipping the puppet run as we're depending on getting the hash
Attachment #8908429 -
Flags: review?(jwatkins)
Attachment #8908429 -
Flags: review?(dcrisan)
Attachment #8908429 -
Attachment description: stop checking remote repo version. network/dns is often not ready (wait_for_dns ...) → wait_for_dns always. network/dns is often not ready for the hg tip hash check
Updated•7 years ago
|
Attachment #8908429 -
Flags: review?(jwatkins) → review+
Assignee | ||
Comment 30•7 years ago
|
||
Comment on attachment 8908429 [details] [diff] [review]
wait_for_dns always. network/dns is often not ready for the hg tip hash check
Applied and pushed to default:
remote: https://hg.mozilla.org/build/puppet/rev/bd4e4860d911867598a0f72cc8558753c9dccbfa
Travis passed.
Still didn't fix the hg response being empty *need to determine cause since it doesn't look like dns. Do not compare the hash if empty (bustage):
diff --git a/modules/puppet/templates/puppet-darwin-run-puppet.sh.erb b/modules/puppet/templates/puppet-darwin-run-puppet.sh.erb
--- a/modules/puppet/templates/puppet-darwin-run-puppet.sh.erb
+++ b/modules/puppet/templates/puppet-darwin-run-puppet.sh.erb
@@ -57,6 +57,7 @@ function puppet_config_version_is_not_la
# if hg fails or is not yet installed, this will leave a non-matching empty string
local repo_version=$(hg identify https://hg.mozilla.org/build/puppet)
echo "Repository tip hash: ${repo_version}"
+ [[ -z "${repo_version}" ]] && return 1
[[ "${last_run}" != "${repo_version}" ]]
}
remote: https://hg.mozilla.org/build/puppet/rev/b90518da429a5101c84565a4cf2e7e2eba73fd3d
Travis passed. Pushed the production:
remote: https://hg.mozilla.org/build/puppet/rev/d9b4a49865013673f1d5a92fe4cc731f8c377c44
I've collected reboot times (and whether they ran puppet) since the push and it is working correctly from what I see so far (and continuing on to run the generic_worker to take jobs) after reboots.
Attachment #8908429 -
Flags: review?(dcrisan) → checked-in+
Assignee | ||
Comment 31•7 years ago
|
||
The t-yosemite-r7 machines are skipping puppet runs atboot based on the conditions and that seems to cut the down-time for reboots down by 50-60 seconds.
t-yosemite-r7-0379 Sep 15 07:32:38, Sep 15 07:34:12, 94s, Creating file /etc/generic-worker.config...,
t-yosemite-r7-0214 Sep 15 07:32:38, Sep 15 07:34:13, 95s, Creating file /etc/generic-worker.config...,
t-yosemite-r7-0044 Sep 15 07:32:45, Sep 15 07:34:19, 94s, Creating file /etc/generic-worker.config...,
t-yosemite-r7-0059 Sep 15 07:32:01, Sep 15 07:34:26, 145s, Creating file /etc/generic-worker.config..., Sep 15 07:34:25
t-yosemite-r7-0112 Sep 15 07:32:04, Sep 15 07:34:33, 149s, Creating file /etc/generic-worker.config..., Sep 15 07:34:32
t-yosemite-r7-0106 Sep 15 07:32:04, Sep 15 07:34:33, 149s, Creating file /etc/generic-worker.config..., Sep 15 07:34:33
t-yosemite-r7-0060 Sep 15 07:33:07, Sep 15 07:34:42, 95s, Creating file /etc/generic-worker.config...,
t-yosemite-r7-0153 Sep 15 07:33:10, Sep 15 07:34:44, 94s, Creating file /etc/generic-worker.config...,
t-yosemite-r7-0333 Sep 15 07:32:25, Sep 15 07:34:48, 143s, Creating file /etc/generic-worker.config..., Sep 15 07:34:47
t-yosemite-r7-0346 Sep 15 07:33:16, Sep 15 07:34:51, 95s, Creating file /etc/generic-worker.config...,
t-yosemite-r7-0319 Sep 15 07:32:38, Sep 15 07:35:04, 146s, Creating file /etc/generic-worker.config..., Sep 15 07:35:03
t-yosemite-r7-0079 Sep 15 07:32:50, Sep 15 07:35:15, 145s, Creating file /etc/generic-worker.config..., Sep 15 07:35:13
t-yosemite-r7-0239 Sep 15 07:32:18, Sep 15 07:35:16, 178s, Creating file /etc/generic-worker.config..., Sep 15 07:35:15
So we'll see today if these minutes saved help the group of active servers to keep up with the queue.
Comment 32•7 years ago
|
||
Comment on attachment 8908174 [details] [diff] [review]
osx run-puppet atboot conditions
Review of attachment 8908174 [details] [diff] [review]:
-----------------------------------------------------------------
I'd prefer to stick to being explicit as much as possible in regex. The .* should be (scl3|mdc1|usw2|use1).
Attachment #8908174 -
Flags: review?(jwatkins) → review-
Assignee | ||
Comment 33•7 years ago
|
||
This: 1. Makes the fqdn matching specific to (scl3|mdc1|usw2|use1).
2. Fixes a counter bug: reset the boot counter every time we run puppet (not only when triggered by counter).
3. Adds logging for why puppet was run when a condition triggers it.
Attachment #8908174 -
Attachment is obsolete: true
Attachment #8909439 -
Flags: review?(jwatkins)
Attachment #8909439 -
Flags: review?(dcrisan)
Comment 34•7 years ago
|
||
Comment on attachment 8909439 [details] [diff] [review]
osx atboot conditions. logging, reset counter, specific fqdn
lgtm
Attachment #8909439 -
Flags: review?(dcrisan) → review+
Updated•7 years ago
|
Attachment #8909439 -
Flags: review?(jwatkins) → review+
Assignee | ||
Comment 35•7 years ago
|
||
Comment on attachment 8909439 [details] [diff] [review]
osx atboot conditions. logging, reset counter, specific fqdn
remote: https://hg.mozilla.org/build/puppet/rev/cbbcae5cd786c72a239dc2abc032464d858ac5ac
Travis passed. Pushed to production:
remote: https://hg.mozilla.org/build/puppet/rev/a85c4d35ca2fc8874f45ea7278b6ce850a8ef032
Attachment #8909439 -
Flags: checked-in+
Assignee | ||
Comment 36•7 years ago
|
||
Some totals from the reboot timing collected since the 20th:
```
day hosts reboots puppet-runs puppet-skips hours-saved potential-taskhours percent-saved
21 457 17769 7018 10751 179 10968 1.63
22 378 12904 5556 7348 122 9072 1.34
23 378 7617 4083 3534 58 9072 .63
24 378 8573 4370 4203 70 9072 .77
25 378 16041 6388 9653 160 9072 1.76
26 379 16711 6290 10421 173 9096 1.90
27 382 18107 6609 11498 191 9168 2.08
```
So on working week days, it looks like we are saving 1 to 2 percent of our mac processing time for tasks. I want to also sum up the time for reboots versus the time for puppet runs.
Comment 37•7 years ago
|
||
Did we reach to a conclusion on this bug?is there something else that need to be configured?
We have bug 1386625 blocked by this one.
Flags: needinfo?(dhouse)
Assignee | ||
Comment 38•7 years ago
|
||
We reached a stable state (and confirmed it gave use a few percentage of total host time). It is something we can revisit when/if we need to review host/ops time on the workers.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(dhouse)
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•