Closed Bug 712206 Opened 13 years ago Closed 10 years ago

slave pre-flight tasks (runner)

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task, P3)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: catlee, Assigned: mrrrgn)

References

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2714] [unittest][talos][automation][q4goal])

Attachments

(2 files, 5 obsolete files)

A lot of the slave maintenance done at the beginning of build and test jobs could instead be done before buildbot even starts.

Things like doing clobbers, purging old directories, and making sure the tools checkout is up to date.

In the case where we have spare capacity in the pool, this can be a slight win for end-to-end time, since builds don't have to waste time cleaning up. In the case where we're maxed out, it's no worse than the current situation.

Pre-flight checks could be added here as well (can I compile something? is my display correct?), refusing to start buildbot if the checks fail.
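For illustration, a minimal sketch of such a gate in Python (which is where the later discussion lands; the check functions here are placeholders, not an existing API):

  import subprocess
  import sys

  def can_compile():
      # Placeholder check; a real one might compile a small test program.
      return subprocess.call(["cc", "--version"]) == 0

  CHECKS = [can_compile]  # display, clobber, etc. checks would join this

  def main():
      for check in CHECKS:
          if not check():
              print("pre-flight check %s failed" % check.__name__)
              return 1
      return 0  # all checks passed; safe to start buildbot

  if __name__ == "__main__":
      sys.exit(main())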
Whiteboard: [unittest][talos]
Depends on: 596748
Priority: -- → P4
Whiteboard: [unittest][talos] → [unittest][talos][automation]
Component: Release Engineering → Release Engineering: Machine Management
QA Contact: release → armenzg
Among the things this should do is verify and, if necessary, correct the screen resolution on all affected platforms (a rough check is sketched after the list below).

I can see two ways to implement this:
 1. add functionality to runslave.py to run a directory full of scripts
 2. insert a step *before* runslave.py on all platforms that runs a directory full of scripts
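For the resolution check specifically, a rough sketch for X11 hosts (assumes xdpyinfo is available; the expected resolution is a placeholder):

  import re
  import subprocess
  import sys

  EXPECTED = "1600x1200"  # placeholder for the real talos requirement

  def current_resolution():
      # xdpyinfo prints a line like "dimensions:    1600x1200 pixels (...)"
      out, _ = subprocess.Popen(["xdpyinfo"],
                                stdout=subprocess.PIPE).communicate()
      m = re.search(r"dimensions:\s+(\d+x\d+)", out.decode("utf-8", "replace"))
      return m.group(1) if m else None

  if current_resolution() != EXPECTED:
      sys.exit("wrong screen resolution, refusing to start buildbot")

Windows and OS X would need their own equivalents, and *correcting* a wrong resolution (rather than just detecting it) is also per-platform.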
Blocks: 801607
I think we need this soon
Priority: P4 → P2
I can help with the Windows side, as I've increased my expertise by setting up the Windows 8, 7, and 64-bit machines several times in the last two quarters.
I'm thinking we should add bad slaves to the "See Also" field and keep track of the weird ways they start burning jobs even when hardware diagnostics don't find any problem.
See Also: → talos-r4-lion-063
I would also add the ability to report issues to a dashboard.
I would like to avoid having to go and check each individual slave when it fails to start.
Component: Release Engineering: Machine Management → Release Engineering: Platform Support
QA Contact: armenzg → coop
Product: mozilla.org → Release Engineering
We need to clean temporary files on the bld-lion machines, as in bug 906706.
Blocks: 880003
It would be great if we had a script that would help us determine whether the right graphics card is installed on the win7 machines: see https://bugzilla.mozilla.org/show_bug.cgi?id=873566#c24
An approach to slave pre-flight tasks for puppet:
https://github.com/catlee/puppet/compare/master...startup
Very cool!  I can't comment on the "compare", so a few notes here:

* /etc/preflight.d should be purged, so that puppet can remove no-longer-desired scripts

* the reboot should be delayed or, given the presence of slave-health/kittenherder, maybe just hang forever on the assumption it will be rebooted externally?  At any rate, trying to fix something that's rebooting every 30s is no fun at all.

* As for starting Buildbot, it might be easiest to just invoke this script from the various shell scripts used for puppet::atboot, since they already have a mechanism to delay buildbot startup.

* It'd be nicer to have preflight::step take a 'contents' attribute.  Then the in-module classes like preflight::steps::tools can use contents => template("${module_name}/tools.erb"), and preflight::step can also be used outside of the preflight module, with arbitrary scripts.

* Really minor, but grammatically I'd prefer preflight::step::tools (singular), just for the symmetry with preflight::step.

* It'd be great to have additional logging from these scripts, both to syslog and, potentially, to nagios via NSCA (see the sketch after this list).

* Given the necessity of running these scripts on all platforms, would it be better to write them in Python or Ruby?  That would also allow libraries for platform-independent logging, etc.
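As a sketch of the syslog half of that logging, assuming the scripts do end up in Python (the NSCA send would be a separate mechanism; note that even the syslog socket path is platform-specific, which is itself an argument for a shared library):

  import logging
  import logging.handlers

  log = logging.getLogger("preflight")
  log.setLevel(logging.INFO)
  # /dev/log is the Linux socket; OS X uses /var/run/syslog, and
  # Windows would need a different handler entirely.
  log.addHandler(logging.handlers.SysLogHandler(address="/dev/log"))
  log.info("tools checkout up to date")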

I'm sad that we've been sitting on this since May :(
Assignee: nobody → armenzg
Whiteboard: [unittest][talos][automation] → [unittest][talos][automation][q4goal]
I would prefer writing this in Python to make it cross-platform.
It is also easier to have people contribute to it.

I assume there will be some Puppet and GPO work to deploy parts of it.

I will do an analysis to see where to hook this up for each OS.
I can't look at this bug until past the work week.
I'm afraid this will be a missed goal.
Priority: P2 → P3
Depends on: 930831
Simone: how do you feel about owning the whole thing, and not just the tools repo part?
Flags: needinfo?(sbruno)
Coop: it surely makes sense since I started working on Bug 712205. I'm taking this!
Assignee: armenzg → sbruno
Flags: needinfo?(sbruno)
(In reply to Dustin J. Mitchell [:dustin] (I read my bugmail; don't needinfo me) from comment #10)
> * As for starting Buildbot, it might be easiest to just invoke this script
> from the various shell scripts used for puppet::atboot, since they already
> have a mechanism to delay buildbot startup.

+1

> * It'd be great to have additional logging from these scripts, both to
> syslog and, potentially, to nagios via NSCA.

I think we handle this. Windows will still suck.
 
> * Given the necessity of running these scripts on all platforms, would it be
> better to write them in Python or Ruby?  That would also allow libraries for
> platform-independent logging, etc.

Yes, I think we want python scripts/libs here.
Some other notes that have come out of conversations with releng this week:

* measuring improvement: we track current setup and teardown times per OS in brasstacks (http://brasstacks.mozilla.com/gofaster/#/, but the charts seem to be busted right now; I'll investigate). We should gather baseline stats now, so we can assess whether (and hopefully how far) we've moved the needle at the end

* incremental gains: once we have our baselines, we can and should roll this out per OS as each step is ready, rather than waiting for a giant improvement across the board at the end. Making things a little bit better every week, even on one platform at a time, is progress.

* specific tasks:

** clobberer should run at start-up, although we will still need to check clobberer as part of the build, but only for the tree in question. Depending on how long a slave sits idle before its first job, there's a chance that a clobber request will come in between buildbot startup and job start.

** hg-shared: we should iterate through all the trees in the hg-shared dir and make sure they're updated to tip before starting buildbot (see the sketch after this list). This will give jobs the minimum possible code delta to pull during the actual build process.

** moving post-flight tasks to pre-flight: obviously this makes sense too, but may require some special logic. Some of the post-flight tasks we do are specific to the current build, e.g. removing build and/or objdirs. Moving those tasks to pre-flight means you wouldn't have a current build as reference. Maybe post-flight tasks get merged in a later round of improvements here. I'm hoping better build system dependencies make this a non-issue in the near future.
** one pony, with rainbow hair
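For the hg-shared step, a rough sketch of the iteration (the directory path and exact hg invocation are assumptions about the slave layout):

  import os
  import subprocess

  HG_SHARED = "/builds/hg-shared"  # assumed location

  for name in os.listdir(HG_SHARED):
      repo = os.path.join(HG_SHARED, name)
      if os.path.isdir(os.path.join(repo, ".hg")):
          # pull into the shared store so jobs see a minimal delta
          subprocess.call(["hg", "-R", repo, "pull"])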

Let's get the basic framework in place before we start hammering every nail!
Depends on: 932877
(In reply to Chris Cooper [:coop] from comment #16)
> * measuring improvement: we track current setup and teardown times per OS in
> brasstacks (http://brasstacks.mozilla.com/gofaster/#/ but the charts seem to
> be busted right now,  I'll investigate). 

Filed bug 932877.
Depends on: 769384
I am going to write a preflight_runner.py script, which will be invoked just before running runslave.py in the following files, which reside in http://hg.mozilla.org/build/puppet/file/default/modules/buildslave/files:

darwin-run-buildslave.sh
run-puppet-and-buildbot.sh
start-buildbot-win64.bat
startTalos-w7.bat
startTalos-w764.bat
startTalos-w864.bat
startTalos-xp.bat

Using these files as hook points for preflight calls will make it possible to roll out preflight scripts per OS, as proposed by coop.

preflight_runner.py will run the scripts in a directory, determining the order of execution based on some simple naming conventions (a sketch follows below).

I will start implementing support for python scripts, but it should not be difficult to extend to direct support of native shell scripts (.sh scripts for unix, .bat/.cmd scripts for Windows).
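A minimal sketch of what that could look like (the numeric-prefix convention and the stop-on-first-failure behavior are assumptions, not settled design):

  import os
  import subprocess
  import sys

  def run_dir(taskdir):
      # Lexical order gives the naming convention its meaning,
      # e.g. 01-clobber.py runs before 10-checkout-tools.py.
      for name in sorted(os.listdir(taskdir)):
          rc = subprocess.call([sys.executable, os.path.join(taskdir, name)])
          if rc != 0:
              return rc  # don't start buildbot half-checked
      return 0

  if __name__ == "__main__":
      sys.exit(run_dir(sys.argv[1]))

Supporting native .sh/.bat scripts would only change how each file is invoked.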

Armen, can you please confirm that this plan is compatible with our last conversation?
Flags: needinfo?(armenzg)
First of all, could you please go to each Windows host and grab a recent version of start*.bat?
What is committed in puppet could easily be out of date, since we have refactored the Windows infrastructure with GPO and changed a bunch of assumptions.

This matches what we spoke about.
I have no problem making the script more OS-centric; however, I would *suggest* not mixing the tasks in the same directory but splitting them by OS (e.g. win_preflight_tasks/, unix_preflight_tasks/ and osx_preflight_tasks/).
Flags: needinfo?(armenzg)
Hi Armen,

Thanks for your quick feedback.

I will have a look at the actual .bat files on the Windows hosts - even if there are differences, I believe the principle of using these files as hook points before runslave.py is called remains valid.

Regarding the preflight tasks, I completely agree with your recommendation; they will necessarily be written and stored separately for each OS, at least for two reasons:

1) because we may want to add different behavior per OS
2) because we may want to support native shell scripts
The scripts will, presumably, be installed by puppet, so I would recommend having the per-OS (or per-silo, really) logic handled in puppet, not by the runner script.
I'd like to shift focus slightly for this. Instead of running pre-flight tasks on boot, we should run them before running buildbot. A minor distinction right now, but I'd like to get to the point where we're running some kind of loop like:

while True:
- run pre-flight tasks
- run buildbot for one job
- run post-flight tasks

then we can stop rebooting machines after every job.
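Concretely, reusing a run_dir helper like the earlier sketch (the single-job buildbot invocation is a placeholder; making buildbot exit after one job is part of what this would enable):

  while True:
      if run_dir(PREFLIGHT_DIR) != 0:
          continue  # retry pre-flight rather than take a job
      run_buildbot_for_one_job()  # placeholder
      run_dir(POSTFLIGHT_DIR)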
Depends on: 1005936
We're going to do this with runner:
https://github.com/catlee/runner
Assignee: sbruno → catlee
Blocks: 1019013
Attached patch runner.diff (obsolete) — Splinter Review
Puppet module for installing and configuring Runner. Runner itself is already packaged as an sdist and uploaded. Unsure if :catlee wants a review on runner itself or not.
Attachment #8436062 - Flags: review?(dustin)
Attachment #8436062 - Flags: feedback?(catlee)
Comment on attachment 8436062 [details] [diff] [review]
runner.diff

Review of attachment 8436062 [details] [diff] [review]:
-----------------------------------------------------------------

It looks like this has a bunch of other stuff mixed in.  Can you re-post a diff containing only your changes?
Attachment #8436062 - Flags: review?(dustin)
A diff not created on a brain-addled Friday evening...
Attachment #8436062 - Attachment is obsolete: true
Attachment #8436062 - Flags: feedback?(catlee)
Attachment #8437030 - Flags: review?(dustin)
Attachment #8437030 - Flags: feedback?(catlee)
Comment on attachment 8437030 [details] [diff] [review]
Puppet module for Catlee's Runner

Review of attachment 8437030 [details] [diff] [review]:
-----------------------------------------------------------------

I think we talked a bit in irc about this, but just to follow up, it seems like dependencies are going to be the gotcha here, especially if this is going to be useful for other organizations (e.g., QA probably has some pre-flight tasks, but they're probably not very similar to releng's).

Using sorted, numbered scripts works OK if different numbers have meaning (e.g., "all repositories are up to date by level 10"), but they don't represent dependencies very well.  For example, if I rename 15-frobnicate.sh to 17-frobnicate.sh, how do I know that 16-barterize.sh didn't depend on it?

Since the runner itself is Python, it'd actually be pretty easy to specify dependencies explicitly, either in the scripts themselves (perhaps in comments) or in some nearby file (e.g., barterize.deps), and then flatten that dependency tree into a fixed order in the Python script.  That would also leave the door open to parallelizing the preflight tasks at a later date.  And if those dependencies are reflected in the Puppet configuration, then Puppet can ensure that every task's dependency tasks are also installed.

Systemd, launchd, and Gentoo all solve this a little bit differently, and they all have complicated syntax to say 'starts before this service *or* that service' or to refer to virtual services or whatever.  I don't see runner needing such levels of complexity.
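Flattening explicit dependencies into a fixed order is only a few lines of Python; a sketch, with task names borrowed from the example above:

  # Map each task to the tasks it depends on.
  deps = {
      "frobnicate": [],
      "barterize": ["frobnicate"],
      "checkout_tools": [],
      "update_shared_repos": ["checkout_tools"],
  }

  def flatten(deps):
      order, seen = [], set()
      def visit(task):
          if task not in seen:
              seen.add(task)
              for dep in deps[task]:
                  visit(dep)
              order.append(task)
      for task in sorted(deps):
          visit(task)
      return order  # dependencies always precede dependents

  print(flatten(deps))
  # ['frobnicate', 'barterize', 'checkout_tools', 'update_shared_repos']

(No cycle detection here; a real runner would want to fail loudly on cycles.)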

This, combined with the config handling I suggest below, would mean that toplevel::slave::build (or toplevel::slave::qa::tps_ci) only needs to have

  include runner::task::update_shared_repos
  include runner::task::clean_tempfiles

and all of the required configuration, dependent tasks, and runner itself would be installed automatically.

::: modules/runner/files/checkout_tools
@@ +6,5 @@
> +        exit $?
> +fi
> +HGTOOL=/usr/local/bin/hgtool.py
> +
> +TOOLS_REPO=$($RUNNER_CONFIG_CMD -g hg.tools_repo)

I don't see $RUNNER_CONFIG_CMD defined anywhere..

::: modules/runner/manifests/service.pp
@@ +6,5 @@
> +    include runner::settings
> +    case $::operatingsystem {
> +        'CentOS': {
> +            # Which service this relies on
> +            $initd_required_start = 'network'

What's this for?  Would this change?

::: modules/toplevel/manifests/slave/build.pp
@@ -33,5 @@
>      ccache::ccache_dir {
> -        "/builds/ccache":
> -            maxsize => "10G",
> -            owner => $users::builder::username,
> -            group => $users::builder::group;

Looks like you have some kind of automatic puppetlint thing going on?  It's OK for files you've modified, but not every file..

::: modules/toplevel/manifests/slave/build/mock.pp
@@ -17,5 @@
>      # good way to communicate the need to that class.
>      exec {
>          'add-builder-to-mock_mozilla':
> -            command => "/usr/bin/gpasswd -a $users::builder::username mock_mozilla",
> -            unless => "/usr/bin/groups $users::builder::username | grep '\\<mock_mozilla\\>'",

The addition of {..} here is legit, but we don't follow the style of lining up the => in a single column.

@@ +32,5 @@
> +            data        => template('toplevel/slave/build/hg.cfg.erb');
> +        'buildbot.cfg':
> +            sectionname => 'buildbot',
> +            data        => template('toplevel/slave/build/buildbot.cfg.erb');
> +    }

I think there are bits of the config stuff missing in this patch, so maybe I'm missing something, but -- why have all of these config files?  And it looks like each just has a single section?  You could use the 'concat' puppet module to concatenate those into a single file and make things easier for users to read in place.

From what I can tell, the configs are orthogonal to the tasks -- meaning that a task may use several configs, and several tasks may use a config. What if each task included the configs it needed?

So runner::tasks::checkout_tools would have
  include runner::config::hg
and runner::tasks::update_shared_repos would have
  include runner::config::hg
  include runner::config::env
Attachment #8437030 - Flags: review?(dustin) → feedback+
(In reply to Dustin J. Mitchell [:dustin] from comment #30)
> Comment on attachment 8437030 [details] [diff] [review]
> Puppet module for Catlee's Runner
> 
> Review of attachment 8437030 [details] [diff] [review]:
> -----------------------------------------------------------------

Thanks a bunch for taking the time to review this so thoroughly, Dustin!

> I think we talked a bit in irc about this, but just to follow up, it seems
> like dependencies are going to be the gotcha here, especially if this is
> going to be useful for other organizations (e.g., QA probably has some
> pre-flight tasks, but they're probably not very similar to releng's).
> 
> Using sorted, numbered scripts works OK if different numbers have meaning
> (e.g., "all repositories are up to date by level 10"), but they don't
> represent dependencies very well.  For example, if I rename 15-frobnicate.sh
> to 17-frobnicate.sh, how do I know that 16-barterize.sh didn't depend on it?
> 
> Since the runner itself is Python, it'd actually be pretty easy to specify
> dependencies explicitly, either in the scripts themselves (perhaps in
> comments) or in some nearby file (e.g., barterize.deps), and then flatten
> that dependency tree into a fixed order in the Python script.  That would
> also leave the door open to parallelizing the preflight tasks at a later
> date.  And if those dependencies are reflected in the Puppet configuration,
> then Puppet can ensure that every task's dependency tasks are also installed.
> 
> Systemd, launchd, and Gentoo all solve this a little bit differently, and
> they all have complicated syntax to say 'starts before this service *or*
> that service' or to refer to virtual services or whatever.  I don't see
> runner needing such levels of complexity.
> 
> This, combined with the config handling I suggest below, would mean that
> toplevel::slave::build (or toplevel::slave::qa::tps_ci) only needs to have
> 
>   include runner::task::update_shared_repos
>   include runner::task::clean_tempfiles
> 
> and all of the required configuration, dependent tasks, and runner itself
> would be installed automatically.

I need to grab :catlee to discuss this today. Will get back to you with my thoughts after that. 


> 
> ::: modules/runner/files/checkout_tools
> @@ +6,5 @@
> > +        exit $?
> > +fi
> > +HGTOOL=/usr/local/bin/hgtool.py
> > +
> > +TOOLS_REPO=$($RUNNER_CONFIG_CMD -g hg.tools_repo)
> 
> I don't see $RUNNER_CONFIG_CMD defined anywhere..

Runner sets this itself (and guarantees that it is set).

Definitely worthy of a comment though. Will fix.


> 
> ::: modules/runner/manifests/service.pp
> @@ +6,5 @@
> > +    include runner::settings
> > +    case $::operatingsystem {
> > +        'CentOS': {
> > +            # Which service this relies on
> > +            $initd_required_start = 'network'
> 
> What's this for?  Would this change?

Good question. Will find out if it's still needed.

> ::: modules/toplevel/manifests/slave/build.pp
> @@ -33,5 @@
> >      ccache::ccache_dir {
> > -        "/builds/ccache":
> > -            maxsize => "10G",
> > -            owner => $users::builder::username,
> > -            group => $users::builder::group;
> 
> Looks like you have some kind of automatic puppetlint thing going on?  It's
> OK for files you've modified, but not every file..

Oh, damn. Apologies. 
> 
> ::: modules/toplevel/manifests/slave/build/mock.pp
> @@ -17,5 @@
> >      # good way to communicate the need to that class.
> >      exec {
> >          'add-builder-to-mock_mozilla':
> > -            command => "/usr/bin/gpasswd -a $users::builder::username mock_mozilla",
> > -            unless => "/usr/bin/groups $users::builder::username | grep '\\<mock_mozilla\\>'",
> 
> The addition of {..} here is legit, but we don't follow the style of lining
> up the => in a single column.

Will know for the future - thanks!


> 
> @@ +32,5 @@
> > +            data        => template('toplevel/slave/build/hg.cfg.erb');
> > +        'buildbot.cfg':
> > +            sectionname => 'buildbot',
> > +            data        => template('toplevel/slave/build/buildbot.cfg.erb');
> > +    }
> 
> I think there are bits of the config stuff missing in this patch, so maybe
> I'm missing something, but -- why have all of these config files?  And it
> looks like each just has a single section?  You could use the 'concat'
> puppet module to concatenate those into a single file and make things easier
> for users to read in place.
> 
> From what I can tell, the configs are orthogonal to the tasks -- meaning
> that a task may use several configs, and several tasks may use a config.
> What if each task included the configs it needed?
> 
> So runner::tasks::checkout_tools would have
>   include runner::config::hg
> and runner::tasks::update_shared_repos would have
>   include runner::config::hg
>   include runner::config::env

There doesn't seem to be any config missing. Is there something in particular that made you think I may have forgotten something?

And that's a good point on being able to `include` the configs. I'll look into that today.

Thanks again.
RUNNER_CONFIG_CMD made me think something was missing, but I see the error of my ways now.
Assignee: catlee → iconnolly
Comment on attachment 8437030 [details] [diff] [review]
Puppet module for Catlee's Runner

Review of attachment 8437030 [details] [diff] [review]:
-----------------------------------------------------------------

We also spoke on IRC and vidyo about this.

I like the idea of moving the configs into their own modules, related to which tasks are defined.

Let's leave inter-task dependencies for the next phase.
Attachment #8437030 - Flags: feedback?(catlee) → feedback+
Attached patch post-feedback diff (obsolete) — Splinter Review
Updated runner module post-feedback with Dustin.

Just need a quick look at the config changes to make sure what I envisaged was aligned with what :dustin was thinking. 

Left inter-task dependencies to iteration n+1.
Attachment #8437030 - Attachment is obsolete: true
Attachment #8439356 - Flags: review?(dustin)
Comment on attachment 8439356 [details] [diff] [review]
post-feedback diff

Review of attachment 8439356 [details] [diff] [review]:
-----------------------------------------------------------------

All pretty minor comments here -- nice work!

::: modules/runner/files/checkout_tools
@@ +1,1 @@
> +#!/bin/bash

Minor, but these files should have license headers on them

@@ +7,5 @@
> +fi
> +HGTOOL=/usr/local/bin/hgtool.py
> +
> +# $RUNNER_CONFIG_CMD is guaranteed to be set by Runner itself. See Runner
> +# documentation for more information.

Thanks :)

::: modules/runner/manifests/config.pp
@@ +1,5 @@
> +# This Source Code Form is subject to the terms of the Mozilla Public
> +# License, v. 2.0. If a copy of the MPL was not distributed with this
> +# file, You can obtain one at http://mozilla.org/MPL/2.0/.
> +# config file for runner
> +define runner::config($sectionname, $data) {

This appears unused now - remove?

::: modules/runner/manifests/service.pp
@@ +6,5 @@
> +    include runner::settings
> +    case $::operatingsystem {
> +        'CentOS': {
> +            # Which service this relies on
> +            $initd_required_start = 'network'

This is still hanging around..

::: modules/runner/manifests/tasks/checkout_tools.pp
@@ +3,5 @@
> +    file {
> +        '/tools/checkouts':
> +             ensure => directory,
> +             owner  => cltbld,
> +             group  => cltbld;

replace cltbld with $::config::builder_username

@@ +4,5 @@
> +        '/tools/checkouts':
> +             ensure => directory,
> +             owner  => cltbld,
> +             group  => cltbld;
> +     }

one space too many

::: modules/toplevel/manifests/slave/build/mock.pp
@@ +22,4 @@
>              require => [Class['packages::mozilla::mock_mozilla'], Class['users::builder']];
>      }
> +
> +    include runner

If you make each of the runner::tasks::* include runner, then you don't even need this here.  That might be a bit easier.
Attachment #8439356 - Flags: review?(dustin) → review+
(In reply to Dustin J. Mitchell [:dustin] from comment #36)
> Comment on attachment 8439356 [details] [diff] [review]
> post-feedback diff
> 
> Review of attachment 8439356 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> All pretty minor comments here -- nice work!

Thanks!

> 
> ::: modules/runner/files/checkout_tools
> @@ +1,1 @@
> > +#!/bin/bash
> 
> Minor, but these files should have license headers on them
> 

Fixed!


> 
> ::: modules/runner/manifests/config.pp
> @@ +1,5 @@
> > +# This Source Code Form is subject to the terms of the Mozilla Public
> > +# License, v. 2.0. If a copy of the MPL was not distributed with this
> > +# file, You can obtain one at http://mozilla.org/MPL/2.0/.
> > +# config file for runner
> > +define runner::config($sectionname, $data) {
> 
> This appears unused now - remove?

Done.

> 
> ::: modules/runner/manifests/service.pp
> @@ +6,5 @@
> > +    include runner::settings
> > +    case $::operatingsystem {
> > +        'CentOS': {
> > +            # Which service this relies on
> > +            $initd_required_start = 'network'
> 
> This is still hanging around..

Oops! My bad, I had already removed it but I must have generated the diff beforehand. Fixed.

> 
> ::: modules/runner/manifests/tasks/checkout_tools.pp
> @@ +3,5 @@
> > +    file {
> > +        '/tools/checkouts':
> > +             ensure => directory,
> > +             owner  => cltbld,
> > +             group  => cltbld;
> 
> replace cltbld with $::config::builder_username

Fixed.

> 
> @@ +4,5 @@
> > +        '/tools/checkouts':
> > +             ensure => directory,
> > +             owner  => cltbld,
> > +             group  => cltbld;
> > +     }
> 
> one space too many

Annoyed the linter didn't catch that. Thanks.

> 
> ::: modules/toplevel/manifests/slave/build/mock.pp
> @@ +22,4 @@
> >              require => [Class['packages::mozilla::mock_mozilla'], Class['users::builder']];
> >      }
> > +
> > +    include runner
> 
> If you make each of the runner::tasks::* include runner, then you don't even
> need this here.  That might be a bit easier.

Good idea. Done.
Could I get a check-in? :)

Carrying the r+ from :dustin over from Attachment 8439356 [details] [diff].
Attachment #8439356 - Attachment is obsolete: true
Attachment #8441600 - Flags: review+
Attachment #8441600 - Flags: checked-in?
Comment on attachment 8441600 [details] [diff] [review]
runner.diff -- Changes made as per Dustin's 2nd review

remote:   https://hg.mozilla.org/build/puppet/rev/58a3c0e9ae64
remote:   https://hg.mozilla.org/build/puppet/rev/6fa2dc80d46a
Attachment #8441600 - Flags: checked-in? → checked-in+
Comment on attachment 8441600 [details] [diff] [review]
runner.diff -- Changes made as per Dustin's 2nd review

We didn't end up with the proper boot order for some reason.
Attachment #8441600 - Flags: checked-in+ → checked-in-
Blocks: 1028191
Attached patch Runner - with boot order fixes (obsolete) — Splinter Review
This should fix the boot order problems we were having previously. 

The problems were:

a) We had rc3.d/ symlinks for puppet and buildbot already on our existing machines. We were changing their boot priority in their init.d/ scripts, but the symlinks weren't being updated to reflect that. Puppet unfortunately doesn't do this as part of its Service type, so I've added an exec that runs an external command to reset these symlinks. (This was all working fine on *new* machines, because newly created symlinks get the correct priority value.)

b) Another issue was that we were moving new services to run after puppet when they weren't already positioned after puppet in the boot order (i.e. we were changing puppet's boot priority while it was running from its initscript). This has been fixed by having puppet force a reboot whenever its initscript is changed.

These were the known issues that arose from the last deployment. Upon rebooting a server configured with this patch we correctly see puppet -> runner -> buildbot, so hopefully that means we're good.
Attachment #8441600 - Attachment is obsolete: true
Attachment #8443701 - Flags: review?(dustin)
Attachment #8443701 - Flags: checked-in?
Comment on attachment 8443701 [details] [diff] [review]
Runner - with boot order fixes

Review of attachment 8443701 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM. Just a couple of trailing space nazi comments below. :)

I can fix them when I land the patch.

::: manifests/moco-config.pp
@@ +225,4 @@
>      $selfserve_agent_carrot_exchange = "buildapi.control"
>      $selfserve_agent_carrot_queue = "buildapi-agent-rabbit2"
>  
> +

Can you remove the added empty line above?

::: modules/puppet/manifests/atboot.pp
@@ +27,5 @@
>      exec {
>          # ask the puppet startup script to reboot
>          "reboot-after-puppet":
> +            command => "touch /REBOOT_AFTER_PUPPET",
> +            path => ['/bin/', '/usr/bin/'],

From IRC, "touch" lives in /bin on CentOS. Good catch!

@@ +61,5 @@
> +                    # rc3.d symlink and we want to reboot because we've changed
> +                    # what we want to come after puppet in the boot order
> +                    notify => [ Exec['initd-refresh'], Exec['reboot-after-puppet'] ];
> +            }
> +            

.. and here

@@ +65,5 @@
> +            
> +            exec {
> +                'initd-refresh':
> +                    # resetpriorities tells chkconfig to update the 
> +                    # symlinks in rcX.d with the values from the service's 

and on these 2 lines

@@ +68,5 @@
> +                    # resetpriorities tells chkconfig to update the 
> +                    # symlinks in rcX.d with the values from the service's 
> +                    # init.d script
> +                    command => '/sbin/chkconfig puppet resetpriorities',
> +                    refreshonly => true;    

And here
(trailing spaces nazi mode off)

::: modules/puppet/templates/puppet-centos-initd.erb
@@ +8,5 @@
> +### BEGIN INIT INFO
> +# Provides:          puppet
> +# Required-Start:    $local_fs $network
> +# Required-Start:    $local_fs $network
> +# Should-Start:      $remote_fs 

Ooop, and here
Attachment #8443701 - Flags: review?(dustin) → review+
Whitespace fixes from Rail's review.

Carrying over the r+.
Attachment #8443701 - Attachment is obsolete: true
Attachment #8443701 - Flags: checked-in?
Attachment #8445296 - Flags: review+
Attachment #8445296 - Flags: checked-in?
Attachment #8445296 - Flags: checked-in? → checked-in+
We're sticking with this - will pull the fix out into a new bug.
Attachment #8445296 - Attachment is obsolete: false
Depends on: 1029704
Depends on: 1029777
Depends on: 1029903
Blocks: 851123
Summary: slave pre-flight tasks → slave pre-flight tasks (runner)
Depends on: 1069529
Depends on: 1069530
Whiteboard: [unittest][talos][automation][q4goal] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2706] [unittest][talos][automation][q4goal]
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2706] [unittest][talos][automation][q4goal] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2714] [unittest][talos][automation][q4goal]
Assignee: ian → winter2718
It seems like this is essentially finished. I'm starting on getting rid of post-build reboots, and will open bugs, catch-as-catch-can, for any additional runner tasks that need creating.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard