Closed Bug 712206 Opened 13 years ago Closed 10 years ago

slave pre-flight tasks (runner)

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task, P3)

Product:

Component:

Type:

task

Priority:

P3

Severity:

normal

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: catlee, Assigned: mrrrgn)

References

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2714] [unittest][talos][automation][q4goal])

Attachments

(2 files, 5 obsolete files)

runner.diff 11 years ago Ian Connolly [:ianconnolly] 22.43 KB, patch		Details \| Diff \| Splinter Review
Puppet module for Catlee's Runner 11 years ago Ian Connolly [:ianconnolly] 15.48 KB, patch	catlee : feedback+ dustin : feedback+	Details \| Diff \| Splinter Review
post-feedback diff 11 years ago Ian Connolly [:ianconnolly] 15.03 KB, patch	dustin : review+	Details \| Diff \| Splinter Review
runner.diff -- Changes made as per Dustin's 2nd review 11 years ago Ian Connolly [:ianconnolly] 15.83 KB, patch	ianconnolly : review+ catlee : checked-in-	Details \| Diff \| Splinter Review
Runner - with boot order fixes 11 years ago Ian Connolly [:ianconnolly] 17.66 KB, patch	rail : review+	Details \| Diff \| Splinter Review
cleaned runner.diff 11 years ago Ian Connolly [:ianconnolly] 17.28 KB, patch	ianconnolly : review+ catlee : checked-in+	Details \| Diff \| Splinter Review
fix for subset of machines that got the backed out patch 11 years ago Ian Connolly [:ianconnolly] 18.59 KB, patch		Details \| Diff \| Splinter Review

Chris AtLee [:catlee]

Reporter

Description

•

13 years ago

a lot of slave maintenance is done at the beginning of build and test jobs that could be done instead before even starting buildbot. things like doing clobbers, purging old directories, making sure the tools checkout is up-to-date. in the case where we have spare capacity in the pool this can be a slight win for end-to-end time since builds don't have to waste time cleaning up. in the case where we're maxed out, it's no worse than the current situation. pre-flight checks could be added here as well (can I compile something? is my display correct?), and refuse to start buildbot if the checks fail.

Armen [:armenzg]

Updated

•

13 years ago

Whiteboard: [unittest][talos]

Chris AtLee [:catlee]

Reporter

Updated

•

13 years ago

Depends on: 596748

Chris Cooper [:coop] (he/him)

Updated

•

13 years ago

Priority: -- → P4

Whiteboard: [unittest][talos] → [unittest][talos][automation]

Chris AtLee [:catlee]

Reporter

Updated

•

13 years ago

Component: Release Engineering → Release Engineering: Machine Management

QA Contact: release → armenzg

Dustin J. Mitchell [:dustin] (he/him)

Comment 2

•

13 years ago

Among the things this should do is to verify and, if necessary, correct the screen resolution on all affected platforms. I can see two ways to implement this: 1. add functionality to runslave.py to run a directory full of scripts 2. insert a step *before* runslave.py on all platforms that runs a directory full of scripts

Chris AtLee [:catlee]

Reporter

Updated

•

12 years ago

Blocks: 801607

Chris AtLee [:catlee]

Reporter

Comment 3

•

12 years ago

I think we need this soon

Priority: P4 → P2

Armen [:armenzg]

Comment 4

•

12 years ago

I can help with the Windows side as I have increased my expertise by setting the Windows 8, 7 and 64-bit several times in the last two quarters.

Armen [:armenzg]

Comment 5

•

12 years ago

I'm thinking that we should add bad slaves in the "see also" field and keep track in which weird ways it starts burning jobs even if hardware diagnostics did not find any problem.

See Also: → talos-r4-lion-063

Armen [:armenzg]

Comment 6

•

12 years ago

I would also add the ability to report issues into a dashboard. I would like to avoid having to go and check each single slave when it fails to start.

Chris Cooper [:coop] (he/him)

Updated

•

12 years ago

Component: Release Engineering: Machine Management → Release Engineering: Platform Support

QA Contact: armenzg → coop

Nobody; OK to take it and work on it

Updated

•

12 years ago

Product: mozilla.org → Release Engineering

Dustin J. Mitchell [:dustin] (he/him)

Updated

•

11 years ago

Blocks: 906706

Armen [:armenzg]

Comment 7

•

11 years ago

We need to clean temporary files on bld-lion machines like on bug 906706.

Armen [:armenzg]

Updated

•

11 years ago

Blocks: 880003

Armen [:armenzg]

Comment 8

•

11 years ago

It would be great if we had a script that would help us determine if the right graphic card is installed on the win7 machines: see bug https://bugzilla.mozilla.org/show_bug.cgi?id=873566#c24

Armen [:armenzg]

Comment 9

•

11 years ago

An approach to slave pre-flight tasks for puppet: https://github.com/catlee/puppet/compare/master...startup

Dustin J. Mitchell [:dustin] (he/him)

Comment 10

•

11 years ago

Very cool! I can't comment on the "compare", so a few notes here: * /etc/preflight.d should be purged, so that puppet can remove no-longer-desired scripts * the reboot should be delayed or, given the presence of slave-health/kittenherder, maybe just hang forever on the assumption it will be rebooted externally? At any rate, trying to fix something that's rebooting every 30s is no fun at all. * As for starting Buildbot, it might be easiest to just invoke this script from the various shell scripts used for puppet::atboot, since they already have a mechanism to delay buildbot startup. * It'd be nicer to have preflight::step take a 'contents' attribute. Then the in-module classes like preflight::steps::tools can use contents => template("${module_name}/tools.erb"), and preflight::step can also be used outside of the preflight module, with arbitrary scripts. * Really minor, but grammatically I'd prefer preflight::step::tools (singular), just for the symmetry with preflight::step. * It'd be great to have additional logging from these scripts, both to syslog and, potentially, to nagios via NSCA. * Given the necessity of running these scripts on all platforms, would it be better to write them in Python or Ruby? That would also allow libraries for platform-independent logging, etc. I'm sad that we've been sitting on this since May :(

Chris Cooper [:coop] (he/him)

Updated

•

11 years ago

Assignee: nobody → armenzg

Whiteboard: [unittest][talos][automation] → [unittest][talos][automation][q4goal]

Armen [:armenzg]

Comment 11

•

11 years ago

I would prefer writing this in python to make it cross-platform compatible. It is also easier to have people contribute to it. I assume there will be some Puppet and GPO to deploy some part of it. I will do an analysis to see where to hook this up for each OS.

Armen [:armenzg]

Comment 12

•

11 years ago

I can't look at this bug until past the work week. I'm afraid this will be a missed goal.

Priority: P2 → P3

Chris AtLee [:catlee]

Reporter

Updated

•

11 years ago

Depends on: 930831

Chris Cooper [:coop] (he/him)

Comment 13

•

11 years ago

Simone: how do you feel about owning the whole thing, and not just the tools repo part?

Flags: needinfo?(sbruno)

Simone Bruno [:simone]

Comment 14

•

11 years ago

Coop: it surely makes sense since I started working on Bug 712205. I'm taking this!

Assignee: armenzg → sbruno

Flags: needinfo?(sbruno)

Chris Cooper [:coop] (he/him)

Comment 15

•

11 years ago

(In reply to Dustin J. Mitchell [:dustin] (I read my bugmail; don't needinfo me) from comment #10) > * As for starting Buildbot, it might be easiest to just invoke this script > from the various shell scripts used for puppet::atboot, since they already > have a mechanism to delay buildbot startup. +1 > * It'd be great to have additional logging from these scripts, both to > syslog and, potentially, to nagios via NSCA. I think we handle this. Windows will still suck. > * Given the necessity of running these scripts on all platforms, would it be > better to write them in Python or Ruby? That would also allow libraries for > platform-independent logging, etc. Yes, I think we want python scripts/libs here.

Chris Cooper [:coop] (he/him)

Comment 16

•

11 years ago

Some other notes that have come out of conversations with releng this week: * measuring improvement: we track current setup and teardown times per OS in brasstacks (http://brasstacks.mozilla.com/gofaster/#/ but the charts seem to be busted right now, I'll investigate). We should gather baseline stats now, so we can assess whether (and how far, hopefully) we've moved the needle at the end * incremental gains: once we have our baselines, we can and should roll this out per OS as each step is ready, rather than waiting for a giant improvement across the board at the end. Making things a little bit better every week, even on one platform at a time, is progress. * specific tasks: ** clobberer should run at start-up, although we will still need to check clobberer as part of the build, but only for the tree in question. Depending on how long a slave sits idle before its first job, there's a chance that a clobber request will come in between buildbot startup and job start. ** hg-shared: we should iterate through all the trees in the hg-shared dir and make sure they're updated to tip before starting buildbot. This will give jobs the minimum possible code delta to pull during the actual build process. ** moving post-flight tasks to pre-flight: obviously this makes sense too, but may require some special logic. Some of the post-flight tasks we do are specific to the current build, e.g. removing build and/or objdirs. Moving those tasks to pre-flight means you wouldn't have a current build as reference. Maybe post-flight tasks get merged in a later round of improvements here. I'm hoping better build system dependencies make this a non-issue in the near future.

Dustin J. Mitchell [:dustin] (he/him)

Comment 17

•

11 years ago

** one pony, with rainbow hair Let's get the basic framework in place before we start hammering every nail!

Ed Morley [:emorley]

Updated

•

11 years ago

Depends on: 932877

Ed Morley [:emorley]

Comment 18

•

11 years ago

(In reply to Chris Cooper [:coop] from comment #16) > * measuring improvement: we track current setup and teardown times per OS in > brasstacks (http://brasstacks.mozilla.com/gofaster/#/ but the charts seem to > be busted right now, I'll investigate). Filed bug 932877.

Chris Cooper [:coop] (he/him)

Updated

•

11 years ago

Depends on: 769384

Chris Cooper [:coop] (he/him)

Updated

•

11 years ago

Blocks: 827367

Simone Bruno [:simone]

Comment 20

•

11 years ago

I am going to write a preflight_runner.py script, which will be invoked just before running runslave.py in the following files, which reside in http://hg.mozilla.org/build/puppet/file/default/modules/buildslave/files: darwin-run-buildslave.sh run-puppet-and-buildbot.sh start-buildbot-win64.bat startTalos-w7.bat startTalos-w764.bat startTalos-w864.bat startTalos-xp.bat Using these files as hook points for preflight calls will make it possible to roll out preflight scripts per OS, as proposed by coop. preflight_runner.py will run a directory containing scripts, determining the order of execution based on some simple naming conventions. I will start implementing support for python scripts, but it should not be difficult to extend to direct support of native shell scripts (.sh scripts for unix, .bat/.cmd scripts for Windows). Armen, can you please confirm that this plan is compatible with our last conversation?

Flags: needinfo?(armenzg)

Armen [:armenzg]

Comment 21

•

11 years ago

First of all, could you please go to each Windows host and grab a recent version of start*.bat? What is committed puppet could be easily be out-of-date since we have refactor the Windows infrastructure with GPO and have changed a bunch of assumptions. This matches what we spoke. I have no problem making the script more OS-centric, however, I would *suggest* not mixing them in the same directory but split by OS (e.g. win_preflight_tasks/, unix_preflight_tasks/ and osx_preflight_tasks/).

Flags: needinfo?(armenzg)

Simone Bruno [:simone]

Comment 22

•

11 years ago

Hi Armen, Thanks for your quick feedback. I will have a look to the actual .bat files on windows hosts - even if there are differences, I believe the principle of using these files as hook points before runslave.py is called remains valid. Regarding the preflight tasks, I completely agree with your recommendation; they will necessarily be written and stored separately for each OS, at least for two reasons: 1) because we may want to add different behavior per OS 2) because we may want to support native shell scripts

Dustin J. Mitchell [:dustin] (he/him)

Comment 23

•

11 years ago

The scripts will, presumably, be installed by puppet, so I would recommend having the per-OS (or per-silo, really) logic handled in puppet, not by the runner script.

Chris AtLee [:catlee]

Reporter

Comment 24

•

11 years ago

I'd like to shift focus slightly for this. Instead of running pre-flight tasks on boot, we should run them before running buildbot. A minor distinction right now, but I'd like to get to the point where we're running some kind of loop like: while True: - run pre-flight tasks - run buildbot for one job - run post-flight tasks then we can stop rebooting machines after every job.

Armen [:armenzg]

Updated

•

11 years ago

Depends on: 1005936

Chris AtLee [:catlee]

Reporter

Comment 25

•

11 years ago

We're going to do this with runner: https://github.com/catlee/runner

Assignee: sbruno → catlee

Chris AtLee [:catlee]

Reporter

Updated

•

11 years ago

Blocks: 1019013

Ian Connolly [:ianconnolly]

Comment 27

•

11 years ago

Attached patch runner.diff (obsolete) — Details — Splinter Review

Puppet module for installing and configuring Runner. Runner itself is already packaged as an sdist and uploaded. Unsure if :catlee wants a review on runner itself or not.

Attachment #8436062 - Flags: review?(dustin)

Attachment #8436062 - Flags: feedback?(catlee)

Dustin J. Mitchell [:dustin] (he/him)

Comment 28

•

11 years ago

Comment on attachment 8436062 [details] [diff] [review] runner.diff Review of attachment 8436062 [details] [diff] [review]: ----------------------------------------------------------------- It looks like this has a bunch of other stuff mixed in. Can you re-post a diff containing only your changes?

Attachment #8436062 - Flags: review?(dustin)

Ian Connolly [:ianconnolly]

Comment 29

•

11 years ago

Attached patch Puppet module for Catlee's Runner (obsolete) — Details — Splinter Review

A diff not created on an brain-addled Friday evening...

Attachment #8436062 - Attachment is obsolete: true

Attachment #8436062 - Flags: feedback?(catlee)

Attachment #8437030 - Flags: review?(dustin)

Attachment #8437030 - Flags: feedback?(catlee)

Dustin J. Mitchell [:dustin] (he/him)

Comment 30

•

11 years ago

Comment on attachment 8437030 [details] [diff] [review] Puppet module for Catlee's Runner Review of attachment 8437030 [details] [diff] [review]: ----------------------------------------------------------------- I think we talked a bit in irc about this, but just to follow up, it seems like dependencies are going to be the gotcha here, especially if this is going to be useful for other organizations (e.g., QA probably has some pre-flight tasks, but they're probably not very similar to releng's). Using sorted, numbered scripts works OK if different numbers have meaning (e.g., "all repositories are up to date by level 10"), but they don't represent dependencies very well. For example, if I rename 15-frobnicate.sh to 17-frobnicate.sh, how do I know that 16-barterize.sh didn't depend on it? Since the runner itself is Python, it'd actually be pretty easy to specify dependencies explicitly, either in the scripts themselves (perhaps in comments) or in some nearby file (e.g., barterize.deps), and then flatten that dependency tree into a fixed order in the Python script. That would also leave the door open to parallelizing the preflight tasks at a later date. And if those dependencies are reflected in the Puppet configuration, then Puppet can ensure that every task's dependency tasks are also installed. Systemd, launchd, and Gentoo all solve this a little bit differently, and they all have complicated syntax to say 'starts before this service *or* that service' or to refer to virtual services or whatever. I don't see runner needing such levels of complexity. This, combined with the config handling I suggest below, would mean that toplevel::slave::build (or toplevel::slave::qa::tps_ci) only needs to have include runner::task::update_shared_repos include runner::task::clean _tempfiles and all of the reqiured configuration, dependent tasks, and runner itself would be installed automatically. ::: modules/runner/files/checkout_tools @@ +6,5 @@ > + exit $? > +fi > +HGTOOL=/usr/local/bin/hgtool.py > + > +TOOLS_REPO=$($RUNNER_CONFIG_CMD -g hg.tools_repo) I don't see $RUNNER_CONFIG_CMD defined anywhere.. ::: modules/runner/manifests/service.pp @@ +6,5 @@ > + include runner::settings > + case $::operatingsystem { > + 'CentOS': { > + # Which service this relies on > + $initd_required_start = 'network' What's this for? Would this change? ::: modules/toplevel/manifests/slave/build.pp @@ -33,5 @@ > ccache::ccache_dir { > - "/builds/ccache": > - maxsize => "10G", > - owner => $users::builder::username, > - group => $users::builder::group; Looks like you have some kind of automatic puppetlint thing going on? It's OK for files you've modified, but not every file.. ::: modules/toplevel/manifests/slave/build/mock.pp @@ -17,5 @@ > # good way to communicate the need to that class. > exec { > 'add-builder-to-mock_mozilla': > - command => "/usr/bin/gpasswd -a $users::builder::username mock_mozilla", > - unless => "/usr/bin/groups $users::builder::username | grep '\\<mock_mozilla\\>'", The addition of {..} here is legit, but we don't follow the style of lining up the => in a single column. @@ +32,5 @@ > + data => template('toplevel/slave/build/hg.cfg.erb'); > + 'buildbot.cfg': > + sectionname => 'buildbot', > + data => template('toplevel/slave/build/buildbot.cfg.erb'); > + } I think there are bits of the config stuff missing in this patch, so maybe I'm missing something, but -- why have all of these config files? And it looks like each just has a single section? You could use the 'concat' puppet module to concatenate those into a single file and make things easier for users to read in place. From what I can tell, the config's are orthogonal to the tasks - -meaning that a task may use several config's, and several tasks may use a config. What if each task included the configs it needed? So runner::tasks::checkout_tools would have include runner::config::hg and runner::tasks::update_shared_repos would have include runner::config::hg include runner::config::env

Attachment #8437030 - Flags: review?(dustin) → feedback+

Ian Connolly [:ianconnolly]

Comment 31

•

11 years ago

(In reply to Dustin J. Mitchell [:dustin] from comment #30) > Comment on attachment 8437030 [details] [diff] [review] > Puppet module for Catlee's Runner > > Review of attachment 8437030 [details] [diff] [review]: > ----------------------------------------------------------------- Thanks a bunch for taking the time to review this so thoroughly, Dustin! > I think we talked a bit in irc about this, but just to follow up, it seems > like dependencies are going to be the gotcha here, especially if this is > going to be useful for other organizations (e.g., QA probably has some > pre-flight tasks, but they're probably not very similar to releng's). > > Using sorted, numbered scripts works OK if different numbers have meaning > (e.g., "all repositories are up to date by level 10"), but they don't > represent dependencies very well. For example, if I rename 15-frobnicate.sh > to 17-frobnicate.sh, how do I know that 16-barterize.sh didn't depend on it? > > Since the runner itself is Python, it'd actually be pretty easy to specify > dependencies explicitly, either in the scripts themselves (perhaps in > comments) or in some nearby file (e.g., barterize.deps), and then flatten > that dependency tree into a fixed order in the Python script. That would > also leave the door open to parallelizing the preflight tasks at a later > date. And if those dependencies are reflected in the Puppet configuration, > then Puppet can ensure that every task's dependency tasks are also installed. > > Systemd, launchd, and Gentoo all solve this a little bit differently, and > they all have complicated syntax to say 'starts before this service *or* > that service' or to refer to virtual services or whatever. I don't see > runner needing such levels of complexity. > > This, combined with the config handling I suggest below, would mean that > toplevel::slave::build (or toplevel::slave::qa::tps_ci) only needs to have > > include runner::task::update_shared_repos > include runner::task::clean _tempfiles > > and all of the reqiured configuration, dependent tasks, and runner itself > would be installed automatically. I need to grab :catlee to discuss this today. Will get back to you with my thoughts after that. > > ::: modules/runner/files/checkout_tools > @@ +6,5 @@ > > + exit $? > > +fi > > +HGTOOL=/usr/local/bin/hgtool.py > > + > > +TOOLS_REPO=$($RUNNER_CONFIG_CMD -g hg.tools_repo) > > I don't see $RUNNER_CONFIG_CMD defined anywhere.. Runner sets this itself (and guarantees it being set). Definitely worthy of a comment though. Will fix. > > ::: modules/runner/manifests/service.pp > @@ +6,5 @@ > > + include runner::settings > > + case $::operatingsystem { > > + 'CentOS': { > > + # Which service this relies on > > + $initd_required_start = 'network' > > What's this for? Would this change? Good question. Will find out if still needed > ::: modules/toplevel/manifests/slave/build.pp > @@ -33,5 @@ > > ccache::ccache_dir { > > - "/builds/ccache": > > - maxsize => "10G", > > - owner => $users::builder::username, > > - group => $users::builder::group; > > Looks like you have some kind of automatic puppetlint thing going on? It's > OK for files you've modified, but not every file.. Oh, damn. Apologies. > > ::: modules/toplevel/manifests/slave/build/mock.pp > @@ -17,5 @@ > > # good way to communicate the need to that class. > > exec { > > 'add-builder-to-mock_mozilla': > > - command => "/usr/bin/gpasswd -a $users::builder::username mock_mozilla", > > - unless => "/usr/bin/groups $users::builder::username | grep '\\<mock_mozilla\\>'", > > The addition of {..} here is legit, but we don't follow the style of lining > up the => in a single column. Will know for the future - thanks! > > @@ +32,5 @@ > > + data => template('toplevel/slave/build/hg.cfg.erb'); > > + 'buildbot.cfg': > > + sectionname => 'buildbot', > > + data => template('toplevel/slave/build/buildbot.cfg.erb'); > > + } > > I think there are bits of the config stuff missing in this patch, so maybe > I'm missing something, but -- why have all of these config files? And it > looks like each just has a single section? You could use the 'concat' > puppet module to concatenate those into a single file and make things easier > for users to read in place. > > From what I can tell, the config's are orthogonal to the tasks - -meaning > that a task may use several config's, and several tasks may use a config. > What if each task included the configs it needed? > > So runner::tasks::checkout_tools would have > include runner::config::hg > and runner::tasks::update_shared_repos would have > include runner::config::hg > include runner::config::env There doesn't seem to be any config missing. Is there something in particularly that made you think that that I may have forgotten? And that's a good point on being able to `include` the configs. I'll look into that today. Thanks again.

Dustin J. Mitchell [:dustin] (he/him)

Comment 32

•

11 years ago

RUNNER_CONFIG_CMD made me think something was missing, but I see the error of my ways now.

Chris AtLee [:catlee]

Reporter

Updated

•

11 years ago

Assignee: catlee → iconnolly

Chris AtLee [:catlee]

Reporter

Comment 33

•

11 years ago

Comment on attachment 8437030 [details] [diff] [review] Puppet module for Catlee's Runner Review of attachment 8437030 [details] [diff] [review]: ----------------------------------------------------------------- We also spoke on IRC and vidyo about this. I like the idea of moving the configs into their own modules, related to which tasks are defined. Let's leave inter-task dependencies for the next phase.

Attachment #8437030 - Flags: feedback?(catlee) → feedback+

Ian Connolly [:ianconnolly]

Comment 34

•

11 years ago

Attached patch post-feedback diff (obsolete) — Details — Splinter Review

Updated runner module post-feedback with Dustin. Just need a quick look at the config changes to make sure what I envisaged was aligned with what :dustin was thinking. Left inter-task dependencies to iteration n+1.

Attachment #8437030 - Attachment is obsolete: true

Attachment #8439356 - Flags: review?(dustin)

Chris Cooper [:coop] (he/him)

Updated

•

11 years ago

Blocks: 1005936, 930831, 769384, 712205, 596748, 932877, 1018802

No longer depends on: 596748, 712205, 769384, 930831, 1005936, 932877

Chris Cooper [:coop] (he/him)

Updated

•

11 years ago

Blocks: 699195

Dustin J. Mitchell [:dustin] (he/him)

Comment 36

•

11 years ago

Comment on attachment 8439356 [details] [diff] [review] post-feedback diff Review of attachment 8439356 [details] [diff] [review]: ----------------------------------------------------------------- All pretty minor comments here -- nice work! ::: modules/runner/files/checkout_tools @@ +1,1 @@ > +#!/bin/bash Minor, but these files should have license headers on them @@ +7,5 @@ > +fi > +HGTOOL=/usr/local/bin/hgtool.py > + > +# $RUNNER_CONFIG_CMD is guaranteed to be set by Runner itself. See Runner > +# documentation for more information. Thanks :) ::: modules/runner/manifests/config.pp @@ +1,5 @@ > +# This Source Code Form is subject to the terms of the Mozilla Public > +# License, v. 2.0. If a copy of the MPL was not distributed with this > +# file, You can obtain one at http://mozilla.org/MPL/2.0/. > +# config file for runner > +define runner::config($sectionname, $data) { This appears unused now - remove? ::: modules/runner/manifests/service.pp @@ +6,5 @@ > + include runner::settings > + case $::operatingsystem { > + 'CentOS': { > + # Which service this relies on > + $initd_required_start = 'network' This is still hanging around.. ::: modules/runner/manifests/tasks/checkout_tools.pp @@ +3,5 @@ > + file { > + '/tools/checkouts': > + ensure => directory, > + owner => cltbld, > + group => cltbld; replace cltbld with $::config::builder_username @@ +4,5 @@ > + '/tools/checkouts': > + ensure => directory, > + owner => cltbld, > + group => cltbld; > + } one space too many ::: modules/toplevel/manifests/slave/build/mock.pp @@ +22,4 @@ > require => [Class['packages::mozilla::mock_mozilla'], Class['users::builder']]; > } > + > + include runner If you make each of the runner::tasks::* include runner, then you don't even need this here. That might be a bit easier.

Attachment #8439356 - Flags: review?(dustin) → review+

Ian Connolly [:ianconnolly]

Comment 37

•

11 years ago

(In reply to Dustin J. Mitchell [:dustin] from comment #36) > Comment on attachment 8439356 [details] [diff] [review] > post-feedback diff > > Review of attachment 8439356 [details] [diff] [review]: > ----------------------------------------------------------------- > > All pretty minor comments here -- nice work! Thanks! > > ::: modules/runner/files/checkout_tools > @@ +1,1 @@ > > +#!/bin/bash > > Minor, but these files should have license headers on them > Fixed! > > ::: modules/runner/manifests/config.pp > @@ +1,5 @@ > > +# This Source Code Form is subject to the terms of the Mozilla Public > > +# License, v. 2.0. If a copy of the MPL was not distributed with this > > +# file, You can obtain one at http://mozilla.org/MPL/2.0/. > > +# config file for runner > > +define runner::config($sectionname, $data) { > > This appears unused now - remove? Done. > > ::: modules/runner/manifests/service.pp > @@ +6,5 @@ > > + include runner::settings > > + case $::operatingsystem { > > + 'CentOS': { > > + # Which service this relies on > > + $initd_required_start = 'network' > > This is still hanging around.. Oops! My bad, I had already removed it but I must have generated the diff beforehand. Fixed. > > ::: modules/runner/manifests/tasks/checkout_tools.pp > @@ +3,5 @@ > > + file { > > + '/tools/checkouts': > > + ensure => directory, > > + owner => cltbld, > > + group => cltbld; > > replace cltbld with $::config::builder_username Fixed. > > @@ +4,5 @@ > > + '/tools/checkouts': > > + ensure => directory, > > + owner => cltbld, > > + group => cltbld; > > + } > > one space too many Annoyed the linter didn't catch that. Thanks. > > ::: modules/toplevel/manifests/slave/build/mock.pp > @@ +22,4 @@ > > require => [Class['packages::mozilla::mock_mozilla'], Class['users::builder']]; > > } > > + > > + include runner > > If you make each of the runner::tasks::* include runner, then you don't even > need this here. That might be a bit easier. Good idea. Done.

Ian Connolly [:ianconnolly]

Comment 38

•

11 years ago

Attached patch runner.diff -- Changes made as per Dustin's 2nd review (obsolete) — Details — Splinter Review

Could I get a check-in? :) Carrying the r+ from :dustin over from Attachment 8439356 [details] [diff].

Attachment #8439356 - Attachment is obsolete: true

Attachment #8441600 - Flags: review+

Attachment #8441600 - Flags: checked-in?

Rail Aliiev [:rail]

Comment 39

•

11 years ago

Comment on attachment 8441600 [details] [diff] [review] runner.diff -- Changes made as per Dustin's 2nd review remote: https://hg.mozilla.org/build/puppet/rev/58a3c0e9ae64 remote: https://hg.mozilla.org/build/puppet/rev/6fa2dc80d46a

Attachment #8441600 - Flags: checked-in? → checked-in+

Chris AtLee [:catlee]

Reporter

Comment 40

•

11 years ago

Comment on attachment 8441600 [details] [diff] [review] runner.diff -- Changes made as per Dustin's 2nd review we didn't end up with the proper boot order for some reason

Attachment #8441600 - Flags: checked-in+ → checked-in-

Chris Cooper [:coop] (he/him)

Updated

•

11 years ago

Blocks: 785410

Chris AtLee [:catlee]

Reporter

Updated

•

11 years ago

Blocks: 1028191

Ian Connolly [:ianconnolly]

Comment 45

•

11 years ago

Attached patch Runner - with boot order fixes (obsolete) — Details — Splinter Review

This should fix the boot order problems we were having previously. The problems were: a) We had rc3.d/ symlinks for puppet and buildbot already on ours existing machines. We were changing their boot priority in their init.d/ scripts, but the symlinks weren't being updated to reflect that. Puppet doesn't do this as part of its Service type unfortunately, but I've run an external command to reset these symlinks. (This was all working fine on *new* machines because the symlinks had the correct priority value if they were newly created) b) Another issue was that we were moving new services to after puppet, that weren't currently in the boot order for after puppet (ie. we were changing puppet's boot priority while it was running from its initscript). This has been fixed by getting puppet for force a reboot whenever its initscript is changed. These were the known issues that arose from the last deployment. Upon a reboot of a server configured with this patch we correctly see puppet -> runner -> buildbot so hopefully that means we're good.

Attachment #8441600 - Attachment is obsolete: true

Attachment #8443701 - Flags: review?(dustin)

Attachment #8443701 - Flags: checked-in?

Rail Aliiev [:rail]

Comment 46

•

11 years ago

Comment on attachment 8443701 [details] [diff] [review] Runner - with boot order fixes Review of attachment 8443701 [details] [diff] [review]: ----------------------------------------------------------------- LGTM. Just a couple of trailing space nazi comments below. :) I can fix them when I land the patch. ::: manifests/moco-config.pp @@ +225,4 @@ > $selfserve_agent_carrot_exchange = "buildapi.control" > $selfserve_agent_carrot_queue = "buildapi-agent-rabbit2" > > + Can you remove the added empty line above? ::: modules/puppet/manifests/atboot.pp @@ +27,5 @@ > exec { > # ask the puppet startup script to reboot > "reboot-after-puppet": > + command => "touch /REBOOT_AFTER_PUPPET", > + path => ['/bin/', '/usr/bin/'], From IRC, "touch" lives in /bin on CentOS. Good catch! @@ +61,5 @@ > + # rc3.d symlink and we want to reboot because we've changed > + # what we want to come after puppet in the boot order > + notify => [ Exec['initd-refresh'], Exec['reboot-after-puppet'] ]; > + } > + .. and here @@ +65,5 @@ > + > + exec { > + 'initd-refresh': > + # resetpriorities tells chkconfig to update the > + # symlinks in rcX.d with the values from the service's and on these 2 lines @@ +68,5 @@ > + # resetpriorities tells chkconfig to update the > + # symlinks in rcX.d with the values from the service's > + # init.d script > + command => '/sbin/chkconfig puppet resetpriorities', > + refreshonly => true; And here (trailing spaces nazi mode off) ::: modules/puppet/templates/puppet-centos-initd.erb @@ +8,5 @@ > +### BEGIN INIT INFO > +# Provides: puppet > +# Required-Start: $local_fs $network > +# Required-Start: $local_fs $network > +# Should-Start: $remote_fs Ooop, and here

Attachment #8443701 - Flags: review?(dustin) → review+

Ian Connolly [:ianconnolly]

Comment 47

•

11 years ago

Attached patch cleaned runner.diff — Details — Splinter Review

Whitespace fixes from Rail's review. Carrying over the r+.

Attachment #8443701 - Attachment is obsolete: true

Attachment #8443701 - Flags: checked-in?

Attachment #8445296 - Flags: review+

Ian Connolly [:ianconnolly]

Updated

•

11 years ago

Attachment #8445296 - Flags: checked-in?

Chris AtLee [:catlee]

Reporter

Updated

•

11 years ago

Attachment #8445296 - Flags: checked-in? → checked-in+

Ian Connolly [:ianconnolly]

Comment 48

•

11 years ago

Attached patch fix for subset of machines that got the backed out patch — Details — Splinter Review

Attachment #8445296 - Attachment is obsolete: true

Ian Connolly [:ianconnolly]

Comment 49

•

11 years ago

We're sticking with this - will pull fix out into new bug.

Chris AtLee [:catlee]

Reporter

Updated

•

11 years ago

Attachment #8445296 - Attachment is obsolete: false

Ian Connolly [:ianconnolly]

Updated

•

11 years ago

Depends on: 1029704

Ian Connolly [:ianconnolly]

Updated

•

11 years ago

Depends on: 1029777

Chris AtLee [:catlee]

Reporter

Updated

•

11 years ago

Depends on: 1029903

Chris AtLee [:catlee]

Reporter

Updated

•

11 years ago

Blocks: 851123

Chris Cooper [:coop] (he/him)

Updated

•

11 years ago

Blocks: 1036468

Dustin J. Mitchell [:dustin] (he/him)

Updated

•

10 years ago

Summary: slave pre-flight tasks → slave pre-flight tasks (runner)

Dustin J. Mitchell [:dustin] (he/him)

Updated

•

10 years ago

Depends on: 1055806

Chris AtLee [:catlee]

Reporter

Updated

•

10 years ago

Depends on: 1069529

Chris AtLee [:catlee]

Reporter

Updated

•

10 years ago

Depends on: 1069530

Updated

•

10 years ago

Whiteboard: [unittest][talos][automation][q4goal] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2706] [unittest][talos][automation][q4goal]

Updated

•

10 years ago

Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2706] [unittest][talos][automation][q4goal] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2714] [unittest][talos][automation][q4goal]

Morgan Phillips [:mrrrgn]

Assignee

Updated

•

10 years ago

Assignee: ian → winter2718

Morgan Phillips [:mrrrgn]

Assignee

Comment 50

•

10 years ago

It seems like this is essentially finished. I'm starting on getting rid of post-build reboots; will open bugs for any additional runner tasks that need creating catch-as-catch-can.

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

7 years ago

Component: Platform Support → Buildduty

Product: Release Engineering → Infrastructure & Operations

Updated

•

5 years ago

Product: Infrastructure & Operations → Infrastructure & Operations Graveyard

You need to log in before you can comment on or make changes to this bug.