Closed Bug 712205 Opened 9 years ago Closed 5 years ago

slaves should always have tools checked out and up to date

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: catlee, Unassigned)

References

Details

(Whiteboard: [unittest][talos][automation])

Attachments

(1 file, 1 obsolete file)

Most of our jobs require hg.m.o/build/tools in some way.

we should stop clobbering this and checking it out for every build / test run and instead make sure it's checked out and up to date as part of machine startup.
Duplicate of this bug: 590969
Duplicate of this bug: 629361
Duplicate of this bug: 678182
Duplicate of this bug: 600967
Whiteboard: [unittest][talos]
Priority: -- → P4
Whiteboard: [unittest][talos] → [unittest][talos][automation]
Duplicate of this bug: 716888
Man, I could really use this for bug 660480.

A quick survey of our talos platforms indicates that some platforms have a longstanding tools checkout while some do not:

fed & fed64
/tools/build-tools

leopard
/tools/build-tools (NOT UP-TO-DATE)

xp
C:\tools

r4-lion, r4-snow, & w7
none

Based on the above, I would vote to standardize on /tools/build-tools and C:\tools\build-tools as locations for our permanent checkouts when we fix this.
If we had this (on a tangent), we could also move runslave.py into tools and always have an up-to-date version of it. This would be beneficial on Windows, since it would then live in a non-admin directory and hence not trigger a UAC prompt.
Duplicate of this bug: 733658
Component: Release Engineering → Release Engineering: Automation
QA Contact: release → catlee
Duplicate of this bug: 659344
Component: Release Engineering: Automation → Release Engineering: Machine Management
QA Contact: catlee → armenzg
taking per joduinn
Assignee: nobody → bugspam.Callek
Priority: P4 → P1
If I'm expected to test this myself, I'll need access to a puppet master, one of each type of system, and opsi, as well as a crash-course in how to stage stuff with puppet and how to actually *use* opsi rather than just write manifests for it.

Though this should be simple enough in terms of the puppet/opsi manifests that I can probably do the work with minimal testing by someone else. (If I need to test it myself, I'll file bugs for the access; otherwise I'll just chug away at this anyway.)
To be honest, I don't know why we want to use puppet/opsi for this.
I would prefer not to clobber tools, to share it amongst builders, and to keep it up to date.
If we can make the step smart enough to retry when hg is down, and then retry the job after a few attempts, that would be *my* ideal.

I say this because we will eventually move to mozharness and to newer slaves, and I don't think it would be good to carry over puppet and opsi dependencies.

On another note, we don't keep a copy of mozilla-central checked out with puppet/opsi and I don't know why the tools repo should be different.

What do you guys think?
I think tools is fundamentally different from mozilla-central. We rely on tools for all sorts of things; it's critical to the 'setup' of many types of builds.

We shouldn't be clobbering/checking out tools for every type of job we run. It should be always available up-to-date on the system.

I don't think that puppet/opsi necessarily have to do this. It probably belongs more as a 'pre-flight' type of task (see bug 712206).
Attached patch [puppet] WIP 0 (obsolete) — Splinter Review
*Very* rudimentary start to this.

The idea is that we'll have a preflight initd script which is itself responsible for launching buildbot if preflight tasks succeed.

catlee suggested run-parts, which this patch is beginning to prepare for. I'll just have to figure out if it's possible to have run-parts exit with an error if one of its sub-scripts exits with an error. If not, I'll figure out a different way to do this.

None of this is tested, but I wanted to dump it public before I head out for the day.

Since we can't rely on anything in tools/ during the clone/update of itself, I'm writing basic retry and clone/checkout logic directly in this script.

There will need to be changes to make buildbot recognize and use the tools dir(s) across slave types of course.
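Since nothing in tools/ can be relied on while tools/ itself is being fetched, the script has to carry its own retry and clone/update logic. The following is only a sketch of that approach; the `retry` and `clone_or_update` helper names, the attempt count, and the delay are illustrative, not taken from the attached patch.

```shell
#!/bin/bash
# Sketch: retry wrapper plus clone-or-update for the tools checkout.
# Assumed names; the real WIP patch may differ.

retry() {
    # retry <max_tries> <delay_seconds> <command...>
    local max=$1 delay=$2 try; shift 2
    for try in $(seq 1 "$max"); do
        if "$@"; then
            return 0
        fi
        echo "attempt ${try}/${max} failed: $*" >&2
        sleep "$delay"
    done
    return 1
}

clone_or_update() {
    # Pull/update an existing checkout, or clobber and clone fresh.
    local repo=$1 dir=$2
    if [ -d "${dir}/.hg" ]; then
        hg -R "${dir}" pull -u "${repo}"
    else
        rm -rf "${dir}"     # clobber any partial/strange checkout
        hg clone "${repo}" "${dir}"
    fi
}

# Usage sketch:
# retry 3 30 clone_or_update https://hg.mozilla.org/build/tools /tools/build-tools
```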
Comment on attachment 608457 [details] [diff] [review]
[puppet] WIP 0

Looks like a good start. Will need to pull/update the directory if it exists.

It really really sucks that we have to replicate hgtool in bash here...I wonder if it's worthwhile packaging hgtool and its dependencies as a standalone package that puppet can deploy and use.
(In reply to Chris AtLee [:catlee] from comment #15) 
> It really really sucks that we have to replicate hgtool in bash here...I
> wonder if it's worthwhile packaging hgtool and its dependencies as a
> standalone package that puppet can deploy and use.

Could we deploy a bundle with puppet and then just update it on every reboot?
(In reply to Chris AtLee [:catlee] from comment #15)
> Comment on attachment 608457 [details] [diff] [review]
> [puppet] WIP 0
> 
> Looks like a good start. Will need to pull/update the directory if it exists.
> 
> It really really sucks that we have to replicate hgtool in bash here...I
> wonder if it's worthwhile packaging hgtool and its dependencies as a
> standalone package that puppet can deploy and use.

I would love to, but part of our problem with deploying hgtool separately is that it's currently not really splittable from our tools repo magic; lines 11-13 of http://mxr.mozilla.org/build/source/tools/buildfarm/utils/hgtool.py show why.

(In reply to Chris Cooper [:coop] from comment #16)
> (In reply to Chris AtLee [:catlee] from comment #15) 
> > It really really sucks that we have to replicate hgtool in bash here...I
> > wonder if it's worthwhile packaging hgtool and its dependencies as a
> > standalone package that puppet can deploy and use.
> 
> Could we deploy a bundle with puppet and then just update it on every reboot?

I know we can probably deploy an initial hg bundle of tools and likely update from that, but keeping the bundle itself up to date is going to be a pain with our current setups. I'll first verify that updating tools from an existing checkout works, and if so pursue that solution.
So far untested, but early-feedback welcome
Attachment #608457 - Attachment is obsolete: true
Attachment #611075 - Flags: feedback?(coop)
Attachment #611075 - Flags: feedback?(catlee)
Comment on attachment 611075 [details] [diff] [review]
[puppet-manifests] WIP 1

Review of attachment 611075 [details] [diff] [review]:
-----------------------------------------------------------------

Some nits, but looking good for Linux so far.

::: modules/buildslave/files/preflights/10-toolsdir
@@ +21,5 @@
> +    echo "[paths]" > ${TOOLS_DIR}/.hg/hgrc
> +    echo "default = ${REPO_URL}" >> ${TOOLS_DIR}/.hg/hgrc
> +fi
> +
> +set +e # Turn off errors so we continue if hgtool fails

What's the non-fatal failure mode that this is meant to avoid?

::: modules/buildslave/manifests/preflights.pp
@@ +5,5 @@
> +#  include buildslave::preflights
> +#
> +# This class takes care of installing runslave.py and any other prerequisites,
> +# and hooking into the startup process as appropriate to the system.  The
> +# interface between this class and the puppet startup is somewhat ill-defined,

"ill-defined" how? Are we talking about when in the sequence this module gets run?

::: modules/buildslave/manifests/startup.pp
@@ +42,5 @@
>                      group  => "root",
> +                    mode   => 755,
> +                    notify => Exec["reset-preflight-service"],
> +                    require => File["/etc/preflight/preflight.d"];
> +                #"/etc/init.d/buildbot":

I would just remove this section rather than comment it out.
Attachment #611075 - Flags: feedback?(coop) → feedback+
Comment on attachment 611075 [details] [diff] [review]
[puppet-manifests] WIP 1

Review of attachment 611075 [details] [diff] [review]:
-----------------------------------------------------------------

I like the overall direction here. Some small things to clean up, but nothing major.

::: modules/buildslave/files/linux-initd-buildbot.sh
@@ +1,2 @@
>  #! /bin/bash
> +#  initscript for preflight checks

I can't tell from the diff if you've renamed this file or modified the original...hopefully you're creating a new one!

@@ +31,5 @@
> +start_preflights() {
> +    # Explicitly allow us to continue when something fails
> +    set +e
> +
> +    echo > $LOGFILE # Empty logfile on boot to keep it from growing too much

we may want to use some log rotation here instead. wiping it on boot could make future debugging hard.

@@ +35,5 @@
> +    echo > $LOGFILE # Empty logfile on boot to keep it from growing too much
> +
> +    # Unable to use run-parts because it does not exit on error
> +    # Ignore *~ and *, scripts
> +    for i in ${PREFLIGHT_DIR}/*[^~,] ; do

will this sort properly? will 2-foo be before or after 11-bar?

@@ +36,5 @@
> +
> +    # Unable to use run-parts because it does not exit on error
> +    # Ignore *~ and *, scripts
> +    for i in ${PREFLIGHT_DIR}/*[^~,] ; do
> +        [ -d $i ] && continue

what does this do? skip directories? comments FTW!

@@ +39,5 @@
> +    for i in ${PREFLIGHT_DIR}/*[^~,] ; do
> +        [ -d $i ] && continue
> +    
> +        if [ -x $i ]; then
> +            print "====== Starting Preflight: '$i' ======" \

itym echo?

::: modules/buildslave/files/preflights/10-toolsdir
@@ +8,5 @@
> +PYTHON=python
> +
> +if [ ! -d ${TOOLS_DIR}/.hg ]
> +then
> +    # Something is strange, clobber it

some logging here would be good!

@@ +21,5 @@
> +    echo "[paths]" > ${TOOLS_DIR}/.hg/hgrc
> +    echo "default = ${REPO_URL}" >> ${TOOLS_DIR}/.hg/hgrc
> +fi
> +
> +set +e # Turn off errors so we continue if hgtool fails

you can also wrap this in

if ( ! ${PYTHON} ... ); then
# something went wrong...
fi

commands that exit with non-zero inside () don't cause the script to exit when -e is set
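A standalone illustration of this errexit behavior (not from the patch; `fail_cmd` is a stand-in for the real hgtool invocation): negating the command inside a subshell lets a `set -e` script observe and handle the failure instead of dying on it.

```shell
#!/bin/bash
# Demonstrates handling a failing command under `set -e` without
# toggling errexit off and back on with set +e / set -e.
set -e

fail_cmd() { return 1; }   # stand-in for the real hgtool invocation

if ( ! fail_cmd ); then
    echo "hgtool stand-in failed; clean up and report the error here"
fi

echo "script is still running despite set -e"
```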

::: modules/buildslave/manifests/startup.pp
@@ +60,1 @@
>                      command => "/sbin/chkconfig --del buildbot && /sbin/chkconfig --add buildbot",

should this be resetting the preflight service instead?
Attachment #611075 - Flags: feedback?(catlee) → feedback+
(In reply to Chris Cooper [:coop] from comment #19)
> ::: modules/buildslave/files/preflights/10-toolsdir
> @@ +21,5 @@
> > +    echo "[paths]" > ${TOOLS_DIR}/.hg/hgrc
> > +    echo "default = ${REPO_URL}" >> ${TOOLS_DIR}/.hg/hgrc
> > +fi
> > +
> > +set +e # Turn off errors so we continue if hgtool fails
> 
> What's the non-fatal failure mode that this is meant to avoid?

Not so much non-fatal (since we'll still treat it as fatal), but we want to clean up properly after the failed attempt (the rm following that), so that if automation then reboots the slave (kittenherder?) it can try again.

(In reply to Chris AtLee [:catlee] from comment #20)
> ::: modules/buildslave/files/linux-initd-buildbot.sh
> @@ +1,2 @@
> >  #! /bin/bash
> > +#  initscript for preflight checks
> 
> I can't tell from the diff if you've renamed this file or modified the
> original...hopefully you're creating a new one!

Mentioned in IRC, but this is because hg and BMO fail to communicate renames: I did an hg rename/copy for some of this stuff, and BMO only shows the old filename.

> @@ +35,5 @@
> > +    echo > $LOGFILE # Empty logfile on boot to keep it from growing too much
> > +
> > +    # Unable to use run-parts because it does not exit on error
> > +    # Ignore *~ and *, scripts
> > +    for i in ${PREFLIGHT_DIR}/*[^~,] ; do
> 
> will this sort properly? will 2-foo be before or after 11-bar?

It does sort, but with the same lexical ordering `ls *` uses, which means 2-foo will come after 11-bar but before 2-zoo and 3-foo.
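A quick standalone check of that ordering (not part of the patch): glob expansion sorts lexically, not numerically, so numeric prefixes need consistent zero-padding if strict ordering matters. `LC_ALL=C` pins the collation so the result is deterministic.

```shell
#!/bin/bash
# Show the lexical order a glob produces for run-parts-style names.
export LC_ALL=C
dir=$(mktemp -d)
touch "$dir/2-foo" "$dir/11-bar" "$dir/2-zoo" "$dir/3-foo"
for f in "$dir"/*; do
    basename "$f"
done
# Output order: 11-bar, 2-foo, 2-zoo, 3-foo
rm -rf "$dir"
```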
Depends on: 743316
I filed bug 764102 to remove the dependency to the tools repo for *test* slaves.
Component: Release Engineering: Machine Management → Release Engineering: Platform Support
QA Contact: armenzg → coop
Callek: if you don't have time to work on this, please consider offloading it to Pete or Simone.

Fixing this bug will remove most of the need to fix bug 692715. Many of the failures in bug 692715 seem to occur while cleaning up tools prior to re-cloning.
Simone, Pete, is this a bug one of you would want to drive to completion?
Flags: needinfo?(sbruno)
Flags: needinfo?(pmoore)
(In reply to Justin Wood (:Callek) from comment #24)
> Simone, Pete, is this a bug one of you would want to drive to completion?

In case it's not clear from the comments, this represents a simplification of our current process, saves us a non-zero amount of time per build/test, and will likely fix or neuter several other bugs in the process.
Sure
Flags: needinfo?(pmoore)
Product: mozilla.org → Release Engineering
Hi, I've got a note for bld-centos6-hp-002 which says that sbruno is using this host.
Is that still the case? Fine if so.
Hi Armen, yes I am using that host for some puppet tests.
Flags: needinfo?(sbruno)
Had a brief chat with Simone and Pete this morning about how this could work. Capturing that conversation here for posterity.

Using puppet was brought up as a potential solution. The drawback here is that we would still need an alternate solution for Windows. Maybe this is an option if the follow-on work from bug 794612 goes well.

I think the best place to integrate this change is in the pre-flight code that launches runslave.py. There are different scripts per platform here:

https://hg.mozilla.org/build/puppet/file/618c178fd73e/modules/buildslave/files

Cloning/updating tools would be another pre-flight check: no tools checkout, no buildbot startup.

I had initially thought about adding the check directly to runslave.py, but I don't think that makes sense now. There are bound to be platform-specific foibles that are best contained in the existing platform-specific scripts (IMO).
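The gating described above could look something like the following sketch. The helper name and the paths in the usage line are illustrative, not the real manifests; the point is just that buildbot only starts when every preflight script succeeds.

```shell
#!/bin/bash
# Sketch: run each executable preflight script in order; any failure
# blocks buildbot startup. Names are assumed, not from the repo.

run_preflights() {
    local dir=$1 script
    for script in "$dir"/*; do
        [ -d "$script" ] && continue    # skip directories
        [ -x "$script" ] || continue    # skip non-executable leftovers
        if ! "$script"; then
            echo "preflight '$script' failed" >&2
            return 1
        fi
    done
    return 0
}

# Usage sketch: no tools checkout (or any other failed preflight),
# no buildbot startup.
# run_preflights /etc/preflight/preflight.d && python runslave.py
```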
Also, I should mention that the hard part here will be ripping out the existing tools checkout code from buildbotcustom and then verifying everything still works.

Might be worthwhile maintaining the current workflow in buildbotcustom for release branches until we're sure this is working for nightly CI.
Assignee: bugspam.Callek → sbruno
Planning to work on this next week (16-20 Sept)
No longer blocks: 712206
Depends on: 712206
Duplicate of this bug: 764102
Assignee: sbruno → nobody
Duplicate of this bug: 646580
Duplicate of this bug: 695220
I think this is fixed by runner?
Flags: needinfo?(catlee)
Yup. We need that deployed everywhere though.
Flags: needinfo?(catlee)
Runner does this now!
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard