Closed
Bug 1009584
Opened 8 years ago
Closed 8 years ago
Deploy new version of hg.m.o/build/buildbot to non-windows buildslaves to pick up bug 961075
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task, P2)
Infrastructure & Operations Graveyard
CIDuty
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Assigned: sbruno)
References
Details
Attachments
(4 files)
1.76 KB, patch | dustin: review-
2.03 KB, patch
1.76 KB, patch | dustin: review+ | sbruno: checked-in+
1.59 KB, patch | dustin: review+ | sbruno: checked-in+
In order to pick up the fix in bug 961075, please can: https://hg.mozilla.org/build/buildbot/rev/production-0.8 ...be deployed to the build slaves. I'm presuming we'll need to restart buildbot too, after the repo pull.
Comment 1•8 years ago
|
||
dustin, how do we deploy this? The change is on production-0.8. Is it a matter of:
* tagging it with SLAVE_0_8_4_PRE_MOZ3
* updating puppet's version to 0.8.4-pre-moz3
* testing the puppet code
Component: Buildduty → Platform Support
Flags: needinfo?(dustin)
QA Contact: bugspam.Callek → coop
Comment 2•8 years ago
|
||
You'll need to copy an sdist tarball of the new version to the puppetagain pypi, too. Other than that, you've got it, noting that this will only deploy to OS X and Linux.
Comment 3•8 years ago
|
||
Assigning to coop to find resources. IIUC deploying this will improve starring for sheriffs. We should document this to help future deployments.
Assignee: nobody → coop
Flags: needinfo?(dustin)
Reporter | ||
Comment 4•8 years ago
|
||
Coop, I don't suppose you know when someone might have a chance to do this? :-)
Reporter | ||
Comment 6•8 years ago
|
||
Laura, I don't seem to be having any luck getting resources allocated to this bug (filed almost a month ago), would you mind seeing if there is someone who can take a look? Deploying this would help out the sheriffs, by ensuring the buildbot failure messages for timed out jobs are more useful.
Flags: needinfo?(laura)
Comment 7•8 years ago
|
||
(In reply to Ed Morley [:edmorley UTC+0] from comment #5)
> Chris, any news on this? :-)

Ed: while I appreciate your interest in getting this fixed, releng is *really* understaffed right now, and is struggling to meet existing commitments, much less take on new work at the end of the quarter.

There is a non-trivial amount of work to happen here especially on Windows, and in all likelihood it will end up being a task for Q/markco to get a new buildbot GPO setup once some testing has been done.

Can I ask whether there is any harm in getting the "easy" POSIX platforms done first, and then deploying Windows later, i.e. is it dangerous to have different slave types using different versions of buildbot?
Flags: needinfo?(coop)
Reporter | ||
Comment 8•8 years ago
|
||
(In reply to Chris Cooper [:coop] from comment #7)
> Ed: while I appreciate your interest in getting this fixed, releng is
> *really* understaffed right now, and is struggling to meet existing
> commitments, much less take on new work at the end of the quarter.
>
> There is a non-trivial amount of work to happen here especially on Windows,
> and in all likelihood it will end up being a task for Q/markco to get a new
> buildbot GPO setup once some testing has been done.

Thank you for the update - just knowing roughly where we're at in terms of "how much effort is this to deploy" (comment 1 and comment 2 in this bug made it seem like this wasn't too much work) and "we're understaffed" is helpful - similar to review requests, it's the silence that's the most frustrating - more so than a "we won't be able to do this until X".

> Can I ask whether there is any harm in getting the "easy" POSIX platforms
> done first, and then deploying Windows later, i.e. is it dangerous to have
> different slave types using different versions of buildbot?

Just deploying for !Windows will be helpful and not dangerous, for bug 961075 at least (don't know if there are any other undeployed changes).
Reporter | ||
Updated•8 years ago
|
Flags: needinfo?(coop)
Reporter | ||
Updated•8 years ago
|
Summary: Deploy hg.m.o/build/buildbot production-0.8 to buildslaves to pick up bug 961075 → Deploy hg.m.o/build/buildbot production-0.8 to non-windows buildslaves to pick up bug 961075
Reporter | ||
Comment 9•8 years ago
|
||
Coop, any chance we can just do !Windows? This landed on 2014-04-17, so it would be good to be able to reap the benefits of it before 3 months have passed. Thanks :-)
Comment 10•8 years ago
|
||
Sigh, for lack of other takers, I'm going to try to deploy the POSIX part myself.
Status: NEW → ASSIGNED
Flags: needinfo?(laura)
Flags: needinfo?(coop)
Priority: -- → P2
Updated•8 years ago
|
Assignee: coop → sbruno
Assignee | ||
Comment 11•8 years ago
|
||
Questions for Dustin about creating the buildbot sdist package. I am following the instructions here: https://wiki.mozilla.org/ReleaseEngineering/PuppetAgain/Modules/buildslave I am going to create the package on one of our linux64 slaves, then rename it to eliminate hg-revision information from the tarball name before copying it to the pypi repo. Are there any caveats or further instructions, or is that it?
Flags: needinfo?(dustin)
Assignee | ||
Updated•8 years ago
|
Blocks: b-linux64-hp-0025
Assignee | ||
Comment 12•8 years ago
|
||
I created the sdist tarballs buildbot-slave-0.8.4-pre-moz3.tar.gz and buildbot-0.8.4-pre-moz3.tar.gz on revision 5484a944940e (after removing the .hg folder, so that no hg version information ends up in the package name). I created them on b-linux64-hp-0025 using the "setup.py sdist" command after changing the VERSION files to "0.8.4-pre-moz3".

They are now available in http://puppetagain.pub.build.mozilla.org/data/python/packages/

I am now testing the version change in puppet (to 0.8.4-pre-moz3) using my personal environment in /etc/puppet/environments/sbruno (releng-puppet2.srv.releng.scl3.mozilla.com).
Flags: needinfo?(dustin)
Assignee | ||
Comment 13•8 years ago
|
||
Attachment #8453718 -
Flags: review?(dustin)
Assignee | ||
Comment 14•8 years ago
|
||
Dustin: How can I configure a slave (b-linux64-hp-0025) so that at startup it connects to my puppet environment instead of the production one? I would like to do so in order to test https://bugzilla.mozilla.org/show_bug.cgi?id=1009584 and verify in my staging buildbot environment that the new version of the buildbot slave is used and that it works.
Flags: needinfo?(dustin)
Comment 15•8 years ago
|
||
Comment on attachment 8453718
puppet_01

Review of attachment 8453718:
-----------------------------------------------------------------

As for your environment, you can "pin" the nodes to your environment -- see https://wiki.mozilla.org/ReleaseEngineering/PuppetAgain/HowTo/Set_up_a_user_environment#Pinning

::: modules/buildslave/manifests/install.pp
@@ -21,1 @@
> active => true;

You'll need to mark moz2 as active => false, and moz3 as active => true. This will let you switch back quickly (except in EC2) if there's some problem with the new version. Once you're happy with it, you can add ensure => absent for moz2 to delete it on all buildslaves.

::: modules/buildslave/manifests/install/version.pp
@@ +15,5 @@
> }
> # set the parameters for the virtualenv below. Each version should set
> # $packages explicitly.
> case $version {
> + "0.8.4-pre-moz3": {

Both versions should be included here, so that moz2 is still defined. moz3 doesn't need its own stanza, just

"0.8.4-pre-moz2", "0.8.4-pre-moz3": { ...
Attachment #8453718 -
Flags: review?(dustin) → review-
Assignee | ||
Comment 16•8 years ago
|
||
I started version 0.8.4-pre-moz3 of the buildbot slave on b-linux64-hp-0025 and I got the following error:

/tools/buildbot-0.8.4-pre-moz3/bin/python2.7 /tools/buildbot/bin/twistd --no_save --logfile /builds/slave/twistd.log --python /builds/slave/buildbot.tac
Removing stale pidfile /builds/slave/twistd.pid
Traceback (most recent call last):
  File "/tools/buildbot-0.8.4-pre-moz3/lib/python2.7/site-packages/twisted/application/app.py", line 631, in run
    runApp(config)
  File "/tools/buildbot-0.8.4-pre-moz3/lib/python2.7/site-packages/twisted/scripts/twistd.py", line 23, in runApp
    _SomeApplicationRunner(config).run()
  File "/tools/buildbot-0.8.4-pre-moz3/lib/python2.7/site-packages/twisted/application/app.py", line 374, in run
    self.application = self.createOrGetApplication()
  File "/tools/buildbot-0.8.4-pre-moz3/lib/python2.7/site-packages/twisted/application/app.py", line 439, in createOrGetApplication
    application = getApplication(self.config, passphrase)
--- <exception caught here> ---
  File "/tools/buildbot-0.8.4-pre-moz3/lib/python2.7/site-packages/twisted/application/app.py", line 450, in getApplication
    application = service.loadApplication(filename, style, passphrase)
  File "/tools/buildbot-0.8.4-pre-moz3/lib/python2.7/site-packages/twisted/application/service.py", line 400, in loadApplication
    application = sob.loadValueFromFile(filename, 'application', passphrase)
  File "/tools/buildbot-0.8.4-pre-moz3/lib/python2.7/site-packages/twisted/persisted/sob.py", line 210, in loadValueFromFile
    exec fileObj in d, d
  File "/builds/slave/buildbot.tac", line 35, in <module>
    from buildslave import idleizer
exceptions.ImportError: cannot import name idleizer

Failed to load application: cannot import name idleizer

I had a look at the code history, and the file idleizer.py (which was present in SLAVE_0_8_4_PRE_MOZ2) has been removed in production-0.8:

Simones-MacBook-Pro:~ sbruno$ hg checkout production-0.8
abort: no repository found in '/Users/sbruno' (.hg not found)!
Simones-MacBook-Pro:~ sbruno$ cd buildbot
Simones-MacBook-Pro:buildbot sbruno$ hg checkout production-0.8
3 files updated, 0 files merged, 0 files removed, 0 files unresolved
Simones-MacBook-Pro:buildbot sbruno$ find . -name idleizer.py
Simones-MacBook-Pro:buildbot sbruno$ hg checkout SLAVE_0_8_4_PRE_MOZ2
307 files updated, 0 files merged, 31 files removed, 0 files unresolved
Simones-MacBook-Pro:buildbot sbruno$ find . -name idle*.py
./slave/buildslave/idleizer.py

Since that file is referenced in an import in the buildbot.tac currently used on slaves, the buildbot slave cannot start.

Does this mean that we need to roll out a new version of buildbot.tac to the slaves as well, together with the new version of the buildbot slave?
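A mismatch like the one above could in principle be caught before a slave is restarted. The following is only a minimal sketch, not something used in this bug: it scans the text of a buildbot.tac for single-line "from buildslave import ..." statements and reports names that have no matching module file in a given buildslave package directory. The function names and the regular expression are hypothetical, parenthesized multi-line imports are not handled, and names defined directly in buildslave/__init__.py would be reported as missing.

```python
import os
import re

# Matches single-line statements such as "from buildslave import idleizer".
# Submodule imports like "from buildslave.bot import BuildSlave" are ignored.
IMPORT_RE = re.compile(r"^\s*from\s+buildslave\s+import\s+(.+)$", re.MULTILINE)


def imported_names(tac_source):
    """Return the names a buildbot.tac imports directly from buildslave."""
    names = []
    for match in IMPORT_RE.finditer(tac_source):
        for part in match.group(1).split(","):
            # Drop trailing comments and any "as alias" suffix.
            name = part.split("#")[0].split(" as ")[0].strip()
            if name:
                names.append(name)
    return names


def missing_modules(tac_source, package_dir):
    """List imported names with no module or subpackage in package_dir."""
    missing = []
    for name in imported_names(tac_source):
        as_module = os.path.join(package_dir, name + ".py")
        as_package = os.path.join(package_dir, name, "__init__.py")
        if not (os.path.isfile(as_module) or os.path.isfile(as_package)):
            missing.append(name)
    return missing
```

Run against the contents of /builds/slave/buildbot.tac and the slave/buildslave directory of a checkout, an empty result would suggest the imports can be satisfied; a result containing "idleizer" would correspond to the failure reported above.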
Comment 17•8 years ago
|
||
(In reply to Simone Bruno [:simone] from comment #16)
> I started version 0.8.4-pre-moz3 of the buildbot slave on b-linux64-hp-0025
> and I got the following error:
> Failed to load application: cannot import name idleizer
>
> I had a look to the code history, and file idleizer.py (which was present in
> SLAVE_0_8_4_PRE_MOZ2) has been removed in production-0.8:
>
> Since that file is referenced in an import in the currently used
> buildbot.tac on slaves, the buildbot slave cannot start.
>
> Does this mean that we need to roll-out a new version of buildbot.tac to the
> slaves as well, contextually to the new version of buildbot slave?

Ahhhhhh, I see what's going on here. SLAVE_0_8_4_PRE_MOZ2 is our build *slave* buildbot version, specifically 0.8.4-pre, while production-0.8 is 0.8.2-ish for the masters. Newer slaves can (theoretically) talk to older masters, and that's why this difference matters.
Comment 18•8 years ago
|
||
...which actually means.... Bug 961075 was applied to the wrong branch, though I suspect it's easily transplanted.
Comment 19•8 years ago
|
||
In other words, you need to build 0.8.4-pre-moz3 based on the "slaves" branch, http://hg.mozilla.org/build/buildbot/rev/9dc77b3a5f14
Assignee | ||
Comment 20•8 years ago
|
||
I rebuilt the 0.8.4-pre-moz3 packages on revision 9dc77b3a5f14, but I got the following puppet exception:

Error: /tools/buildbot-0.8.4-pre-moz3/bin/pip install --no-deps --no-index --find-links=http://releng-puppet2.srv.releng.scl3.mozilla.com/python/packages --find-links=http://releng-puppet1.srv.releng.scl3.mozilla.com/python/packages --find-links=http://releng-puppet2.srv.releng.usw2.mozilla.com/python/packages --find-links=http://releng-puppet2.srv.releng.use1.mozilla.com/python/packages --find-links=http://releng-puppet2.build.scl1.mozilla.com/python/packages --find-links=http://releng-puppet1.srv.releng.usw2.mozilla.com/python/packages --find-links=http://releng-puppet1.srv.releng.use1.mozilla.com/python/packages buildbot==0.8.4-pre-moz3 returned 1 instead of one of [0]

In /root/.pip/pip.log, the error looks like:

error: can't copy 'buildbot/db/migrate/migrate.cfg': doesn't exist or not a regular file

I then noticed that the buildbot-0.8.4-pre-moz3.tar.gz package built on revision 9dc77b3a5f14 does not contain that file (while buildbot-0.8.4-pre-moz2 did):

Simones-MacBook-Pro:buildbot-0.8.4-pre-moz2 sbruno$ find . -name migrate.cfg
./buildbot/db/migrate/migrate.cfg
Simones-MacBook-Pro:buildbot-0.8.4-pre-moz2 sbruno$ cd ../buildbot-0.8.4-pre-moz3
Simones-MacBook-Pro:buildbot-0.8.4-pre-moz3 sbruno$ find . -name migrate.cfg
Simones-MacBook-Pro:buildbot-0.8.4-pre-moz3 sbruno$
Assignee | ||
Comment 21•8 years ago
|
||
The puppet error above is raised after applying this patch
Assignee | ||
Comment 22•8 years ago
|
||
There is good news too, though: despite the reported error I was now able to manually start build-slave-0.8.4-pre-moz3, and it connected successfully to my dev master.
Updated•8 years ago
|
Attachment #8454397 -
Attachment is patch: true
Comment 23•8 years ago
|
||
That's weird, for sure. Can you compare the contents of the buildbot-0.8.4-pre-moz{2,3} tarballs? I wonder if, long ago, the moz2 tarball was actually built from the buildbot-0.8.2 sources? Buildbot-0.8.2 was the original, half-baked DB implementation, with no ability to upgrade/downgrade. In Buildbot-0.8.3, we switched to sqlalchemy-migrate, which is what that migrate.cfg file is for. So it should be present in 0.8.4.
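A comparison like the one requested here can be done without unpacking the tarballs. The sketch below is an illustration only, with hypothetical function names: it assumes both files are gzip-compressed sdists whose members all sit under a single top-level "<name>-<version>/" directory, strips that directory, and reports which relative paths exist in only one of the two packages.

```python
import tarfile


def relative_members(tarball_path):
    """Return the set of file paths in an sdist, minus the top-level directory."""
    with tarfile.open(tarball_path, "r:gz") as archive:
        return {
            member.name.partition("/")[2]
            for member in archive.getmembers()
            if member.isfile()
        }


def compare_sdists(old_path, new_path):
    """Return (only_in_old, only_in_new) as sorted lists of relative paths."""
    old = relative_members(old_path)
    new = relative_members(new_path)
    return sorted(old - new), sorted(new - old)
```

Called with the moz2 and moz3 tarballs, the first list would be expected to include buildbot/db/migrate/migrate.cfg if that file is present only in the older package.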
Assignee | ||
Comment 24•8 years ago
|
||
Here is the requested comparison: http://people.mozilla.org/~sbruno/compare-buildbot-0.8.4-pre-moz2-3.html
Assignee | ||
Comment 25•8 years ago
|
||
I patched the slaves branch of buildbot as agreed with :dustin in #releng to include the missing db-migrations-related files. I had checked that the patch correctly includes those files in the package built on my mac, but when I tried to rebuild on linux64-hp-0025 they were still missing. I guess this is due to different versions of python/easy_install. The current version of the package uploaded to the pypi repo (http://puppetagain.pub.build.mozilla.org/data/python/packages/buildbot-0.8.4-pre-moz3.tar.gz) is the one I built on my mac. With that version, I was able to successfully run puppet on my test instance, and the slave connects successfully to my master in the stage environment (I started it manually). Before tagging, though, I need to make sure that the package is built with the correct version of python and build tools. :dustin: any warnings?
Assignee | ||
Comment 26•8 years ago
|
||
In particular, if there are requirements about the version of the tools / platform which needs to be used to rebuild buildbot packages, I would like to capture them here and update the docs. This is a good chance for me to also understand the branching rationale of the buildbot project: is the production-0.8 branch used for masters and the slaves branch just to roll out new slave versions? Or maybe I am totally wrong here? (I would like to put this in the docs as well, since there's a chance that the next person working on similar issues will have similar doubts.)
Comment 27•8 years ago
|
||
The branches were split when we upgraded the buildslaves without upgrading the buildmaster, which also happened to be while we were still running some 0.7 masters. So yes, the remaining masters are all built from the production-0.8 branch, while slaves are built from the slave branch. I'm not sure what you mean by "tools / platform". If you mean hg.mozilla.org/build/tools and gecko, then I have no idea. If you mean setuptools, then I think with the patch we decided on last week, the version shouldn't matter.
Assignee | ||
Comment 28•8 years ago
|
||
Thanks Dustin! I am referring to setuptools, and apparently the version of setuptools is still relevant after the patch you mention: I tried to build on current tip of https://hg.mozilla.org/build/buildbot/rev/slaves (which now includes that patch) on linux64-hp-0025, and the resulting package still does not include the migrate.cfg file. If I build on my personal mac, those files are included.

distutils on my Mac (does include the migrate.cfg in built package):
>>> import distutils
>>> distutils.__version__
'2.7.2'

distutils on linux slaves (does not include migrate.cfg):
>>> import distutils
>>> distutils.__version__
'2.6.6'
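Since the result depends on the build host, one option is to check the built tarball itself before uploading it, whatever Python produced it. This is a minimal sketch under stated assumptions, not part of the documented procedure: the function name is hypothetical, the tarball is assumed to be a gzip-compressed sdist with a single top-level directory, and the list of required files is supplied by the caller.

```python
import tarfile


def missing_from_sdist(tarball_path, required_paths):
    """Return the entries of required_paths that the sdist tarball lacks."""
    with tarfile.open(tarball_path, "r:gz") as archive:
        # sdist members look like "<name>-<version>/<relative path>";
        # keep only the part after the top-level directory.
        present = {name.partition("/")[2] for name in archive.getnames()}
    return [path for path in required_paths if path not in present]
```

For the case discussed here, the required list would contain at least "buildbot/db/migrate/migrate.cfg"; a non-empty return value means the package should not be uploaded.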
Comment 29•8 years ago
|
||
Ah, that's interesting! At least newer versions work. I guess the best we can do is document?
Flags: needinfo?(dustin)
Assignee | ||
Comment 30•8 years ago
|
||
Puppet change to pick up the newly updated packages for buildbot slave. Please note that the new buildbot master package has been built with Python 2.7.2 and distutils 2.7.2.
Attachment #8456089 -
Flags: review?(dustin)
Assignee | ||
Comment 31•8 years ago
|
||
I updated https://wiki.mozilla.org/ReleaseEngineering/PuppetAgain/Modules/buildslave#Upgrading with info collected in this Bug.
Comment 32•8 years ago
|
||
Comment on attachment 8456089
puppet_03

Review of attachment 8456089:
-----------------------------------------------------------------

Did you mean that the new buildbot-slave package was built with those tools? I don't see the 'buildbot' package changing here.
Attachment #8456089 -
Flags: review?(dustin) → review+
Assignee | ||
Comment 33•8 years ago
|
||
Comment on attachment 8456089
puppet_03

A reconfig is running right now to land this.
Attachment #8456089 -
Flags: checked-in+
Assignee | ||
Comment 34•8 years ago
|
||
Dustin: There are no puppet changes related to 'buildbot', but I built both buildbot and buildbot-slave and uploaded the two new packages to pypi.
Assignee | ||
Comment 35•8 years ago
|
||
Assuming that the patch has been reverted because I did not comply with the policy of keeping default aligned with production in the puppet repo, I re-landed the patch a few minutes ago (this time to production branch as well) to rollout the changes. I erroneously thought this change would have been rolled out by a reconfig (this is why I did not merge to prod).
Comment 36•8 years ago
|
||
No worries. I reverted it intending to land something of my own, but that didn't happen, so in the end I could have left well enough alone. Sorry about that!
Comment 37•8 years ago
|
||
This is deployed to Mac and Linux slaves now. Now we need to figure out the story on Windows.
Reporter | ||
Comment 38•8 years ago
|
||
For: https://tbpl.mozilla.org/php/getParsedLog.php?id=44256825&tree=Mozilla-Inbound

Android 2.3 Emulator mozilla-inbound opt test mochitest-2 on 2014-07-21 00:45:47 PDT for push 98fa8afd9169
slave: tst-linux64-spot-1230

I see:

command timed out: 2400 seconds without output, attempting to kill

Which is the old style message. Is this expected to have been rolled out on the spot instances?
Flags: needinfo?(sbruno)
Assignee | ||
Comment 39•8 years ago
|
||
Yes, it should be on spot instances too. I had a look at the code and apparently the required patch had not been applied there, so I pushed it a few minutes ago and I am now going to upload a new version of the packages.
Flags: needinfo?(sbruno)
Reporter | ||
Comment 40•8 years ago
|
||
Strange, it landed in bug 961075 comment 6.
Assignee | ||
Comment 41•8 years ago
|
||
It was not applied to the "slaves" branch, which is the one to be used for this build. See comments 17, 18, 19 for details.
Assignee | ||
Comment 42•8 years ago
|
||
Packages 0.8.4-pre-moz4 have been built and uploaded to the puppet pypi repo already.
Attachment #8459546 -
Flags: review?(dustin)
Updated•8 years ago
|
Attachment #8459546 -
Flags: review?(dustin) → review+
Assignee | ||
Comment 43•8 years ago
|
||
Comment on attachment 8459546
puppet_04

Checked in to default and merged to production.
Attachment #8459546 -
Flags: checked-in+
Assignee | ||
Comment 44•8 years ago
|
||
No reported issues on non-windows slaves deployment, marked as RESOLVED FIXED. Bug 1042597 has been created for windows slaves deployment.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•8 years ago
|
Summary: Deploy hg.m.o/build/buildbot production-0.8 to non-windows buildslaves to pick up bug 961075 → Deploy new version of hg.m.o/build/buildbot to non-windows buildslaves to pick up bug 961075
Reporter | ||
Comment 45•8 years ago
|
||
Looks good thank you :-) eg: https://tbpl-dev.allizom.org/php/getParsedLog.php?id=44432506&tree=Mozilla-Central
Updated•4 years ago
|
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Updated•3 years ago
|
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard