Deploy new version of hg.m.o/build/buildbot to non-windows buildslaves to pick up bug 961075

RESOLVED FIXED

Status

Release Engineering
Platform Support
P2
normal
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: emorley, Assigned: simone)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(4 attachments)

To pick up the fix in bug 961075, could the following please be deployed to the build slaves?
https://hg.mozilla.org/build/buildbot/rev/production-0.8

I'm presuming we'll need to restart buildbot too, after the repo pull.
dustin, how do we deploy this?

The change is on production-0.8.

Is it a matter of:
* tagging it with SLAVE_0_8_4_PRE_MOZ3
* update puppet's version to 0.8.4-pre-moz3
* test the puppet code
Component: Buildduty → Platform Support
Flags: needinfo?(dustin)
QA Contact: bugspam.Callek → coop
You'll need to copy an sdist tarball of the new version to the puppetagain pypi, too.  Other than that, you've got it, noting that this will only deploy to OS X and Linux.
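The naming scheme implied by the checklist and by the sdist step can be written down as a small helper, so that the tag and the tarball names are all derived from one version string. This is a minimal sketch, assuming that the SLAVE_<VERSION> tag pattern seen here (SLAVE_0_8_4_PRE_MOZ3 for 0.8.4-pre-moz3) carries over to future versions and that both the buildbot and buildbot-slave tarballs are uploaded; `slave_tag` and `sdist_names` are hypothetical names, not part of any releng tool.

```python
def slave_tag(version):
    """Map a puppet version string such as '0.8.4-pre-moz3' to an hg tag.

    Assumes the SLAVE_<VERSION> naming used in this bug: dots and dashes
    become underscores and the result is upper-cased.
    """
    return "SLAVE_" + version.replace(".", "_").replace("-", "_").upper()


def sdist_names(version):
    """Tarball names expected in the puppetagain pypi directory (assumed: both packages)."""
    return ["%s-%s.tar.gz" % (package, version)
            for package in ("buildbot", "buildbot-slave")]


if __name__ == "__main__":
    print(slave_tag("0.8.4-pre-moz3"))
    print(sdist_names("0.8.4-pre-moz3"))
```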
Assigning to coop to find resources.

IIUC deploying this will improve starring for sheriffs.

We should document this to help future deployments.
Assignee: nobody → coop
Flags: needinfo?(dustin)
Coop, I don't suppose you know when someone might have a chance to do this? :-)
Chris, any news on this? :-)
Flags: needinfo?(coop)
Laura, I don't seem to be having any luck getting resources allocated to this bug (filed almost a month ago). Would you mind seeing if there is someone who can take a look? Deploying this would help out the sheriffs by ensuring that the buildbot failure messages for timed-out jobs are more useful.
Flags: needinfo?(laura)

Comment 7

3 years ago
(In reply to Ed Morley [:edmorley UTC+0] from comment #5)
> Chris, any news on this? :-)

Ed: while I appreciate your interest in getting this fixed, releng is *really* understaffed right now, and is struggling to meet existing commitments, much less take on new work at the end of the quarter.

There is a non-trivial amount of work to happen here especially on Windows, and in all likelihood it will end up being a task for Q/markco to get a new buildbot GPO setup once some testing has been done.

Can I ask whether there is any harm in getting the "easy" POSIX platforms done first, and then deploying Windows later, i.e. is it dangerous to have different slave types using different versions of buildbot?
Flags: needinfo?(coop)
(In reply to Chris Cooper [:coop] from comment #7)
> Ed: while I appreciate your interest in getting this fixed, releng is
> *really* understaffed right now, and is struggling to meet existing
> commitments, much less take on new work at the end of the quarter.
> 
> There is a non-trivial amount of work to happen here especially on Windows,
> and in all likelihood it will end up being a task for Q/markco to get a new
> buildbot GPO setup once some testing has been done.

Thank you for the update. Just knowing roughly where we're at in terms of "how much effort is this to deploy" (comment 1 and comment 2 in this bug made it seem like this wasn't too much work) and "we're understaffed" is helpful. As with review requests, it's the silence that's most frustrating, more so than a "we won't be able to do this until X".

> Can I ask whether there is any harm in getting the "easy" POSIX platforms
> done first, and then deploying Windows later, i.e. is it dangerous to have
> different slave types using different versions of buildbot?

Just deploying for !Windows will be helpful and not dangerous, for bug 961075 at least (don't know if there are any other undeployed changes).
Flags: needinfo?(coop)
Summary: Deploy hg.m.o/build/buildbot production-0.8 to buildslaves to pick up bug 961075 → Deploy hg.m.o/build/buildbot production-0.8 to non-windows buildslaves to pick up bug 961075
Coop, any chance we can just do !Windows? This landed on 2014-04-17, so it would be good to be able to reap the benefits of it before 3 months have passed. Thanks :-)
Sigh, for lack of other takers, I'm going to try to deploy the POSIX part myself.
Status: NEW → ASSIGNED
Flags: needinfo?(laura)
Flags: needinfo?(coop)
Priority: -- → P2

Updated

3 years ago
Assignee: coop → sbruno
(Assignee)

Comment 11

3 years ago
Questions for Dustin about creating the buildbot sdist package.

I am following instructions here: https://wiki.mozilla.org/ReleaseEngineering/PuppetAgain/Modules/buildslave

I am going to create the package on one of our linux64 slaves, then rename it to remove hg revision information from the tarball name before copying it to the pypi repo.

Are there any caveats or further instructions, or is that it?
Flags: needinfo?(dustin)
(Assignee)

Updated

3 years ago
Blocks: 825637
(Assignee)

Comment 12

3 years ago
I created the sdist tarballs buildbot-slave-0.8.4-pre-moz3.tar.gz and buildbot-0.8.4-pre-moz3.tar.gz on revision 5484a944940e (after removing the .hg folder so that hg version information does not end up in the package name). I created them on b-linux64-hp-0025 using the "setup.py sdist" command, after changing the VERSION files to "0.8.4-pre-moz3".

They are now available in http://puppetagain.pub.build.mozilla.org/data/python/packages/

I am now testing the puppet version change (to 0.8.4-pre-moz3) using my personal environment in /etc/puppet/environments/sbruno (releng-puppet2.srv.releng.scl3.mozilla.com).
Flags: needinfo?(dustin)
(Assignee)

Comment 13

3 years ago
Created attachment 8453718 [details] [diff] [review]
puppet_01
Attachment #8453718 - Flags: review?(dustin)
(Assignee)

Comment 14

3 years ago
Dustin:

How can I configure a slave (b-linux64-hp-0025) so that at startup it connects to my puppet environment instead of the production one?

I would like to do so in order to test https://bugzilla.mozilla.org/show_bug.cgi?id=1009584 and verify in my staging buildbot environment that the new version of the buildbot slave is used and it's working.
Flags: needinfo?(dustin)
Comment on attachment 8453718 [details] [diff] [review]
puppet_01

Review of attachment 8453718 [details] [diff] [review]:
-----------------------------------------------------------------

As for your environment, you can "pin" the nodes to your environment -- see https://wiki.mozilla.org/ReleaseEngineering/PuppetAgain/HowTo/Set_up_a_user_environment#Pinning

::: modules/buildslave/manifests/install.pp
@@ -21,1 @@
>              active => true;

You'll need to mark moz2 as active => false, and moz3 as active => true.  This will let you switch back quickly (except in EC2) if there's some problem with the new version.

Once you're happy with it, you can add ensure => absent for moz2 to delete it on all buildslaves.

::: modules/buildslave/manifests/install/version.pp
@@ +15,5 @@
>      }
>      # set the parameters for the virtualenv below.  Each version should set
>      # $packages explicitly.
>      case $version {
> +        "0.8.4-pre-moz3": {

Both versions should be included here, so that moz2 is still defined.  moz3 doesn't need its own stanza, just

        "0.8.4-pre-moz2", "0.8.4-pre-moz3": {
            ...
Attachment #8453718 - Flags: review?(dustin) → review-
(Assignee)

Comment 16

3 years ago
I started version 0.8.4-pre-moz3 of the buildbot slave on b-linux64-hp-0025 and I got the following error:

/tools/buildbot-0.8.4-pre-moz3/bin/python2.7 /tools/buildbot/bin/twistd --no_save --logfile /builds/slave/twistd.log --python /builds/slave/buildbot.tac
Removing stale pidfile /builds/slave/twistd.pid
Traceback (most recent call last):
  File "/tools/buildbot-0.8.4-pre-moz3/lib/python2.7/site-packages/twisted/application/app.py", line 631, in run
    runApp(config)
  File "/tools/buildbot-0.8.4-pre-moz3/lib/python2.7/site-packages/twisted/scripts/twistd.py", line 23, in runApp
    _SomeApplicationRunner(config).run()
  File "/tools/buildbot-0.8.4-pre-moz3/lib/python2.7/site-packages/twisted/application/app.py", line 374, in run
    self.application = self.createOrGetApplication()
  File "/tools/buildbot-0.8.4-pre-moz3/lib/python2.7/site-packages/twisted/application/app.py", line 439, in createOrGetApplication
    application = getApplication(self.config, passphrase)
--- <exception caught here> ---
  File "/tools/buildbot-0.8.4-pre-moz3/lib/python2.7/site-packages/twisted/application/app.py", line 450, in getApplication
    application = service.loadApplication(filename, style, passphrase)
  File "/tools/buildbot-0.8.4-pre-moz3/lib/python2.7/site-packages/twisted/application/service.py", line 400, in loadApplication
    application = sob.loadValueFromFile(filename, 'application', passphrase)
  File "/tools/buildbot-0.8.4-pre-moz3/lib/python2.7/site-packages/twisted/persisted/sob.py", line 210, in loadValueFromFile
    exec fileObj in d, d
  File "/builds/slave/buildbot.tac", line 35, in <module>
    from buildslave import idleizer
exceptions.ImportError: cannot import name idleizer

Failed to load application: cannot import name idleizer

I had a look at the code history, and the file idleizer.py (which was present in SLAVE_0_8_4_PRE_MOZ2) has been removed in production-0.8:

Simones-MacBook-Pro:~ sbruno$ hg checkout production-0.8
abort: no repository found in '/Users/sbruno' (.hg not found)!
Simones-MacBook-Pro:~ sbruno$ cd buildbot
Simones-MacBook-Pro:buildbot sbruno$ hg checkout production-0.8
3 files updated, 0 files merged, 0 files removed, 0 files unresolved
Simones-MacBook-Pro:buildbot sbruno$ find . -name idleizer.py
Simones-MacBook-Pro:buildbot sbruno$ hg checkout SLAVE_0_8_4_PRE_MOZ2
307 files updated, 0 files merged, 31 files removed, 0 files unresolved
Simones-MacBook-Pro:buildbot sbruno$ find . -name idle*.py
./slave/buildslave/idleizer.py

Since that file is referenced in an import in the currently used buildbot.tac on slaves, the buildbot slave cannot start.

Does this mean that we need to roll out a new version of buildbot.tac to the slaves as well, together with the new version of the buildbot slave?
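A pre-flight check along these lines could catch this kind of breakage before a slave is restarted: parse buildbot.tac and confirm that every import it names resolves in the candidate environment. This is a minimal sketch written for Python 3 (the slaves here ran Python 2.7, so it illustrates the check rather than being a drop-in script); `unresolved_imports` and the script name check_tac.py are hypothetical.

```python
import ast
import importlib
import importlib.util


def _spec_exists(dotted_name):
    """True if a module with this dotted name can be located (its parent may be imported)."""
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except Exception:
        # find_spec imports the parent package first, which can fail outright
        # (missing parent, parent is not a package, or code for another Python).
        return False


def unresolved_imports(source):
    """Return the absolute imports in `source` that cannot be resolved here."""
    missing = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if not _spec_exists(alias.name):
                    missing.append(alias.name)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            try:
                module = importlib.import_module(node.module)
            except Exception:
                missing.append(node.module)
                continue
            for alias in node.names:
                if alias.name == "*":
                    continue
                # A name is fine if it is an attribute of the module or a submodule
                # (e.g. `from buildslave import idleizer`).
                if not (hasattr(module, alias.name)
                        or _spec_exists(node.module + "." + alias.name)):
                    missing.append(node.module + "." + alias.name)
    return missing


if __name__ == "__main__":
    import sys
    # Usage: python check_tac.py /builds/slave/buildbot.tac
    with open(sys.argv[1]) as tac:
        problems = unresolved_imports(tac.read())
    for name in problems:
        print("cannot resolve:", name)
    sys.exit(1 if problems else 0)
```

Note that `from X import Y` checks actually import X, so module-level side effects will run; the check should be run with the interpreter of the environment being tested.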
(In reply to Simone Bruno [:simone] from comment #16)
> I started version 0.8.4-pre-moz3 of the buildbot slave on b-linux64-hp-0025
> and I got the following error:

> Failed to load application: cannot import name idleizer
> 
> I had a look at the code history, and the file idleizer.py (which was present in
> SLAVE_0_8_4_PRE_MOZ2) has been removed in production-0.8:
> 
> Since that file is referenced in an import in the currently used
> buildbot.tac on slaves, the buildbot slave cannot start.
> 
> Does this mean that we need to roll out a new version of buildbot.tac to the
> slaves as well, together with the new version of the buildbot slave?

Ahhhhhh, I see what's going on here.

SLAVE_0_8_4_PRE_MOZ2 is our buildbot version for the build *slaves*, specifically 0.8.4-pre, while production-0.8 is 0.8.2-ish for the masters.

Newer slaves can (theoretically) talk to older masters, and that's why this difference matters.
...which actually means that bug 961075 was applied to the wrong branch, though I suspect it's easily transplanted.
In other words, you need to build 0.8.4-pre-moz3 based on the "slaves" branch, http://hg.mozilla.org/build/buildbot/rev/9dc77b3a5f14
(Assignee)

Comment 20

3 years ago
I rebuilt the 0.8.4-pre-moz3 packages on revision 9dc77b3a5f14, but I got the following puppet exception:

Error: /tools/buildbot-0.8.4-pre-moz3/bin/pip install --no-deps --no-index  --find-links=http://releng-puppet2.srv.releng.scl3.mozilla.com/python/packages  --find-links=http://releng-puppet1.srv.releng.scl3.mozilla.com/python/packages  --find-links=http://releng-puppet2.srv.releng.usw2.mozilla.com/python/packages  --find-links=http://releng-puppet2.srv.releng.use1.mozilla.com/python/packages  --find-links=http://releng-puppet2.build.scl1.mozilla.com/python/packages  --find-links=http://releng-puppet1.srv.releng.usw2.mozilla.com/python/packages  --find-links=http://releng-puppet1.srv.releng.use1.mozilla.com/python/packages  buildbot==0.8.4-pre-moz3 returned 1 instead of one of [0]

In /root/.pip/pip.log, the error looks like:
error: can't copy 'buildbot/db/migrate/migrate.cfg': doesn't exist or not a regular file

I then noticed that the buildbot-0.8.4-pre-moz3.tar.gz package built on revision 9dc77b3a5f14 does not contain that file (while buildbot-0.8.4-pre-moz2 did):

Simones-MacBook-Pro:buildbot-0.8.4-pre-moz2 sbruno$ find . -name migrate.cfg
./buildbot/db/migrate/migrate.cfg
Simones-MacBook-Pro:buildbot-0.8.4-pre-moz2 sbruno$ cd ../buildbot-0.8.4-pre-moz3
Simones-MacBook-Pro:buildbot-0.8.4-pre-moz3 sbruno$ find . -name migrate.cfg
Simones-MacBook-Pro:buildbot-0.8.4-pre-moz3 sbruno$
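The `find` check above can also be run directly against a tarball, without unpacking it, using Python's standard tarfile module. This is a minimal sketch; `find_in_tarball` and the script name find_in_tarball.py are hypothetical.

```python
import posixpath
import tarfile


def find_in_tarball(tarball_path, basename):
    """Return every member path whose last component is `basename`.

    This is the in-archive equivalent of `find . -name <basename>` on an
    unpacked sdist; "r:*" lets tarfile detect the compression itself.
    """
    with tarfile.open(tarball_path, "r:*") as tar:
        return [name for name in tar.getnames()
                if posixpath.basename(name) == basename]


if __name__ == "__main__":
    import sys
    # Usage: python find_in_tarball.py buildbot-0.8.4-pre-moz3.tar.gz migrate.cfg
    matches = find_in_tarball(sys.argv[1], sys.argv[2])
    print("\n".join(matches) if matches else "not found")
    sys.exit(0 if matches else 1)
```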
(Assignee)

Comment 21

3 years ago
Created attachment 8454397 [details] [diff] [review]
puppet_02

The puppet error above is raised after applying this patch.
(Assignee)

Comment 22

3 years ago
There is good news too, though: despite the reported error, I was able to manually start buildbot-slave 0.8.4-pre-moz3, and it connected successfully to my dev master.
Attachment #8454397 - Attachment is patch: true
That's weird, for sure.

Can you compare the contents of the buildbot-0.8.4-pre-moz{2,3} tarballs?  I wonder if, long ago, the moz2 tarball was actually built from the buildbot-0.8.2 sources?

Buildbot-0.8.2 was the original, half-baked DB implementation, with no ability to upgrade/downgrade.  In Buildbot-0.8.3, we switched to sqlalchemy-migrate, which is what that migrate.cfg file is for.  So it should be present in 0.8.4.
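A comparison like the one requested here can be scripted by listing both archives and diffing the paths after stripping each tarball's top-level `<name>-<version>/` directory (the usual sdist layout, assumed here). This is a minimal sketch; `relative_members`, `compare_sdists` and the script name compare_sdists.py are hypothetical.

```python
import tarfile


def relative_members(tarball_path):
    """Member paths with the top-level '<name>-<version>/' directory removed."""
    with tarfile.open(tarball_path, "r:*") as tar:
        # Entries without a '/' are the top-level directory itself; skip them.
        return {name.split("/", 1)[1] for name in tar.getnames() if "/" in name}


def compare_sdists(old_path, new_path):
    """Return (only_in_old, only_in_new) as sorted lists of relative paths."""
    old, new = relative_members(old_path), relative_members(new_path)
    return sorted(old - new), sorted(new - old)


if __name__ == "__main__":
    import sys
    # Usage: python compare_sdists.py buildbot-0.8.4-pre-moz2.tar.gz buildbot-0.8.4-pre-moz3.tar.gz
    only_old, only_new = compare_sdists(sys.argv[1], sys.argv[2])
    for path in only_old:
        print("only in %s: %s" % (sys.argv[1], path))
    for path in only_new:
        print("only in %s: %s" % (sys.argv[2], path))
```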
(Assignee)

Comment 24

3 years ago
Here is the requested comparison:

http://people.mozilla.org/~sbruno/compare-buildbot-0.8.4-pre-moz2-3.html
(Assignee)

Comment 25

3 years ago
I patched the slaves branch of buildbot as agreed with :dustin in #releng to include the missing db-migrations-related files.

I had checked that the patch correctly includes those files in the built package on my Mac, but when I tried to rebuild on linux64-hp-0025 they were still missing.

I guess this is due to different versions of python/easy_install. The version of the package currently uploaded to the pypi repo (http://puppetagain.pub.build.mozilla.org/data/python/packages/buildbot-0.8.4-pre-moz3.tar.gz) is the one I built on my Mac.

With that version, I was able to successfully run puppet on my test instance, and the slave connects successfully to my master in the stage environment (I started it manually).

Before tagging, though, I need to make sure that the package is built with the correct version of python and build tools.

:dustin: any warnings?
(Assignee)

Comment 26

3 years ago
In particular, if there are requirements on the version of tools / platform that need to be used to rebuild buildbot packages, I would like to collect them here and update the docs.

This is a good chance for me to also understand the branching rationale of the buildbot project: is the production-0.8 branch used for masters, and the slaves branch just to roll out new slave versions? Or maybe I am totally wrong here? (I would like to put this in the docs as well, since there's a chance that the next guy working on similar issues will have similar doubts.)
The branches were split when we upgraded the buildslaves without upgrading the buildmaster, which also happened to be while we were still running some 0.7 masters.  So yes, the remaining masters are all built from the production-0.8 branch, while slaves are built from the slave branch.

I'm not sure what you mean by "tools / platform".  If you mean hg.mozilla.org/build/tools and gecko, then I have no idea.  If you mean setuptools, then I think with the patch we decided on last week, the version shouldn't matter.
(Assignee)

Comment 28

3 years ago
Thanks Dustin!

I am referring to setuptools, and apparently the version of setuptools is still relevant after the patch you mention: I tried to build on the current tip of https://hg.mozilla.org/build/buildbot/rev/slaves (which now includes that patch) on linux64-hp-0025, and the resulting package still does not include the migrate.cfg file.

If I build on my personal mac, those files are included.

distutils on my Mac (does include the migrate.cfg in built package):
>>> import distutils
>>> distutils.__version__
'2.7.2'

distutils on linux slaves (does not include migrate.cfg):
>>> import distutils
>>> distutils.__version__
'2.6.6'
Ah, that's interesting!  At least newer versions work. I guess the best we can do is document?
Flags: needinfo?(dustin)
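Besides documenting it, the packaging step could refuse to run on an interpreter older than the one known to work. This is a minimal sketch: the (2, 7) threshold comes only from the two results reported above (2.7.2 produced a package with migrate.cfg, 2.6.6 did not), and `can_build_sdist` is a hypothetical helper. A plausible cause, not confirmed here, is that Python 2.7's distutils started adding package_data files to sdists automatically, which 2.6 did not; that would only matter if buildbot's setup.py declares these files via package_data, which has not been checked.

```python
import sys

# Threshold taken from this bug: 2.7.2 produced a complete sdist, 2.6.6 did not.
MIN_BUILD_PYTHON = (2, 7)


def parse_version(text):
    """Turn a dotted release string such as '2.6.6' into (2, 6, 6).

    Only plain numeric components are handled; strings like '2.7.2+' are not.
    """
    return tuple(int(part) for part in text.split("."))


def can_build_sdist(python_version):
    """True if `python_version` is at least MIN_BUILD_PYTHON."""
    return parse_version(python_version)[:2] >= MIN_BUILD_PYTHON


if __name__ == "__main__":
    current = "%d.%d.%d" % tuple(sys.version_info[:3])
    if not can_build_sdist(current):
        sys.exit("refusing to run 'setup.py sdist' with Python %s" % current)
    print("Python %s is new enough to build the sdist" % current)
```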
(Assignee)

Comment 30

3 years ago
Created attachment 8456089 [details] [diff] [review]
puppet_03

Puppet change to pick up the newly updated packages for buildbot slave. Please note that the new buildbot master package has been built with Python 2.7.2 and distutils 2.7.2.
Attachment #8456089 - Flags: review?(dustin)
(Assignee)

Comment 31

3 years ago
I updated https://wiki.mozilla.org/ReleaseEngineering/PuppetAgain/Modules/buildslave#Upgrading with info collected in this Bug.
Comment on attachment 8456089 [details] [diff] [review]
puppet_03

Review of attachment 8456089 [details] [diff] [review]:
-----------------------------------------------------------------

Did you mean that the new buildbot-slave package was built with those tools?  I don't see the 'buildbot' package changing here.
Attachment #8456089 - Flags: review?(dustin) → review+
(Assignee)

Comment 33

3 years ago
Comment on attachment 8456089 [details] [diff] [review]
puppet_03

A reconfig is running right now to land this.
Attachment #8456089 - Flags: checked-in+
(Assignee)

Comment 34

3 years ago
Dustin: There are no puppet changes related to 'buildbot', but I built both buildbot and buildbot-slave and uploaded the two new packages to pypi.
(Assignee)

Comment 35

3 years ago
Assuming that the patch has been reverted because I did not comply with the policy of keeping default aligned with production in the puppet repo, I re-landed the patch a few minutes ago (this time to production branch as well) to rollout the changes.

I erroneously thought this change would be rolled out by a reconfig (this is why I did not merge to prod).
No worries.  I reverted it intending to land something of my own, but that didn't happen, so in the end I could have left well enough alone.  Sorry about that!
This is deployed to Mac and Linux slaves now. Now we need to figure out the story on Windows.
For:
https://tbpl.mozilla.org/php/getParsedLog.php?id=44256825&tree=Mozilla-Inbound
Android 2.3 Emulator mozilla-inbound opt test mochitest-2 on 2014-07-21 00:45:47 PDT for push 98fa8afd9169
slave: tst-linux64-spot-1230

I see:
command timed out: 2400 seconds without output, attempting to kill

Which is the old style message. Is this expected to have been rolled out on the spot instances?
Flags: needinfo?(sbruno)
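To check which jobs still emit the old message (and therefore presumably ran on a slave without the fix), a saved log can be scanned for the exact wording quoted above. This is a minimal sketch: the pattern is copied from that excerpt, the new-style wording is not quoted in this bug, and if the new message happens to begin with the same text the pattern would need tightening. `old_style_timeouts` and the script name old_timeouts.py are hypothetical.

```python
import re

# Old-style message, copied from the log excerpt quoted in this comment.
OLD_TIMEOUT = re.compile(
    r"command timed out: (?P<seconds>\d+) seconds without output, attempting to kill")


def old_style_timeouts(log_text):
    """Return the timeout (in seconds) of every old-style timeout message in a log."""
    return [int(match.group("seconds")) for match in OLD_TIMEOUT.finditer(log_text)]


if __name__ == "__main__":
    import sys
    # Usage: python old_timeouts.py < saved_log.txt
    hits = old_style_timeouts(sys.stdin.read())
    print("old-style timeout messages: %d" % len(hits))
```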
(Assignee)

Comment 39

3 years ago
Yes, it should be on spot instances too.
I had a look at the code and apparently the required patch had not been applied here, so I pushed it a few minutes ago and I am now going to upload a new version of the packages.
Flags: needinfo?(sbruno)
Strange, it landed in bug 961075 comment 6.
(Assignee)

Comment 41

3 years ago
It was not applied to the "slaves" branch, which is the one to be used for this build. See comments 17, 18, 19 for details.
(Assignee)

Comment 42

3 years ago
Created attachment 8459546 [details] [diff] [review]
puppet_04

Packages 0.8.4-pre-moz4 have been built and uploaded to the puppet pypi repo already.
Attachment #8459546 - Flags: review?(dustin)
Attachment #8459546 - Flags: review?(dustin) → review+
(Assignee)

Comment 43

3 years ago
Comment on attachment 8459546 [details] [diff] [review]
puppet_04

Checked in to default and merged to production.
Attachment #8459546 - Flags: checked-in+
(Assignee)

Comment 44

3 years ago
No issues have been reported with the non-Windows slave deployment; marking as RESOLVED FIXED.
Bug 1042597 has been created for the Windows slave deployment.
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
(Assignee)

Updated

3 years ago
Summary: Deploy hg.m.o/build/buildbot production-0.8 to non-windows buildslaves to pick up bug 961075 → Deploy new version of hg.m.o/build/buildbot to non-windows buildslaves to pick up bug 961075
Looks good thank you :-)

eg:
https://tbpl-dev.allizom.org/php/getParsedLog.php?id=44432506&tree=Mozilla-Central