Tracking bug for buildbot 0.7.10p1 upgrade

RESOLVED FIXED

Status

P2
normal
RESOLVED FIXED
10 years ago
5 years ago

People

(Reporter: catlee, Assigned: bhearsum)

Details

Attachments

(3 attachments)

(Reporter)

Description

10 years ago
Buildbot 0.7.10p1 has lots of features that would be useful to us.  We should upgrade!
(Reporter)

Updated

10 years ago
Blocks: 435472

Comment 1

10 years ago
We need to take a version of buildbot that fixes at least http://buildbot.net/trac/ticket/446, too.

Updated

10 years ago
Blocks: 484542

Updated

10 years ago
Blocks: 480145
(Assignee)

Comment 2

10 years ago
We're going to try and do this early in Q2.
Assignee: catlee → bhearsum
Status: NEW → ASSIGNED
Priority: -- → P3
(Assignee)

Comment 3

10 years ago
Some of the nice features in 0.7.10p1 include:
* ATOM/RSS
* Fixed 'ping' button
* Fixed reconfig (no more tracebacks)
* Graceful slave shutdown
* Configurable BuildRequest merging

There's also a few patches which landed post-0.7.10 we should consider including:
* Fixes for one_line_per_build (http://buildbot.net/trac/ticket/455)
* Mercurial step fixes (http://buildbot.net/trac/ticket/462 and http://buildbot.net/trac/ticket/277)

We don't strictly need to take the Mercurial fixes but it might be good to get that testing out of the way - it wouldn't surprise me if it breaks us a bit.
(Assignee)

Comment 5

10 years ago
This is going to be a pretty easy import, by the looks of it. I'm going to be backing out the patch in bug 485584 since it hasn't solved our issue, and conflicts with some incoming changes. Other than that, there's a few conflicts: process/base.py, process/builder.py, slave/commands.py - all of which are trivial to resolve.

I still need to do a lot of testing in staging before we think about deploying this. It's going to be a bit of a pain to roll out, too, because we'll need to update all of the build slaves (we can probably omit Talos slaves from this since there's no big commands.py changes that affect them). It's probably going to require a fairly big downtime.

So, here's the plan:
* Test the new Buildbot in staging well, focusing on the Mercurial step
* Import 0.7.10p1 into production
* Schedule downtime, roll out across the farm.

Comment 6

10 years ago
There have been some major patches to the mail notifier stuff, and previously, I ran across problems with TinderboxNotifier, too. We should make sure those don't break, including the l10n-specific uses with WithProperties in tree names.

Comment 7

(In reply to comment #3)
> There's also a few patches which landed post-0.7.10 we should consider
> including:
> * Fixes for one_line_per_build (http://buildbot.net/trac/ticket/455)
> * Mercurial step fixes (http://buildbot.net/trac/ticket/462 and
> http://buildbot.net/trac/ticket/277)
> 
> We don't strictly need to take the Mercurial fixes but it might be good to get
> that testing out of the way - it wouldn't surprise me if it breaks us a bit.

Why not just import a clean 0.7.10p1, and omit those extra later fixes until they are included in 0.7.11, and we import a clean 0.7.11? 

It feels easier (and safer?) to import a clean 0.7.10p1, rather than pick-and-choose additional later changes, but I could be missing something.

Comment 8

10 years ago
Mantra #1, use 0.7.10p1 without patches, and you break the build. The builds won't get their .hg/hgrc set up with paths, which will break about:buildconfig, and make ident required for l10n builds.

Besides that, there's the technical detail that we need to patch slave/commands.py to include the custom slave-side step code we have. I wonder if it's worth forking that code into slave/mozcommands.py and importing it from commands.py, to make the distinction more apparent.
(Reporter)

Comment 9

10 years ago
I wonder if it's worth it to stop using buildbot's built-in mercurial support completely.

Comment 10

10 years ago
There are good things coming up, in particular the clobber on switching from one repo-as-branch to another is pretty tough to mimic in pure shell scripts. That's in patches towards .11, too.

Basically, when you have a fx36x clone, and you branch to a releases/mozilla-1.9.2 repo, the Mercurial step realizes that you're now pulling from some place else, and does a clobber. That's the same scenario that makes us clobber build/tools all the time right now: the current code doesn't compare the repo you pulled from with the repo you want to pull from.
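
To make the comparison described above concrete, here is a minimal sketch in Python. It is not the Buildbot Mercurial step's actual code: the function names are hypothetical, and it assumes the checkout records its source as the `default` entry under `[paths]` in `.hg/hgrc`, in a form simple enough for Python's `configparser` to read.

```python
import configparser
import os


def recorded_source(workdir):
    """Return the [paths] default URL from workdir/.hg/hgrc, or None."""
    hgrc = os.path.join(workdir, ".hg", "hgrc")
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(hgrc):
        return None  # no hgrc at all
    return parser.get("paths", "default", fallback=None)


def needs_clobber(workdir, wanted_repo):
    """Decide whether an existing checkout must be thrown away.

    Hypothetical helper: compares the repo the checkout was pulled
    from with the repo we now want to pull from.
    """
    if not os.path.isdir(workdir):
        return False  # nothing checked out yet, so nothing to clobber
    current = recorded_source(workdir)
    if current is None:
        return True  # cannot tell where the checkout came from
    # Ignore a trailing slash so equivalent URLs compare equal.
    return current.rstrip("/") != wanted_repo.rstrip("/")
```

With a check like this, a checkout is only clobbered when the source actually changes, instead of on every build.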
(Assignee)

Comment 11

10 years ago
(In reply to comment #7)
> (In reply to comment #3)
> > There's also a few patches which landed post-0.7.10 we should consider
> > including:
> > * Fixes for one_line_per_build (http://buildbot.net/trac/ticket/455)
> > * Mercurial step fixes (http://buildbot.net/trac/ticket/462 and
> > http://buildbot.net/trac/ticket/277)
> > 
> > We don't strictly need to take the Mercurial fixes but it might be good to get
> > that testing out of the way - it wouldn't surprise me if it breaks us a bit.
> 
> Why not just import a clean 0.7.10p1, and omit those extra later fixes until
> they are included in 0.7.11, and we import a clean 0.7.11? 
> 
> It feels easier (and safer?) to import a clean 0.7.10p1, rather than
> pick-and-choose additional later changes, but I could be missing something.

We've been importing a release + some patches every time we import a new Buildbot - so it's nothing new.

Some of these changes we don't _have_ to take, but since I'm going to be doing the work to import 0.7.10p1 I figure we may as well take some patches that will benefit us. I really want to take the Mercurial ones, and now that Axel mentions it, the MailNotifier ones, so we can deal with whatever bustage there is at the same time.

In any case, as Axel mentions, 0.7.10p1 stock will break the build:

(In reply to comment #8)
> Mantra #1, use 0.7.10p1 without patches, and you break the build. The builds
> won't get their .hg/hgrc set up with paths, which will break about:buildconfig,
> and make ident required for l10n builds.
(Assignee)

Comment 12

10 years ago
(In reply to comment #8)
> Besides the technical details that we need to patch slave/commands.py to
> include the custom slave-side step code we have. I wonder if it's worth to fork
> those to slave/mozcommands.py, and to import that from commands.py, to make the
> distinction more apparent.

I think we should avoid custom slave-side commands as much as possible, mainly because of the huge PITA to deploy them initially + the inevitable bugfixes. But, we do have one custom command in here currently, and I think it's a great idea to move it out.
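
As a rough illustration of the split discussed here, the sketch below keeps custom commands in their own registration function, which the stock module would call once. Everything in it is a stand-in: `COMMAND_REGISTRY`, `register_slave_command`, `ExampleMozillaCommand` and the command name are hypothetical and do not reproduce Buildbot's real slave-side API.

```python
# Stand-in for a slave-side command registry (hypothetical, not Buildbot's).
COMMAND_REGISTRY = {}


def register_slave_command(name, factory, version):
    """Record a command class under the name the master asks for."""
    if name in COMMAND_REGISTRY:
        raise KeyError("duplicate slave command: %s" % name)
    COMMAND_REGISTRY[name] = (factory, version)


# --- what a separate mozcommands.py might contain ---
class ExampleMozillaCommand:
    """Placeholder for a custom slave-side command."""

    def start(self):
        return "done"


def register_mozilla_commands():
    """Single entry point the stock commands module would call."""
    register_slave_command("example-moz-command", ExampleMozillaCommand, "1.0")


def missing_commands(required_names):
    """List required command names that are not registered on this slave."""
    return [n for n in required_names if n not in COMMAND_REGISTRY]
```

Keeping the custom code behind one entry point means an upstream import only has to preserve a single call, and a check like `missing_commands` can flag a command that was dropped during a merge.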
(Assignee)

Comment 13

10 years ago
I've imported 0.7.10p1 and the following tickets into my user repository:
http://buildbot.net/trac/ticket/455
http://buildbot.net/trac/ticket/446
http://buildbot.net/trac/ticket/451
http://buildbot.net/trac/ticket/277
http://buildbot.net/trac/ticket/462

The repository is here: http://hg.mozilla.org/users/bhearsum_mozilla.com/buildbot. I plan to start testing this week, beginning with staging-master:moz2-master. Once I have all of that sorted out I'll move on to try and talos.
(Assignee)

Comment 14

10 years ago
Turns out I forgot to 'hg addremove' after unpacking 0.7.10p1. I've fixed my repository to include all the new files.
(Assignee)

Updated

10 years ago
Depends on: 487496
(Assignee)

Comment 15

10 years ago
While testing 0.7.10p1 on the staging try server I encountered a problem with the MozillaPatchDownload step. I landed a fix upstream for it, and also in http://hg.mozilla.org/users/bhearsum_mozilla.com/buildbot.

Other than that, and the issue I filed bug 487496 for, everything has been fine. I still have to test the Talos buildbot though, and I wouldn't be surprised to find a thing or two that needs fixing.
Priority: P3 → P2
Blocks: 488262
(Assignee)

Updated

10 years ago
No longer blocks: 488262
(Reporter)

Updated

10 years ago
Blocks: 488368
(Reporter)

Updated

10 years ago
Blocks: 488273
(Assignee)

Comment 16

10 years ago
Created attachment 374948 [details]
Buildbot 0.7.10p1 upgrade script for Linux/Mac
(Assignee)

Comment 17

10 years ago
Created attachment 374949 [details]
Buildbot 0.7.10p1 upgrade script for Windows
(Assignee)

Comment 18

10 years ago
Deployment on Linux:
* Log on as root
wget --no-check-certificate -Obuildbot-0.7.10p1.sh https://bugzilla.mozilla.org/attachment.cgi?id=374948
chmod +x buildbot-0.7.10p1.sh
./buildbot-0.7.10p1.sh

Deployment on Mac:
* Log on as cltbld
wget --no-check-certificate -Obuildbot-0.7.10p1.sh https://bugzilla.mozilla.org/attachment.cgi?id=374948
chmod +x buildbot-0.7.10p1.sh
sudo ./buildbot-0.7.10p1.sh

Deployment on Windows:
* Log on as Administrator
wget --no-check-certificate -Obuildbot-0.7.10p1.sh https://bugzilla.mozilla.org/attachment.cgi?id=374949
chmod +x buildbot-0.7.10p1.sh
./buildbot-0.7.10p1.sh
(Assignee)

Comment 19

10 years ago
After a few bumps in the road we've got this deployed. Major problems were:
* Talos losing the ability to override commands (fixed in bug 487496)
* Windows slaves failing due to http://buildbot.net/trac/ticket/456. We checked in this patch and updated the slaves to fix it.
* Many builds failing due to SetMozillaBuildProperties not existing on the slaves. This was the result of a bad merge during the initial import. To fix, re-added the command into commands.py and the slaves were updated.
(Reporter)

Comment 20

10 years ago
try-mac-slave06
moz2-darwin9-slave03

weren't updated because they're offline
(Reporter)

Comment 21

10 years ago
moz2-darwin9-slave03 has been upgraded.

Holding off on try-mac-slave06 until we get the new buildbot code working on try slaves.
Depends on: 490850
(Assignee)

Comment 22

10 years ago
All of the production-1.8 and production-1.9 master + slaves have been updated now. Still to do:
staging-1.9
1.9 unittest
(Assignee)

Comment 23

10 years ago
staging-1.9 has been upgraded.
(Assignee)

Comment 24

10 years ago
We're hitting what seems to be an ignorable traceback on the 1.9 masters, related to l10n:
	  File "/tools/buildbot/lib/python2.5/site-packages/buildbot/master.py", line 759, in <lambda>
	    d.addCallback(lambda res: self.loadConfig_Schedulers(schedulers))
	  File "/tools/buildbot/lib/python2.5/site-packages/buildbot/master.py", line 835, in loadConfig_Schedulers
	    d.addCallback(updateDownstreams)
	  File "/tools/twisted-2.4.0/lib/python2.5/site-packages/twisted/internet/defer.py", line 191, in addCallback
	    callbackKeywords=kw)
	  File "/tools/twisted-2.4.0/lib/python2.5/site-packages/twisted/internet/defer.py", line 182, in addCallbacks
	    self._runCallbacks()
	--- <exception caught here> ---
	  File "/tools/twisted-2.4.0/lib/python2.5/site-packages/twisted/internet/defer.py", line 307, in _runCallbacks
	    self.result = callback(self.result, *args, **kw)
	  File "/tools/buildbot/lib/python2.5/site-packages/buildbot/master.py", line 834, in updateDownstreams
	    s.checkUpstreamScheduler()
	  File "/tools/buildbot/lib/python2.5/site-packages/buildbot/scheduler.py", line 350, in checkUpstreamScheduler
	    for s in self.parent.allSchedulers():
	<type 'exceptions.AttributeError'>: 'NoneType' object has no attribute 'allSchedulers'
	

I've commented on the upstream ticket about it (http://buildbot.net/trac/ticket/35). It doesn't seem to be interfering with anything, though.
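
For readers unfamiliar with this failure mode: the last line of the traceback says `self.parent` was `None` when `checkUpstreamScheduler` ran. The toy classes below reproduce that shape and show one defensive variant. They are hypothetical stand-ins, not Buildbot's scheduler code or its eventual fix, and the traceback alone does not tell us why the parent was unset.

```python
class ToyMaster:
    """Stand-in for the object that owns all schedulers."""

    def __init__(self):
        self.schedulers = []

    def all_schedulers(self):
        return list(self.schedulers)


class ToyDependentScheduler:
    """Stand-in for a scheduler that looks up its upstream by name."""

    def __init__(self, name, upstream_name):
        self.name = name
        self.upstream_name = upstream_name
        self.parent = None  # set only once attached to a master

    def attach(self, master):
        self.parent = master
        master.schedulers.append(self)

    def find_upstream(self):
        # Guard: without it, an unattached scheduler raises
        # AttributeError: 'NoneType' object has no attribute ...
        if self.parent is None:
            return None
        for s in self.parent.all_schedulers():
            if s.name == self.upstream_name:
                return s
        return None
```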
(Assignee)

Comment 25

10 years ago
The only thing left to do here is get the Try Server slaves upgraded to 0.7.10p1. This is blocked on figuring out how to avoid them breaking when the try repository grows too many heads.
(Assignee)

Comment 26

10 years ago
Created attachment 378326 [details] [diff] [review]
MozillaTryServerHgClone fixes for 0.7.10p1+

Last week I worked with the maintainers of the Buildbot Mercurial code and they landed an upstream patch that will enable us to use 'hg clone --rev' on the try server. We'll need to pull in http://github.com/djmitche/buildbot/commit/483a6043ed2cab2436009eeb7465269b7a48e65f, and land the attached patch. We'll need a short downtime so we can upgrade the slaves at the same time as we land these.
Attachment #378326 - Flags: review?(catlee)
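
For context, `hg clone --rev REV` clones only the named changeset and its ancestors, so the number of other heads in the source repository stops mattering to the slave. A minimal sketch of issuing that command from Python follows; the function names, URL and destination are hypothetical, and this is not the code in the attached patch.

```python
import subprocess


def clone_rev_command(repo_url, revision, dest):
    """Build the argv for cloning only `revision` and its ancestors."""
    return ["hg", "clone", "--rev", revision, repo_url, dest]


def clone_single_revision(repo_url, revision, dest):
    """Run the clone; raises CalledProcessError if hg exits non-zero."""
    subprocess.check_call(clone_rev_command(repo_url, revision, dest))
```

Passing the arguments as a list avoids shell quoting problems with URLs and revision identifiers.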
(Reporter)

Updated

10 years ago
Attachment #378326 - Flags: review?(catlee) → review+
(Assignee)

Comment 27

10 years ago
Comment on attachment 378326 [details] [diff] [review]
MozillaTryServerHgClone fixes for 0.7.10p1+

changeset:   299:1aa4bb2bdf4d
Attachment #378326 - Flags: checked‑in+ checked‑in+
(Assignee)

Comment 28

10 years ago
I got the Try Server slaves upgraded today (yay).
(Assignee)

Comment 29

10 years ago
This bug is ripe for the closing - all of our installations have been updated to 0.7.10p1, save 1.9 unittests (which is ok).
Status: ASSIGNED → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering