Closed Bug 1458692 Opened 6 years ago Closed 6 years ago

upgrade buildbot masters to python 2.7.15

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: bhearsum)

References

Details

Attachments

(10 files)

2.58 KB, patch (catlee: review+, bhearsum: checked-in+)
834 bytes, patch (jlund: review+, bhearsum: checked-in+)
1.07 KB, patch (catlee: review+, bhearsum: checked-in+)
947 bytes, patch (catlee: review+, bhearsum: checked-in+)
1001 bytes, patch (catlee: review+, bhearsum: checked-in+)
1.04 KB, patch (rail: review+, bhearsum: checked-in+)
2.44 KB, patch (catlee: review+, bhearsum: checked-in+)
1.55 KB, patch (jlund: review+, bhearsum: checked-in+)
2.76 KB, patch (jlund: review+, bhearsum: checked-in+)
2.02 KB, patch (jlund: review+, bhearsum: checked-in+)
I've done some testing on dev-master2 and it looks like our Buildbot masters *should* work fine with 2.7.15 (I was able to bring up a master with it, anyways). Masters will need to be restarted by hand after the upgrade. I did find an issue with queue dir rebuilding while testing this upgrade -- it ends up recloning and installing tools, which fails because it tries to point at public pypi. I'm attaching a patch that gets it pointing at our internal mirrors, and upgrades dev-master2 to 2.7.15. We should still canary a single production master before rolling out to the pool to make sure all is well.
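To illustrate the queue dir rebuilding fix described above, here is a minimal sketch of building a pip command that installs from a single internal index instead of public pypi. This is not the attached patch: the function name, the `python2.7` default, and the mirror URL in the usage note are all hypothetical, and the real puppet change may use different pip options (for example `--no-index` with `--find-links`).

```python
def build_pip_command(packages, index_url, python="python2.7"):
    """Return an argv list that installs `packages` from one index only.

    `index_url` is assumed to be an internal mirror; passing --index-url
    replaces pip's default index, so public pypi is not consulted.
    """
    if not packages:
        raise ValueError("no packages given")
    return [python, "-m", "pip", "install", "--index-url", index_url] + list(packages)
```

Something like `build_pip_command(["kombu==3.0.30"], "https://pypi.example.internal/simple")` would then be handed to whatever runs the install; the URL here is a placeholder, not our real mirror.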
Attachment #8972705 - Flags: review?(catlee)
Keywords: leave-open
This depends on the other patch before it is safe to land, too.
Attachment #8973003 - Flags: review?(jlund)
Attachment #8972705 - Flags: review?(catlee) → review+
Attachment #8973003 - Flags: review?(jlund) → review+
@ciduty - as per item in slack:
<bhearsum> 12:07:29 while i'm here -- how do you folks feel about upgrading one (just one) buildbot try master to python 2.7.15 tomorrow?
<jlund> bhearsum: sure! can buildduty help there? Of course they can keep an eye on that master's jobs.
12:36:06 they can work with sheriffs to do a esr52 try push
<bhearsum> 12:36:31 that would be really helpful actually
<jlund> k, I added a tracking item. If you can give folks a heads up on when you plan to and actually do this, they will keep an eye on it
<bhearsum> 12:39:55 yep, will give a heads up tomorrow
Pushed by bhearsum@mozilla.com:
https://hg.mozilla.org/build/puppet/rev/a16f5f256ba2
upgrade dev master to python 2.7.15 and fix queue dir rebuilding. r=catlee
https://hg.mozilla.org/build/puppet/rev/f6caf0020b3a
upgrade a production buildbot master to python 2.7.15. r=jlund
Attachment #8972705 - Flags: checked-in+
Attachment #8973003 - Flags: checked-in+
Pushed by bhearsum@mozilla.com: https://hg.mozilla.org/build/puppet/rev/82a0c200250c fix bad invocation of 'python setup.py develop'. r=bustage
Hit this when rolling out to bm75. The current version of kombu that we run has a hack in it that breaks with newer versions of Python. The hack was removed in https://github.com/celery/kombu/commit/3f13d9797bc1234e51de98f30e94f511cc21390f, and the fix was released in 3.0.30, which also requires a new amqp. I tested this by hand on bm75, and it got pulse publisher running again. It appears to have happened on dev-master2 as well, but there were no nagios alerts for it, so I failed to notice it.
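As a rough sketch of how one could check whether a given virtualenv still has an affected kombu, the snippet below compares a version string against 3.0.30 (the release named above). The helper names are made up for illustration, and it assumes purely numeric dotted versions; it says nothing about which amqp version is required.

```python
FIXED_KOMBU = (3, 0, 30)  # first release with the hack removed, per the comment above

def parse_version(text):
    """Turn '3.0.30' into (3, 0, 30). Assumes numeric dotted versions only."""
    return tuple(int(part) for part in text.split("."))

def needs_kombu_upgrade(installed):
    """True if `installed` is older than the release containing the fix."""
    return parse_version(installed) < FIXED_KOMBU
```

Comparing tuples of integers rather than raw strings matters here, since a string comparison would sort "3.0.9" after "3.0.30".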
Attachment #8973205 - Flags: review?(catlee)
Attachment #8973205 - Flags: review?(catlee) → review+
Pushed by bhearsum@mozilla.com: https://hg.mozilla.org/build/puppet/rev/8a33d3107e50 Upgrade kombu and amqp to fix pulse publisher. r=catlee
Attachment #8973205 - Flags: checked-in+
I can't make bm75 pick up any jobs, so let's upgrade bm78, which is stealing all of the try jobs...
Attachment #8973230 - Flags: review?(catlee)
Attachment #8973230 - Flags: review?(catlee) → review+
Pushed by bhearsum@mozilla.com: https://hg.mozilla.org/build/puppet/rev/3f38f75d20b4 upgrade buildbot-master78 to python 2.7.15. r=catlee
Attachment #8973230 - Flags: checked-in+
Other than the kombu and amqp issue that's already been fixed, this appears to have worked fine - the jobs on buildbot-master78 worked fine after the upgrade. Next steps here are:
* Continue rollout to a test and build master -- I think I'll wait until Monday to do this. We need to be careful not to use a master that runs releaserunner or other non-buildbot services for this.
* Roll out to the rest of the buildbot masters that don't run non-buildbot services
* Carefully roll out to the remainder of the buildbot masters -- watching releaserunner and other services carefully as we do so.
I think the last step should probably wait until all the 60.0 release work has finished up.
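The staged plan above can be sketched as a small grouping function. This is only an illustration of the ordering, not how the puppet manifests select nodes: the function name, the input shape (master name mapped to the set of non-buildbot services it runs), and the master names in the usage note are all hypothetical.

```python
def plan_rollout(masters, canaries):
    """Split masters into three ordered stages.

    masters:  dict of master name -> set of non-buildbot services it runs
    canaries: names to upgrade first; these must not run extra services
    """
    for name in canaries:
        if masters[name]:
            raise ValueError("%s runs non-buildbot services, not a safe canary" % name)
    rest = [m for m in sorted(masters) if m not in canaries]
    stage1 = sorted(canaries)                       # canary test/build masters
    stage2 = [m for m in rest if not masters[m]]    # buildbot-only masters
    stage3 = [m for m in rest if masters[m]]        # masters with releaserunner etc.
    return [stage1, stage2, stage3]
```

For example, with `{"bm-a": set(), "bm-b": set(), "bm-c": {"releaserunner"}}` and `["bm-a"]` as the canary, the stages come out as `bm-a`, then `bm-b`, then `bm-c`.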
This will roll out 2.7.15 to one test master per platform, and one build master. I'll be waiting until Monday to land it.
Attachment #8973246 - Flags: review?(catlee)
Attachment #8973246 - Flags: review?(catlee) → review+
Pushed by bhearsum@mozilla.com: https://hg.mozilla.org/build/puppet/rev/5423716df5a3 upgrade a few more buildbot masters to python 2.7.15. r=catlee
Attachment #8973246 - Flags: checked-in+
Discovered this while rolling out to a build master today -- used the wrong command for shutting down self serve - oops! I also noticed that self serve doesn't puppet correctly on the first try (it doesn't install deps in the correct order), but I'm not sure anything can be done about that. I tried this patch in my environment, and it fixed the shutdown issue.
Attachment #8973712 - Flags: review?(rail)
Attachment #8973712 - Flags: review?(rail) → review+
Attachment #8973712 - Flags: checked-in+
Pushed by bhearsum@mozilla.com: https://hg.mozilla.org/build/puppet/rev/7edba7c89738 fix selfserve shutdown command. r=rail
self serve ended up with the same issues that we hit with the pulse publisher - upgrading kombu fixes it there, too. Buildbot bridge will also be affected, so let's upgrade kombu and amqp now.
Attachment #8973727 - Flags: review?(catlee)
Attachment #8973727 - Flags: review?(catlee) → review+
Pushed by bhearsum@mozilla.com: https://hg.mozilla.org/build/puppet/rev/d02975ea7751 upgrade kombu and amqp on selfserve and buildbot bridge. r=catlee
Attachment #8973727 - Flags: checked-in+
I'm not sure if this will be a problem everywhere or not, but I had to manually restart the pulse and command runners on bm111 after the upgrade today. The other masters were fine (as of this writing).
(In reply to Ben Hearsum (:bhearsum) from comment #16)
> I'm not sure if this will be a problem everywhere or not, but I had to
> manually restart the pulse and command runners on bm111 after the upgrade
> today. The other masters were fine (as of this writing).

Turns out the other masters hit it too! It just took longer to show up. I'm still not sure why it's happening -- neither process does any useful logging to try to find out. Maybe we should just deal with restarting them by hand after the upgrades...
(In reply to Ben Hearsum (:bhearsum) from comment #17)
> (In reply to Ben Hearsum (:bhearsum) from comment #16)
> > I'm not sure if this will be a problem everywhere or not, but I had to
> > manually restart the pulse and command runners on bm111 after the upgrade
> > today. The other masters were fine (as of this writing).
>
> Turns out the other masters hit it too! It just took longer to show up. I'm
> still not sure why it's happening -- neither process does any useful logging
> to try to find out. Maybe we should just deal with restarting them by hand
> after the upgrades...

Turns out I restarted buildbot as root this morning, which caused the queue dirs to get root-owned files, which breaks the command & pulse publishers. So...let's not do that next time.
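A quick way to spot this kind of breakage is to walk the queue dir and list anything not owned by the expected user. The sketch below is only an illustration: the function name is made up, the caller has to supply the queue dir path and the uid that buildbot normally runs as (neither is given in this bug), and it relies on POSIX ownership information.

```python
import os

def find_foreign_owned(root, expected_uid):
    """Return sorted paths under `root` whose owner is not `expected_uid`."""
    offenders = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat so that symlinks are checked themselves, not their targets
            if os.lstat(path).st_uid != expected_uid:
                offenders.append(path)
    return sorted(offenders)
```

An empty list means nothing under the directory needs a chown before restarting the command and pulse publishers.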
This upgrades a few more buildbot masters, including one of the l10n bumper ones and a buildbot bridge master. I think l10n bumper and buildbot bridge are low risk enough to do during a release week?
Attachment #8974013 - Flags: review?(jlund)
Attachment #8974013 - Flags: review?(jlund) → review+
Pushed by bhearsum@mozilla.com: https://hg.mozilla.org/build/puppet/rev/eb73105238b0 upgrade more buildbot masters to python 2.7.15. r=jlund
Attachment #8974013 - Flags: checked-in+
At this point we've canaried all of the various buildbot master services except for release runner. This patch upgrades all of the buildbot masters, except the two production release runner masters (81 & 85). One of the ones being upgraded is buildbot-master83, which is the dev release runner master - which we can use to verify that things on that side still work. That's still pinned to Tom's environment, so I'll work with him to figure out how to roll out to it.
Attachment #8974120 - Flags: review?(jlund)
Comment on attachment 8974120
upgrade all buildbot masters except the production release runner ones

Review of attachment 8974120:
-----------------------------------------------------------------

let's do it! So if you have a node pinned, it will override this canary hack?
Attachment #8974120 - Flags: review?(jlund) → review+
(In reply to Jordan Lund (:jlund) from comment #22)
> Comment on attachment 8974120
> upgrade all buildbot masters except the production release runner ones
>
> Review of attachment 8974120:
> -----------------------------------------------------------------
>
> let's do it! So if you have a node pinned, it will override this canary hack?

If a node is pinned to someone else's environment, it will use whatever manifests they have in place - which may or may not have this patch (depends how often that person pulls upstream).
QA Contact: catlee
Pushed by bhearsum@mozilla.com: https://hg.mozilla.org/build/puppet/rev/db3b05207857 upgrade all buildbot masters except the production release runner ones. r=jlund
Attachment #8974120 - Flags: checked-in+
The latest upgrade went extremely well. Just a couple of things of note:
* There were the expected errors rebuilding the selfserve agent virtualenv
* releaserunner & releaserunner3 on bm83 didn't restart after the upgrade - I had to do them by hand. They're working fine though.
The only thing left to do here is upgrade bm81 & 85 once we're clear of releases. We'll need to restart releaserunner by hand on these, too.
The final stage.
Attachment #8974684 - Flags: review?(jlund)
Attachment #8974684 - Attachment is patch: true
Attachment #8974684 - Attachment mime type: video/dv → text/plain
Comment on attachment 8974684
upgrade ALL the buildbot masters

lgtm
Attachment #8974684 - Flags: review?(jlund) → review+
Pushed by bhearsum@mozilla.com: https://hg.mozilla.org/build/puppet/rev/b8537fd45684 upgrade all the builrdbot masters to python 2.7.15. r=jlund
Attachment #8974684 - Flags: checked-in+
Had to fix releaserunner shutdown-for-rebuild commands in https://hg.mozilla.org/build/puppet/rev/b5f1718e90c1 -- they tried to use "service" instead of "supervisorctl".
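To make the "service" vs "supervisorctl" mix-up concrete, here is a small sketch of picking the stop command based on how a process is managed. It is not the content of the changeset above: the function name and the `manager` labels are hypothetical, and only the general command shapes (`supervisorctl stop NAME`, `service NAME stop`) are standard.

```python
def stop_command(name, manager):
    """Return the argv list that stops `name` under the given process manager."""
    if manager == "supervisor":
        # processes run under supervisord are controlled through supervisorctl
        return ["supervisorctl", "stop", name]
    if manager == "sysv":
        # init-script style services use the `service` wrapper
        return ["service", name, "stop"]
    raise ValueError("unknown process manager: %r" % manager)
```

With this shape, `stop_command("releaserunner", "supervisor")` gives `["supervisorctl", "stop", "releaserunner"]`, which matches the kind of command the fix switched to.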
All done here now.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED