Closed Bug 985556 Opened 10 years ago Closed 10 years ago

Bump MAX_BROKER_REFS to 4096

Categories

(Release Engineering :: General, defect)

Hardware: x86_64
OS: Linux
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: armenzg, Assigned: armenzg)

References

Details

Attachments

(4 files)

This is going to suck quite a bit.

Options:
* Bump the limit (see the sketch below)
* Remove one/two project branches + disable b2g18 branches
* Split linux64 test masters by product
* Run b2g reftests on linux32 VMs

Any preference?
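For the first option, the change itself is a one-liner; a minimal sketch of what the bumped setting would look like in a master's buildbot.tac (applied before the application is created; if I remember right the Twisted default is 1024):

import twisted.spread.pb
# Raise the cap on per-broker object references; the default is what we hit
# once all the linux64 test builders are attached.
twisted.spread.pb.MAX_BROKER_REFS = 4096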
Attachment #8393612 - Flags: review?(aki)
Attachment #8393613 - Flags: review?(aki)
Attachment #8393614 - Flags: review?(aki)
Attachment #8393612 - Flags: review?(aki) → review+
Comment on attachment 8393613 [details] [diff] [review]
raise_limit.buildbot.diff

I think you'll need to touch slavealloc's buildbot.tac template as well.
Attachment #8393613 - Flags: review?(aki) → review+
Attachment #8393614 - Flags: review?(aki) → review+
I assume that I will have to land first on puppet, then on slavealloc, and then on buildbot-configs.
At that point, should I start rebooting the masters?
Or would it be better to use the manhole?
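For reference, the manhole route is just the same assignment typed at each running master's Python prompt, e.g. (assuming the manhole is enabled and reachable):

>>> import twisted.spread.pb
>>> twisted.spread.pb.MAX_BROKER_REFS = 4096
>>> twisted.spread.pb.MAX_BROKER_REFS
4096

Either way, the buildbot.tac change is still needed so that any master that gets restarted comes back up with the higher limit.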
Attachment #8393623 - Flags: review?(aki)
Attachment #8393623 - Flags: review?(aki) → review+
Summary: Enabling EC2 B2g reftests across the systems causes us to hit the maximum number of builders for tst-linux64 machines → Bump MAX_BROKER_REFS to 4096
Depends on: 985582
Should I back out the buildbot-configs patch, in case the builder limit isn't increased before tomorrow?
FYI, I can see slavealloc giving the right value in buildbot.tac:
> twisted.spread.pb.MAX_BROKER_REFS = 4096
I will be batching the masters like this:
https://etherpad.mozilla.org/X20KMNQsXP

Steps:
1) disable the masters on slavealloc
2) use manage_masters.py to graceful_stop, update_buildbot and start (3 actions; rough sketch below)
3) enable the masters on slavealloc
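Something like this could drive step 2 for each batch (a rough sketch only: the batch contents, the production-masters.json path and the manage_masters.py flags are placeholders, and steps 1 and 3 stay manual in slavealloc):

import subprocess

# Hypothetical batches; the real grouping is in the etherpad above.
BATCHES = [
    ["bm51", "bm53"],
    ["bm52", "bm54"],
]

for batch in BATCHES:
    # step 1: disable this batch in slavealloc (manual)
    for action in ("graceful_stop", "update_buildbot", "start"):
        subprocess.check_call(
            ["python", "manage_masters.py",
             "-f", "production-masters.json",  # assumed flag/filename
             "-M", ",".join(batch),            # assumed flag for picking masters
             action])
    # step 3: re-enable this batch in slavealloc (manual)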
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #11)
> I will be batching the masters like this:
> https://etherpad.mozilla.org/X20KMNQsXP
> 
> Steps:
> 1) disable the masters on slavealloc
> 2) use manage_masters.py to graceful_stop, update_buildbot and start (3 actions)
> 3) enable the masters on slavealloc

Two comments:
1) You should leave out bm01-06 -- they're not in production yet. I'll make sure they come up with the right stuff.
2) If you disable bm51 and 52 at the same time, bm67 will end up as the only master for that pool (use1 linux tests). I recommend against doing this - the master will probably grind to a halt or die.
I updated the etherpad taking colocation into consideration.
If we had a way to disable/enable a master through slavealloc we could totally script this.
Do we know if this works via manhole? If so, then we can use fabric to make this change via manhole to all the masters.
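A sketch of what that could look like; this one telnets straight into each master's manhole from one host rather than going through fabric, and the hostnames, port, credentials and prompt strings are all placeholders rather than our real setup:

import telnetlib

MASTERS = ["bm51.example.com", "bm52.example.com"]  # placeholder hostnames
MANHOLE_PORT = 9999                                 # placeholder
USER, PASSWORD = "user", "secret"                   # placeholders

STATEMENTS = [
    b"import twisted.spread.pb\n",
    b"twisted.spread.pb.MAX_BROKER_REFS = 4096\n",
]

for host in MASTERS:
    tn = telnetlib.Telnet(host, MANHOLE_PORT, timeout=30)
    tn.read_until(b"username: ")   # prompt strings are assumptions
    tn.write(USER.encode() + b"\n")
    tn.read_until(b"password: ")
    tn.write(PASSWORD.encode() + b"\n")
    for stmt in STATEMENTS:
        tn.read_until(b">>> ")
        tn.write(stmt)
    tn.read_until(b">>> ")
    tn.close()

The important part is that the assignment has to happen inside the running master process (via the manhole or equivalent); an ssh shell on the box can't change the value that's already in memory.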
Attachment #8393612 - Flags: checked-in+
Comment on attachment 8393613 [details] [diff] [review]
raise_limit.buildbot.diff

Landed but not deployed.
Attachment #8393613 - Flags: checked-in+
Comment on attachment 8393614 [details] [diff] [review]
raise_limit.bc.diff

Backed out until the masters are ready.
Attachment #8393614 - Flags: checked-in-
Attachment #8393623 - Flags: checked-in+
catlee has updated the test masters without any issues so far; he used fabric to accomplish it.
I don't see anything out of the ordinary:
http://builddata.pub.build.mozilla.org/reports/pending/pending.html
catlee deployed the rest.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Attachment #8393614 - Flags: checked-in- → checked-in+
Live in production.
(In reply to Armen Zambrano [:armenzg] (EDT/UTC-4) from comment #7)
> http://hg.mozilla.org/build/buildbot/rev/7ce79514a42d

The buildbot commit only landed on default & hasn't been merged to the production branch. Does the production branch do anything in this repo? (I'm just trying to figure out what I need to do in bug 961075).
Flags: needinfo?(armenzg)
Interesting situation.
It should have landed on production, and the masters use production; however, we modified the masters live via the manhole, which is why everything still worked as intended.
I will land it in the right place.
Flags: needinfo?(armenzg)
(In reply to Armen Zambrano [:armenzg] (EDT/UTC-4) from comment #22)
> Interesting situation.
> It should have landed on production, and the masters use production;
> however, we modified the masters live via the manhole, which is why
> everything still worked as intended.
> I will land it in the right place.

Please make sure you run "update_buildbot" on the masters if you're landing to the production branch. Otherwise the code on disk won't match what's running in memory.
(In reply to Armen Zambrano [:armenzg] (EDT/UTC-4) from comment #22)
> Interesting situation.
> It should have landed on production, and the masters use production;
> however, we modified the masters live via the manhole, which is why
> everything still worked as intended.

Ah! :-)
I went into a meeting and did not have a chance to do it.
I've updated all masters with update_buildbot.
Component: General Automation → General