Increase mochitest-browser-chrome chunks to 7, opt and debug, on all platforms

RESOLVED FIXED

Status

Release Engineering
General Automation
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: jgriffin, Assigned: jgriffin)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 3 obsolete attachments)

(Assignee)

Comment 1

2 years ago
Created attachment 8655172 [details]
builders.txt

This adds 1300 builders by applying the new m-bc chunks (both regular and e10s) everywhere mozilla-aurora and above, sans twigs.

This is probably too many; we could save some by excluding pgo (should we?).

I wonder if we should apply this to only inbound, try, fx-team and b2g-inbound...that should theoretically be OK because we're using run-by-dir.  Any thoughts?
Attachment #8655172 - Flags: feedback?(jmaher)
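
To make the builder arithmetic concrete, here is a rough sketch of how the count grows when chunks are added. The branch, platform and flavor lists are made-up placeholders, not the real buildbot configuration; the only point is that each extra chunk multiplies across every branch/platform/build-type/flavor combination, so dropping pgo removes a whole slice.

```python
from itertools import product


def builders_added(branches, platforms, build_types, flavors, old_chunks, new_chunks):
    """Count the extra builders created by raising the chunk count.

    Each (branch, platform, build type, flavor) combination gains one
    builder per additional chunk.
    """
    combos = list(product(branches, platforms, build_types, flavors))
    return len(combos) * (new_chunks - old_chunks)


# Placeholder inputs, for illustration only.
branches = ["branch-a", "branch-b"]
platforms = ["linux", "win", "mac"]
flavors = ["regular", "e10s"]

with_pgo = builders_added(branches, platforms, ["opt", "debug", "pgo"], flavors, 3, 10)
without_pgo = builders_added(branches, platforms, ["opt", "debug"], flavors, 3, 10)

print(with_pgo, without_pgo)
```

With these placeholder inputs, excluding pgo cuts the added builders by a third.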
(Assignee)

Comment 2

2 years ago
Created attachment 8655176 [details] [diff] [review]
mbc.diff

Oh here's the actual patch.

while we are running with runbydir, it doesn't mean that our sheriffing tools present errors per directory, it is still per job.  This means that if m-bc-8 is failing on inbound, it might translate to m-bc-3 on m-c or pgo.  That is confusing.  If we could say, run test_blah.html on these 6 previous pushes, then it would be no problem.
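
A small sketch of why the chunk number for a given directory shifts between branches that use different chunk totals. This assumes a simple contiguous, near-equal split of directories by count, which is not necessarily how the harness assigns them; the directory names are invented.

```python
def chunk_of(directories, target, total_chunks):
    """Return the 1-based chunk holding `target` when `directories` is split
    into `total_chunks` contiguous, near-equal groups (assumed scheme)."""
    index = directories.index(target)
    return index * total_chunks // len(directories) + 1


# 21 hypothetical test directories.
dirs = [f"browser/dir{i:02d}" for i in range(21)]
target = "browser/dir20"

chunk_on_3 = chunk_of(dirs, target, 3)  # branch still running 3 chunks
chunk_on_7 = chunk_of(dirs, target, 7)  # branch running 7 chunks

print(chunk_on_3, chunk_on_7)
```

The same directory lands in a different job number on each branch, which is the source of the confusion for anyone matching failures by job name.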

on try we don't have pgo, that saves a chunk of jobs.
on b2g-inbound I don't see bc jobs on non-pgo, again, that saves a lot of jobs

Do we need 10 jobs, or could we go from 3 to 6 instead?  The number 10 was thrown out there for plain, which is already at 5 chunks.  But looking at the runtimes of the jobs on debug (50-80 minutes), I think splitting debug to 9 chunks would be good, maybe opt to 6.  Doing that gets into the confusion mentioned at the top of my comment.  I am thinking 7 chunks might be a good balance to get us by for the next year.  We should think about this for browser-chrome and devtools as well.

Regarding branches to touch, it should be on inbound/fx-team/central/try.  I get the impression we don't see many root cause failures on b2g-inbound, so maybe we can live with ignoring that branch.  Aurora/Beta- I am not sure it matters on these branches.  Aurora does have a lot of traffic, so I can see value in adding it there.

Jgriffin: What other branches would this affect if we didn't limit it to specific integration branches?

RyanVM: does b2g-inbound have a lot of failures to close the branch or backout as a result from failures on desktop mochitests (plain,bc,dt)?

RyanVM: would 7 chunks for bc jobs be good?
Flags: needinfo?(ryanvm)
(In reply to Joel Maher (:jmaher) from comment #3)
> while we are running with runbydir, it doesn't mean that our sheriffing
> tools present errors per directory, it is still per job.  This means that if
> m-bc-8 is failing on inbound, it might translate to m-bc-3 on m-c or pgo. 
> That is confusing.  If we could say, run test_blah.html on these 6 previous
> pushes, then it would be no problem.

This is already the reality with chunk-by-runtime. Lots of instances of the same test failing anywhere from bc1 to bc3 depending on circumstances. I wouldn't base decisions about how many chunks to run on this concern.

> on b2g-inbound I don't see bc jobs on non-pgo, again, that saves a lot of
> jobs

We run the same jobs on b2g-inbound as anywhere else. We just don't run a lot of non-PGO jobs there due to per-platform builds.
https://treeherder.mozilla.org/#/jobs?repo=b2g-inbound&revision=c9750b48ac73&filter-searchStr=chrome&group_state=expanded

The only real restriction on b2g-inbound is that we don't run tests on OSX 10.10 or WinXP.

> Regarding branches to touch, it should be on inbound/fx-team/central/try.  I
> get the impression we don't see many root cause failures on b2g-inbound, so
> maybe we can live with ignoring that branch.  Aurora/Beta- I am not sure it
> matters on these branches.  Aurora does have a lot of traffic, so I can see
> value in adding it there.
> 
> Jgriffin: What other branches would this affect if we didn't limit it to
> specific integration branches?

I'd prefer the path of least complication, personally. Makes the configs a lot easier to follow if we don't have multiple different mochitest-bc jobs in there. I was actually looking forward to having a saner lay of the land after this work is finished.

> RyanVM: does b2g-inbound have a lot of failures to close the branch or
> backout as a result from failures on desktop mochitests (plain,bc,dt)?

Exceedingly rare. But given that we have no way of preventing people from pushing random changes to b-i that could break them, I don't think we can/should special-case it either. Given the relative lack of desktop jobs running on the branch, adding more chunks isn't going to have much effect on overall load anyway.

> RyanVM: would 7 chunks for bc jobs be good?

I'll take as many as I can get.
Flags: needinfo?(ryanvm)
(Assignee)

Comment 5

2 years ago
It's not a matter of test load; it's a matter of not over-consuming precious builders, since using too many here would preclude us from rolling out similar changes to other mochitest suites.

With 7 chunks applied on all trunk branches (but not twigs or beta/release/esr) the number of builders added goes down to 761.

Removing b2g-inbound and mozilla-aurora from the list reduces this further to 521.

I think we target the latter so we can also increase mochitest-dt and mochitest-plain chunks.  Are you OK with this, Joel and Ryan?
Flags: needinfo?(ryanvm)
Flags: needinfo?(jmaher)
I like the 521 option, we could then do dt and mochitest-plain without troubles.  Eventually we can add these chunks to aurora/beta/etc.

This call should be for the sheriffs to make though.
Flags: needinfo?(jmaher)
Comment on attachment 8655172 [details]
builders.txt

Didn't realize I had a feedback request on a file here!  Overall, the list of branches, platforms and jobs is what we would expect.  We have discussed further reductions.  Right now anything is better than what we have.
Attachment #8655172 - Flags: feedback?(jmaher) → feedback+
I'm a bit hesitant about treating b2g-inbound differently, but I'd be otherwise OK with using fewer chunks on the release branches for now.
Flags: needinfo?(ryanvm)
(Assignee)

Comment 9

2 years ago
Ok, I'll include b2g-inbound; the total builder count will be around 630.
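
For reference, the per-branch cost implied by the counts quoted in this bug can be backed out with simple subtraction. The 630 figure is approximate (the final changed-builder list reports 629), so treat the results as rough.

```python
# Builder counts quoted in the comments above.
all_trunk = 761              # 7 chunks on all trunk branches
without_b2g_or_aurora = 521  # b2g-inbound and mozilla-aurora removed
with_b2g_only = 630          # "around 630" once b2g-inbound is added back

# Approximate share of added builders attributable to each branch.
b2g_inbound_share = with_b2g_only - without_b2g_or_aurora
aurora_share = all_trunk - with_b2g_only

print(b2g_inbound_share, aurora_share)
```

So keeping b2g-inbound costs roughly 109 builders, and leaving aurora at 3 chunks saves roughly 131.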
(Assignee)

Comment 10

2 years ago
Created attachment 8657318 [details] [diff] [review]
mbc.diff

This version chunks b2g-inbound to 7, but leaves aurora at 3.
Attachment #8655176 - Attachment is obsolete: true
Attachment #8657318 - Flags: review?(jlund)
(Assignee)

Comment 11

2 years ago
Created attachment 8657321 [details] [diff] [review]
mbc.diff

Fixed some comments and variable names.
Attachment #8657318 - Attachment is obsolete: true
Attachment #8657318 - Flags: review?(jlund)
Attachment #8657321 - Flags: review?(jlund)
(Assignee)

Comment 12

2 years ago
Created attachment 8657322 [details]
differences.txt

List of changed builders (629 total)
Attachment #8655172 - Attachment is obsolete: true
Comment on attachment 8657321 [details] [diff] [review]
mbc.diff

Review of attachment 8657321 [details] [diff] [review]:
-----------------------------------------------------------------

looks great!

::: mozilla-tests/config.py
@@ +2257,5 @@
> +            continue
> +        for slave_platform in PLATFORMS[platform]['slave_platforms']:
> +            if slave_platform not in BRANCHES[branch]['platforms'][platform]:
> +                continue
> +            if branch in TWIGS or ('gecko_version' in BRANCHES[branch] and BRANCHES[branch]['gecko_version'] != trunk_gecko_version):

do you want this to ride the trains instead of locking it to m-c?
Attachment #8657321 - Flags: review?(jlund) → review+
(Assignee)

Comment 14

2 years ago
No, to preserve builders we want to leave this only on trunk.
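
A minimal sketch of the trunk-only condition from the reviewed snippet, pulled out into a standalone function. The branch data, the twig name and the gecko version numbers are placeholders, and the "7 on trunk, 3 elsewhere" outcome is an assumption based on the comments in this bug, since the lines following the condition are not shown in the review.

```python
# Placeholder configuration, for illustration only.
TRUNK_GECKO_VERSION = 43
TWIGS = ["twig-a"]
BRANCHES = {
    "mozilla-central": {},
    "mozilla-inbound": {},
    "b2g-inbound": {},
    "mozilla-aurora": {"gecko_version": 42},
    "twig-a": {},
}


def is_off_trunk(branch):
    """Mirror the reviewed condition: twigs, and branches pinned to a
    gecko_version other than trunk's, are excluded from the new chunking."""
    config = BRANCHES[branch]
    return branch in TWIGS or (
        "gecko_version" in config
        and config["gecko_version"] != TRUNK_GECKO_VERSION
    )


def bc_chunks(branch, trunk_chunks=7, other_chunks=3):
    """Assumed outcome: trunk branches get 7 chunks, everything else stays at 3."""
    return other_chunks if is_off_trunk(branch) else trunk_chunks


chunks = {branch: bc_chunks(branch) for branch in BRANCHES}
print(chunks)
```

Because the check compares against the trunk gecko version rather than a minimum version, the change stays on trunk instead of riding the trains when trunk's version is bumped.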
(Assignee)

Comment 16

2 years ago
Adding sheriffs so there are no surprises when this gets reconfiged!
(Assignee)

Updated

2 years ago
Summary: Increase mochitest-browser-chrome chunks to 10, opt and debug, on all platforms → Increase mochitest-browser-chrome chunks to 7, opt and debug, on all platforms
(Assignee)

Updated

2 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED