Get e10s tests running on inbound, central, and try

RESOLVED FIXED

Status

Release Engineering
General Automation
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: billm, Assigned: armenzg)

Tracking

(Blocks: 2 bugs)

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(5 attachments, 1 obsolete attachment)

(Reporter)

Description

3 years ago
The purpose of this bug is to get e10s tests running on all the main branches. Right now they're running on holly. Only Linux mochitest-plain opt tests are green right now, so I'd like to start with those. However, I think it makes sense to design things so that it will be easy to enable other platforms when they're ready. Mac and Windows mochitest-plain opt runs should be ready soon, and other stuff will follow.

Right now the main focus is on all the mochitests and reftests. I don't think XPC shell or C++ tests care whether we're using e10s, and I don't know anything about marionette or jetpack, so I'm ignoring those tests.

Initially I think we'll want to start with the tests not showing up unless &noignore=1. That way we can catch intermittent failures before enabling the tests for real.

We'll be using the same build as we do for non-e10s testing. The only change needed when running the tests is that we need to pass --e10s to the test runner (runtests.py or runreftests.py). From talking to Ted and Clint, it sounds like we'll need to make changes to buildbot, mozharness, and tbpl.

I was thinking that the tests could appear as a separate line on tbpl, similar to the Linux ASAN tests. However, Clint and Ted think it would be better to put them on the same line as the existing non-e10s tests because they're not a separate build. Ed, you're probably the best person to make this decision.

Armen, do you think you might have time to work on this in the next month or so? I could try to do it myself, but I think it would take me a lot longer than somebody who knows the code and how to test it.
Flags: needinfo?(emorley)
Flags: needinfo?(armenzg)
(Reporter)

Updated

3 years ago
Blocks: 989535
(Assignee)

Comment 1

3 years ago
Created attachment 8399485 [details] [diff] [review]
e10s.diff

edmorley, how do we go about naming?
mochitest-e10s-# ?

With regards to time, I'm rather backlogged right now. However, I think this is all that is required for now.

The Holly tree is running these jobs with --e10s, however, I can't figure how.
I know it is coming from treeconfig.json, however, I can't find where that is on the tree (I used mxr).
billm, how did this get setup?
Attachment #8399485 - Flags: review?(aki)
Flags: needinfo?(armenzg)

Comment 2

3 years ago
(In reply to Bill McCloskey (:billm) from comment #0)
> Initially I think we'll want to start with the tests not showing up unless
> &noignore=1. That way we can catch intermittent failures before enabling the
> tests for real.

Sgtm :-)

(&noignore=1 is now &showall=1 - but the old syntax will still work)

> I was thinking that the tests could appear as a separate line on tbpl,
> similar to the Linux ASAN tests. However, Clint and Ted think it would be
> better to put them on the same line as the existing non-e10s tests because
> they're not a separate build. Ed, you're probably the best person to make
> this decision.

We can implement either - separate line would be slightly easier from a implementation and visual POV (same line would mean doubling up the mochitest/reftest job types for e10s and deciding on the best way to depict/group them next to the standard tests), but I'm on the fence. CCing :sheriffs for their thoughts.

I'm presuming the end goal here is to eventually switch on e10s by default and then depreciate the not-e10s test jobs?
Flags: needinfo?(emorley)

Comment 3

3 years ago
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #1)
> edmorley, how do we go about naming?
> mochitest-e10s-# ?

Yeah that sounds reasonable :-)

I guess we could always display them in TBPL as:

M(1 2 3 4 5 bc oth)  R(R C J)  e10s(1 2 3 4 5 bc oth R C J)

Comment 4

3 years ago
Note once the tests are ready to be unhidden, they'll also need to be run on the other integration repositories (eg fx-team, b2g-inbound), in addition to the ones currently in the bug summary.
(Reporter)

Comment 5

3 years ago
Thanks Armen! I guess we still need a patch for tbpl? I also realized that we'll probably want to add some trychooser syntax to get e10s mochitests.

(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #1)
> The Holly tree is running these jobs with --e10s, however, I can't figure
> how.
> I know it is coming from treeconfig.json, however, I can't find where that
> is on the tree (I used mxr).
> billm, how did this get setup?

It's checked in here:
https://hg.mozilla.org/projects/holly/file/b525fd1e4fbf/testing/config/mozharness/linux_config.py#l16

> I'm presuming the end goal here is to eventually switch on e10s by default and then
> depreciate the not-e10s test jobs?

There may be a few tests that we'll want to run in single-process mode indefinitely, but the eventual goal is to test almost everything in e10s and disable the single-process versions of those tests.

Comment 6

3 years ago
(In reply to Bill McCloskey (:billm) from comment #5)
> Thanks Armen! I guess we still need a patch for tbpl? 

Once the job name is fixed & the decision is made for same line vs separate, I'm happy to to the TBPL patch & can have it deployed pretty quickly.

Comment 7

3 years ago
Comment on attachment 8399485 [details] [diff] [review]
e10s.diff

I think you need the per-platform stuff too.

Updated

3 years ago
Attachment #8399485 - Flags: review?(aki) → review-
(Assignee)

Comment 8

3 years ago
Eventually, when we're in good shape, I would rolling these changes like this:
1) enable across all gecko trunk trees running side-by-side the new and old configs
2) when we're happy, we land a change to testing/config/mozharness/linux_config.py to make --e10s the default
** this means that we will see the change cripple through from m-i, to m-c and remaining trunk branches
** this means that both normal builders and e10s builders will be running the same configuration as the change flows
*** the e10s jobs would have --e10s twice and I assume that would cause a syntax error
** we could at this point start hiding the e10s jobs since the default mochitest jobs would run with e10s
3) we could then backout any buildbot-config patches that we have landed in here
** the m-c config.py change will eventually make it into m-a, m-b et al without assistance
** we would have to leave the tbpl changes in place until all platforms switch to e10s by default

I know that this approach has never been done before since we've never had in-tree configs.

Does this make sense to all of you?
(Assignee)

Comment 9

3 years ago
Created attachment 8399507 [details] [diff] [review]
e10s.diff
Attachment #8399485 - Attachment is obsolete: true
Attachment #8399507 - Flags: review?(aki)

Comment 10

3 years ago
Comment on attachment 8399507 [details] [diff] [review]
e10s.diff

r=me with the per-platform block added to ubuntu64_vm as well.
Attachment #8399507 - Flags: review?(aki) → review+
(In reply to Ed Morley [:edmorley UTC+0] from comment #3)
> (In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4)
> from comment #1)
> > edmorley, how do we go about naming?
> > mochitest-e10s-# ?
> 
> Yeah that sounds reasonable :-)
> 
> I guess we could always display them in TBPL as:
> 
> M(1 2 3 4 5 bc oth)  R(R C J)  e10s(1 2 3 4 5 bc oth R C J)

<jobname>-e10s-# makes sense as a naming convention. I'm thinking for grouping, we could have M-e10s(1 2 3 4 5 bc oth), R-e10s(R C J), etc as complementary groups to what we already have. I'm not thrilled about grouping all the different jobs into one big e10s group.
Out of curiosity, how would the reftests/crashtests here differ from the IPC jobs we currently run on TBPL?
(Assignee)

Comment 13

3 years ago
Comment on attachment 8399507 [details] [diff] [review]
e10s.diff

This will be live once a reconfig happens this week.
I could speed things up if needed, however, I prefer let buildduty takes care of it.
Attachment #8399507 - Flags: checked-in+
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #11)
> <jobname>-e10s-# makes sense as a naming convention. I'm thinking for
> grouping, we could have M-e10s(1 2 3 4 5 bc oth), R-e10s(R C J), etc as
> complementary groups to what we already have. I'm not thrilled about
> grouping all the different jobs into one big e10s group.

sgtm :-)
Ryan's kindly offered to do the TBPL patch.
(Reporter)

Comment 16

3 years ago
Thanks again. No problem waiting for a reconfig since the tbpl change will take a little while anyhow.

(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #12)
> Out of curiosity, how would the reftests/crashtests here differ from the IPC
> jobs we currently run on TBPL?

The only difference between "reftests-ipc" and "reftests --e10s" is that reftests-ipc runs a much reduced set of reftests (just the sanity tests I think) while reftests --e10s runs everything. In order to get "reftests --e10s" into production, we'll have to green up the tests that aren't run in reftests-ipc. Once that happens and reftests --e10s is running in production, we can eliminate reftests-ipc since it will be redundant.
Depends on: 990639
Hidden the e10s jobs that have run so far. @sheriffs: we'll need to hide the rest as they complete.
(Also on Try)
this was backed out: http://hg.mozilla.org/build/buildbot-configs/rev/06938db75c06
(Reporter)

Comment 20

3 years ago
Looks like we're missing a patch to mozharness. desktop_unittest.py needs to pass the --e10s option down to runtests.py.

https://tbpl.mozilla.org/php/getParsedLog.php?id=37079704&tree=Mozilla-Inbound
backout in comment 19 is in production.

Comment 22

3 years ago
(In reply to Bill McCloskey (:billm) from comment #20)
> Looks like we're missing a patch to mozharness. desktop_unittest.py needs to
> pass the --e10s option down to runtests.py.
> 
> https://tbpl.mozilla.org/php/getParsedLog.php?id=37079704&tree=Mozilla-
> Inbound

Yeah.  I was thinking the in-tree configs would be sufficient, but that would break all non-plain mochitests.
(Reporter)

Comment 23

3 years ago
Created attachment 8400419 [details] [diff] [review]
mozharness-e10s

This just adds a --e10s option to desktop_unittest.py that gets passed onto runtests.py and runreftests.py.
Attachment #8400419 - Flags: review?(aki)

Comment 24

3 years ago
Comment on attachment 8400419 [details] [diff] [review]
mozharness-e10s

I think this is a good solution while we're in the transition phase.  Let's back this patch out once all e10s tests are green and we can control --e10s from the in-tree configs.
Attachment #8400419 - Flags: review?(aki) → review+
(Reporter)

Comment 25

3 years ago
I pushed both of these changes:
https://hg.mozilla.org/build/buildbot-configs/rev/c7085883d40b
https://hg.mozilla.org/build/mozharness/rev/1c5efceb56bf
(Assignee)

Comment 26

3 years ago
My apologies. I don't know what I was thinking.

We're hoping for a reconfig today.
https://hg.mozilla.org/build/buildbot-configs/rev/c7085883d40b -> live in production
https://hg.mozilla.org/build/mozharness/rev/1c5efceb56bf -> live in production

:)
(Assignee)

Updated

3 years ago
Assignee: nobody → armenzg
(Reporter)

Comment 28

3 years ago
There seems to be a problem where buildbot isn't passing --cfg unittests/linux_unittest.py for PGO builds.
https://tbpl.mozilla.org/php/getParsedLog.php?id=37227886&tree=Mozilla-Central
(Reporter)

Comment 29

3 years ago
Please don't back out though :-). This is working for non-PGO.
(Reporter)

Comment 30

3 years ago
Oops. Based on inbound, I guess it's Linux64 that's broken, not PGO.
(Reporter)

Comment 31

3 years ago
Created attachment 8401500 [details] [diff] [review]
fix-linux64

I think this should fix the problem. I added configs for all platforms since I want to get those running eventually anyway.

I hate to complain, but all the duplication here seems pretty extreme.
Attachment #8401500 - Flags: review?(aki)

Comment 32

3 years ago
Comment on attachment 8401500 [details] [diff] [review]
fix-linux64

It is pretty extreme.

The other extreme is making it programmatic or set once per type, which is awesome until you have to override something.  One or two overrides and it's not that bad.  Dozens upon dozens and it gets pretty ugly, especially if multiple of those overrides are also programmatic and override too much, so you need more specific overrides later to get that right.  I don't know how to fix it so it's easy to write, easy to read, and easy to customize atm, but it's a problem we keep hitting and trying to solve in different ways.

Thanks for the patch, and welcome to the discussion on how to fix our configs :)
Attachment #8401500 - Flags: review?(aki) → review+
(Reporter)

Comment 33

3 years ago
https://hg.mozilla.org/build/buildbot-configs/rev/8ddadc1da1ed

These tests are hidden right now, but it would still be nice to get this into production in the next day or so if possible.
(Reporter)

Comment 34

3 years ago
Somebody told me today that we also have to enable these on fx-team and b2g-inbound in order to unhide the tests [1]. What's the branch name for b2g-inbound? I couldn't find any instances in config.py.

[1] https://wiki.mozilla.org/Sheriffing/Job_Visibility_Policy

Comment 35

3 years ago
b2g-inbound : http://hg.mozilla.org/build/buildbot-configs/file/575ccb9b4185/mozilla/project_branches.py#l43
(Reporter)

Comment 36

3 years ago
Created attachment 8401602 [details] [diff] [review]
other-branches

Duh. Thanks.
Attachment #8401602 - Flags: review?(aki)
(Reporter)

Comment 37

3 years ago
Actually, reading more carefully, it looks like other trees may be needed. I'll check with the sheriffs.
(Reporter)

Comment 38

3 years ago
Well, let's just stick with these and see if anyone complains. I really don't see the need to run this stuff anywhere else.

Updated

3 years ago
Attachment #8401602 - Flags: review?(aki) → review+
(Reporter)

Comment 39

3 years ago
https://hg.mozilla.org/build/buildbot-configs/rev/637e30ede8a6
(Assignee)

Comment 40

3 years ago
I only want to see this running on one place and hidden until we are ready to show it visibly (by enabling it on every trunk based trees) and run it side-by-side until we switch over.
(In reply to Bill McCloskey (:billm) from comment #38)
> Well, let's just stick with these and see if anyone complains. I really
> don't see the need to run this stuff anywhere else.

You basically need it running on all trunk trees, which includes things like project branches. I would think we could take care of that by just running it on Gecko >= 31 (which then lets it ride the trains too).
(Assignee)

Comment 42

3 years ago
Live in production.
Hidden the jobs that have just started running on fx-team & b2g-inbound for now.
(Reporter)

Comment 44

3 years ago
This is working on all the branches where it's enabled. It looks like the only orange is pre-existing intermittent stuff or else is caused by patches that were later backed out. I'd like to get it unhidden.
Flags: needinfo?(armenzg)
(Assignee)

Comment 45

3 years ago
Now that we see it runnning green across the integration branches we will be enabling it for *all* Gecko 31 project branches (e.g. Cedar, Ash et all) and make sure to pin it to only trunk trees until the day that we're ready to make the switch.

https://tbpl.mozilla.org/?jobname=e10s&showall=1
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=e10s&showall=1
https://tbpl.mozilla.org/?tree=Fx-Team&jobname=e10s&showall=1
https://tbpl.mozilla.org/?tree=B2g-Inbound&jobname=e10s&showall=1
Flags: needinfo?(armenzg)
(Assignee)

Comment 46

3 years ago
Created attachment 8403337 [details] [diff] [review]
enable e10s jobs on all Gecko31 branches

I've decided that it is OK to ride the trains.
It is more complicated to find a way to pin it to *only* trunk until we decide that we're ready to switch and it prevents later on having to do uplift of patches once we have to support a new esr branch.

The load to have this enable on m-a, m-b and m-r is very very little.

billm: works for you?
Attachment #8403337 - Flags: review?
(Reporter)

Comment 47

3 years ago
I guess this is fine. Did you mean to ask someone for review?
(Assignee)

Comment 48

3 years ago
Comment on attachment 8403337 [details] [diff] [review]
enable e10s jobs on all Gecko31 branches

Yes, I did :)
Attachment #8403337 - Flags: review? → review?(aki)

Updated

3 years ago
Attachment #8403337 - Flags: review?(aki) → review+
landed in production http://hg.mozilla.org/build/buildbot-configs/rev/ad00661eae71 :)
(Assignee)

Comment 50

3 years ago
I unhid e10s across Fx-Team, B2g-Inbound, Try, m-c & m-i.

billm: I think we're done in here, right?
Flags: needinfo?(wmccloskey)
(Reporter)

Comment 51

3 years ago
Thanks Armen! I agree, this is done.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Flags: needinfo?(wmccloskey)
Resolution: --- → FIXED
Blocks: 1010285

Updated

3 years ago
Depends on: 1057301
You need to log in before you can comment on or make changes to this bug.