1318187 - Ensure that we actually test Stylo's parallel traversal on CI: enable parallel traversal in e10s tests, sequential traversal in non-e10s tests

Bobby Holley (:bholley)

Reporter

Description

8 years ago

I was just looking at a crash dump from linux64 try, and noticed that it was using sequential traversal. This is presumably because the VM configuration causes our heuristics to decide that there's no point in parallel traversal. We should investigate what's going on, and see whether we can either improve our VM configuration or artificially force parallel traversal. Not really urgent, but something to do before shipping.
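The behavior described above, where a core-count heuristic quietly picks sequential traversal on a single-core VM unless something forces parallelism, can be modeled roughly as follows. This is a hypothetical Python sketch, not Stylo's actual Rust code: `resolve_style_threads`, `traversal_mode`, and the one-thread-per-CPU default are assumptions; only the `STYLO_THREADS` variable name comes from this bug.

```python
import os
from typing import Mapping, Optional


def resolve_style_threads(environ: Mapping[str, str], cpu_count: Optional[int]) -> int:
    """Pick a style-worker thread count (hypothetical model, not Stylo's code)."""
    override = environ.get("STYLO_THREADS")
    if override is not None:
        # An explicit override wins over any hardware heuristic.
        return max(int(override), 1)
    # Assumed heuristic: one worker per CPU; os.cpu_count() may return None.
    return max(cpu_count or 1, 1)


def traversal_mode(num_threads: int) -> str:
    # With a single thread there is no worker pool, so traversal is sequential.
    return "parallel" if num_threads > 1 else "sequential"


if __name__ == "__main__":
    threads = resolve_style_threads(os.environ, os.cpu_count())
    print(threads, traversal_mode(threads))
```

On a single-core VM with no override, `traversal_mode(resolve_style_threads({}, 1))` is `"sequential"`; with `{"STYLO_THREADS": "4"}` it is `"parallel"` regardless of core count, which is why an environment override is enough to force the parallel path on CI.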

Bobby Holley (:bholley)

Reporter

Updated

8 years ago

Blocks: stylo-tooling
No longer blocks: stylo

Summary: stylo: Ensure that we actually test parallel traversal on CI → Ensure that we actually test the parallel stylo traversal on CI

Whiteboard: [Stylo]

Xidorn Quan [:xidorn] UTC+10

Comment 1

8 years ago

I guess we may want to test both parallel and sequential traversal. Should we have an additional set of tasks to run tests in the other way?

Bobby Holley (:bholley)

Reporter

Comment 2

8 years ago

Bug 1347399 indicates that this is getting to be a more urgent problem.

(In reply to Xidorn Quan [:xidorn] (UTC+10) from comment #1)
> I guess we may want to test both parallel and sequential traversal. Should
> we have an additional set of tasks to run tests in the other way?

It would be nice to do that in a way that doesn't increase our automation load. A reasonable compromise might be to run the parallel traversal for the e10s tests, and the sequential traversal for the non-e10s tests. This basically means passing different environment variables to the two test configurations: STYLO_THREADS=4 for e10s and STYLO_THREADS=1 for non-e10s.

jgriffin, can you find someone to help us out with this?
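Passing a different `STYLO_THREADS` value to each test configuration amounts to building a per-configuration environment for the process under test. A minimal Python sketch, assuming the harness launches the browser as a child process; `config_env` and `child_sees` are hypothetical names, and a tiny Python child stands in for the browser.

```python
import os
import subprocess
import sys
from typing import Dict


def config_env(e10s: bool) -> Dict[str, str]:
    """Copy the current environment and pin the Stylo traversal mode."""
    env = dict(os.environ)
    # Parallel traversal for e10s runs, sequential for non-e10s runs.
    env["STYLO_THREADS"] = "4" if e10s else "1"
    return env


def child_sees(e10s: bool) -> str:
    """Launch a stand-in child process and report the value it observes."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['STYLO_THREADS'])"],
        env=config_env(e10s),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print("e10s:", child_sees(True))       # e10s: 4
    print("non-e10s:", child_sees(False))  # non-e10s: 1
```

Copying `os.environ` rather than starting from an empty dict keeps `PATH` and similar variables that the child process may need.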

Flags: needinfo?(jgriffin)

Bobby Holley (:bholley)

Reporter

Comment 3

8 years ago

Redirecting to ted since jgriffin is out.

Flags: needinfo?(jgriffin) → needinfo?(ted)

Bobby Holley (:bholley)

Reporter

Comment 4

8 years ago

Note: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8e19102a1ba20356ad368935ae93619193fb3137 is a green try run with a patch to force parallel traversal (and another to fix bug 1347399). We should flip this on now to be sure it stays green.

Bobby Holley (:bholley)

Reporter

Comment 5

8 years ago

Ted Mielczarek [:ted.mielczarek]

Comment 6

8 years ago

So one thing that came up in IRC discussion yesterday is that the taskcluster workers are configured to use a single core. When jobs were being migrated from buildbot, that matched the configuration of the buildbot workers, and we were seeing lots of extra intermittent failures when running with more cores. If we want to revisit that decision for Stylo tests I think that'd be fine, but it might require some work to make things green.

Flags: needinfo?(ted)

Chris Peterson [:cpeterson]

Comment 7

8 years ago

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #6)
> If we want to revisit that decision for Stylo tests I think that'd be fine,
> but it might require some work to make things green.

Ted, did these intermittent test failures on multi-core taskcluster workers affect all sorts of tests? Fixing other teams' intermittent test failures does not sound like something the Stylo team wants to take on. :)

Can we limit the tests that run on multi-core workers just to the reftests relevant to style? We might get some testing value by enabling STYLO_THREADS=4 even on single-core workers until we can fix those multi-core test failures.

Flags: needinfo?(ted)

Summary: Ensure that we actually test the parallel stylo traversal on CI → Ensure that we actually test the parallel stylo traversal on CI: enable parallel traversal in e10s tests, sequential traversal in non-e10s tests

Chris Peterson [:cpeterson]

Updated

8 years ago

Priority: -- → P2

Summary: Ensure that we actually test the parallel stylo traversal on CI: enable parallel traversal in e10s tests, sequential traversal in non-e10s tests → stylo: Ensure that we actually test Stylo's parallel traversal on CI: enable parallel traversal in e10s tests, sequential traversal in non-e10s tests

Bobby Holley (:bholley)

Reporter

Comment 8

8 years ago

To be clear, I'm not asking for multicore testing (at least not yet). I just want to enable parallel traversal on the existing single-core testers. That will tell us when we're e.g. tripping an NS_IsMainThread() assertion on an FFI call.

So my ask in this bug is just to get help setting up an environment variable in the appropriate configurations. Should be straightforward for someone familiar with that stuff.
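Parallel traversal is useful even on single-core testers because worker threads are still not the main thread: a main-thread-only call made from the traversal fails no matter how many cores the machine has. A hypothetical Python analogue, where `main_thread_only_ffi_call` stands in for an FFI function guarded by an `NS_IsMainThread()` assertion and `traverse` stands in for the style traversal; neither is real Gecko code.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List


def main_thread_only_ffi_call(element: str) -> str:
    """Stand-in for an FFI function that asserts it runs on the main thread."""
    if threading.current_thread() is not threading.main_thread():
        raise AssertionError(f"off-main-thread call while styling {element!r}")
    return element.upper()


def traverse(elements: List[str], num_threads: int) -> List[str]:
    """Style every element, sequentially or on a pool of worker threads."""
    if num_threads <= 1:
        # Sequential traversal: everything runs on the calling (main) thread.
        return [main_thread_only_ffi_call(e) for e in elements]
    # Parallel traversal: work runs on pool threads even on a single-core machine.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # Iterating the results re-raises any exception from a worker.
        return list(pool.map(main_thread_only_ffi_call, elements))
```

`traverse(["div", "span"], 1)` returns `["DIV", "SPAN"]`, while `traverse(["div", "span"], 4)` raises `AssertionError`: the kind of bug that sequential-only testing never surfaces.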

Nathan Froyd [:froydnj]

Assignee

Comment 9

8 years ago

(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #8)
> To be clear, I'm not asking for multicore testing (at least not yet). I just
> want to enable parallel traversal on the existing single-core testers. That
> will tell us when we're e.g. tripping an NS_IsMainThread() assertion on an
> FFI call.
>
> So my ask in this bug is just to get help setting up an environment
> variable in the appropriate configurations. Should be straightforward for
> someone familiar with that stuff.

This is straightforward, and I can look into doing this. Reftests only, or mochitests too?

Do we care if flipping this makes the test jobs fall over, or fall over harder than they were already doing (i.e. assertions and whatnot)?

Flags: needinfo?(bobbyholley)

Bobby Holley (:bholley)

Reporter

Comment 10

8 years ago

(In reply to Nathan Froyd [:froydnj] from comment #9)
> (In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #8)
> > To be clear, I'm not asking for multicore testing (at least not yet). I just
> > want to enable parallel traversal on the existing single-core testers. That
> > will tell us when we're e.g. tripping an NS_IsMainThread() assertion on an
> > FFI call.
> >
> > So my ask in this bug is just to get help setting up an environment
> > variable in the appropriate configurations. Should be straightforward for
> > someone familiar with that stuff.
>
> This is straightforward, and I can look into doing this.

Thank you!

> Reftests only, or
> mochitests too?

Both, ideally.

>
> Do we care if flipping this makes the test jobs fall over, or fall over
> harder than they were already doing (i.e. assertions and whatnot)?

I tested a few weeks ago and we were green. However, I think bug 1351200 is currently going to make everything crash. I'll try to get that accelerated, but in the meantime let's get this patch ready to go so we can land it as soon as we're green.

Flags: needinfo?(ted)

Flags: needinfo?(bobbyholley)

Chris Peterson [:cpeterson]

Comment 11

8 years ago

Nathan said he would enable parallel traversal for Stylo's e10s tests. We still want sequential traversal for Stylo's non-e10s tests so we are testing both parallel and sequential code paths.

Assignee: nobody → nfroyd

Chris Peterson [:cpeterson]

Updated

8 years ago

Summary: stylo: Ensure that we actually test Stylo's parallel traversal on CI: enable parallel traversal in e10s tests, sequential traversal in non-e10s tests → Ensure that we actually test Stylo's parallel traversal on CI: enable parallel traversal in e10s tests, sequential traversal in non-e10s tests

Nathan Froyd [:froydnj]

Assignee

Comment 12

8 years ago

OK, so the test definitions for stylo's test jobs live here:

http://dxr.mozilla.org/mozilla-central/source/taskcluster/ci/test/tests.yml#913 (mochitest, e10s and non-e10s)
http://dxr.mozilla.org/mozilla-central/source/taskcluster/ci/test/tests.yml#935 (mochitest-chrome, non-e10s only (?))
http://dxr.mozilla.org/mozilla-central/source/taskcluster/ci/test/tests.yml#1050 (reftest, e10s only, bug 1343301, bug 1339604)

The comments for the reftest disabling suggest that we don't care about the non-e10s case because those tests are going away for 57 anyway...but we still care about the non-e10s case in the here and now for everything else?

For reftest, we can always pass in an option to set the STYLO_THREADS environment variable appropriately, assuming that we're not going to turn on non-e10s there.

For mochitest-chrome, the right thing already happens, apparently...or we adopt the same solution for mochitest-chrome, see below.

For mochitest...I see two options:

1) We duplicate the definition in the .yml file, one for non-e10s and one for e10s; the e10s one then gets whatever appropriate options we need.
2) We write a transform to check whether the test job is for stylo and has e10s enabled; if so, we add whatever we need to turn on the appropriate options.

The second solution is obviously a bit more elegant and general, but does depend on the transforms being run in a particular order (which may already be guaranteed).

Dustin, does this solution sound like the right way to approach things?
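Option 2 could look roughly like this. It is a self-contained sketch, not the real taskgraph transform API: jobs are modeled as plain dicts, and the `test-platform`, `e10s`, and `extra-options` keys, the example platform names, and both flag names are assumptions made for illustration.

```python
from typing import Any, Dict, Iterable, Iterator

Job = Dict[str, Any]

# Hypothetical harness flags; the real option names may differ.
PARALLEL_FLAG = "--parallel-stylo-traversal"
SEQUENTIAL_FLAG = "--single-stylo-traversal"


def set_stylo_traversal(jobs: Iterable[Job]) -> Iterator[Job]:
    """Add a traversal-mode flag to every Stylo job, based on its e10s setting."""
    for job in jobs:
        if "stylo" in job["test-platform"]:
            e10s = job["e10s"]
            if not isinstance(e10s, bool):
                # An unsplit value such as "both" means this ran too early.
                raise ValueError(f"e10s must be split before this transform, got {e10s!r}")
            flag = PARALLEL_FLAG if e10s else SEQUENTIAL_FLAG
            job.setdefault("extra-options", []).append(flag)
        yield job
```

Non-Stylo jobs pass through untouched, so the transform can sit in the shared test pipeline; the explicit type check turns an ordering mistake into a loud failure instead of a silently mislabeled job.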

Flags: needinfo?(dustin)

Dustin J. Mitchell [:dustin] (he/him)

Comment 13

8 years ago

I only understand about half of that, but what I've got is "e10s and non-e10s for these jobs are substantially different". In that case, I think your option (1) makes more sense and will give you the flexibility and readability you need.

Flags: needinfo?(dustin)

Nathan Froyd [:froydnj]

Assignee

Comment 14

8 years ago

(In reply to Dustin J. Mitchell [:dustin] from comment #13)
> I only understand about half of that, but what I've got is "e10s and
> non-e10s for these jobs are substantially different". In that case, I think
> your option (1) makes more sense and will give you the flexibility and
> readability you need.

OK. Does knowing that "substantially different" means "whether we pass a command-line option to the mochitest test runner" change your mind? I'm totally willing to cut-and-paste here; I just want to make sure we're on the same page before I do that.

Flags: needinfo?(dustin)

Dustin J. Mitchell [:dustin] (he/him)

Comment 15

8 years ago

Oh, no, that doesn't seem so bad. Thanks for the clarification. In that case, I think (2) is the right solution, and you'd just need to make sure your transform runs after "e10s: both" is split into e10s: true and e10s: false. Transforms within a single file run in order, and the order of files is given in taskcluster/ci/test/kind.yml.
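The ordering constraint can be illustrated with a toy pipeline. In this sketch, plain generator functions stand in for taskgraph transforms, and `split_e10s`, `stylo_threads`, and `run_in_order` are hypothetical names: running the Stylo step before the split silently treats an unsplit `"both"` job as non-e10s, and both resulting jobs inherit that choice.

```python
import copy
from typing import Any, Callable, Dict, Iterable, Iterator, List

Job = Dict[str, Any]
Transform = Callable[[Iterable[Job]], Iterator[Job]]


def split_e10s(jobs: Iterable[Job]) -> Iterator[Job]:
    """Turn each e10s: "both" job into an e10s: True job and an e10s: False job."""
    for job in jobs:
        if job.get("e10s") == "both":
            for flag in (True, False):
                clone = copy.deepcopy(job)
                clone["e10s"] = flag
                yield clone
        else:
            yield job


def stylo_threads(jobs: Iterable[Job]) -> Iterator[Job]:
    """Pick STYLO_THREADS from the job's e10s flag (4 only if exactly True)."""
    for job in jobs:
        threads = "4" if job.get("e10s") is True else "1"
        job.setdefault("env", {})["STYLO_THREADS"] = threads
        yield job


def run_in_order(transforms: List[Transform], jobs: Iterable[Job]) -> List[Job]:
    """Apply transforms one after another, in the order given."""
    for transform in transforms:
        jobs = transform(jobs)
    return list(jobs)
```

`run_in_order([split_e10s, stylo_threads], [{"e10s": "both"}])` yields jobs with `STYLO_THREADS` of `"4"` and `"1"`; swapping the two transforms yields `"1"` for both, so the parallel path would never be exercised.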

Flags: needinfo?(dustin)

Nathan Froyd [:froydnj]

Assignee

Updated

8 years ago

Depends on: 1351200

Nathan Froyd [:froydnj]

Assignee

Comment 16

8 years ago

Attached patch: turn on parallel Stylo traversal for e10s tests

We'd like to ensure that both parallel and serial traversal in Stylo are tested on automation. Since e10s is the future, we've chosen to force parallel traversal during e10s tests and serial traversal during non-e10s tests.

You can see it working sort-of-as-intended here:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e477c06323a0981f88649605ad270ea012d0f68e

The e10s tests are falling over due to bug 1351200, so we'll have to wait to land this until at least that bug is fixed.

Dustin for the taskcluster transform bit, Chris for the mozharness change.
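On the mozharness side, the change boils down to a command-line option that the harness turns into an environment variable for the browser. A minimal argparse sketch; the flag names and the `stylo_env` helper are hypothetical, and mozharness has its own option-handling layer, so this only approximates the shape of the change.

```python
import argparse
from typing import Dict, List, Mapping


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Stylo traversal options (hypothetical)")
    # The two modes are mutually exclusive; passing both is a usage error.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--parallel-stylo-traversal", action="store_true",
                      help="force parallel traversal (STYLO_THREADS=4)")
    mode.add_argument("--single-stylo-traversal", action="store_true",
                      help="force sequential traversal (STYLO_THREADS=1)")
    return parser


def stylo_env(argv: List[str], base_env: Mapping[str, str]) -> Dict[str, str]:
    """Return the environment the browser should run with for these options."""
    args = build_parser().parse_args(argv)
    env = dict(base_env)
    if args.parallel_stylo_traversal:
        env["STYLO_THREADS"] = "4"
    elif args.single_stylo_traversal:
        env["STYLO_THREADS"] = "1"
    # With neither flag, STYLO_THREADS stays unset and Stylo's own default applies.
    return env
```

Keeping the decision in the harness means every worker type picks it up without per-worker environment plumbing in taskcluster.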

Attachment #8855316 - Flags: review?(dustin)

Attachment #8855316 - Flags: review?(cmanchester)

Dustin J. Mitchell [:dustin] (he/him)

Updated

8 years ago

Attachment #8855316 - Flags: review?(dustin) → review+

Chris Manchester (limited bugmail, email directly)

Comment 17

8 years ago

Comment on attachment 8855316
turn on parallel Stylo traversal for e10s tests

Review of attachment 8855316:
-----------------------------------------------------------------

Can't we set environment variables directly from TaskCluster? It looks like we can from taskcluster/taskgraph/transforms/job/mozharness_test.py

Attachment #8855316 - Flags: review?(cmanchester)

Nathan Froyd [:froydnj]

Assignee

Comment 18

8 years ago

(In reply to Chris Manchester (:chmanchester) from comment #17)
> Can't we set environment variables directly from TaskCluster? It looks like
> we can from taskcluster/taskgraph/transforms/job/mozharness_test.py

We could do that, but we'd have to implement the environment variable in three different places for the (currently supported) three different worker types. Admittedly, we're only running on Linux atm, so we only need it for one worker type, but presumably we'll start testing on more platforms at some point, and it'd be nice to not have to remember to go back and fix the other workers at that point. WDYT?

Flags: needinfo?(cmanchester)

Chris Manchester (limited bugmail, email directly)

Comment 19

8 years ago

(In reply to Nathan Froyd [:froydnj] from comment #18)
> (In reply to Chris Manchester (:chmanchester) from comment #17)
> > Can't we set environment variables directly from TaskCluster? It looks like
> > we can from taskcluster/taskgraph/transforms/job/mozharness_test.py
>
> We could do that, but we'd have to implement the environment variable in
> three different places for the (currently supported) three different worker
> types. Admittedly, we're only running on Linux atm, so we only need it for
> one worker type, but presumably we'll start testing on more platforms at
> some point, and it'd be nice to not have to remember to go back and fix the
> other workers at that point. WDYT?

Someone may correct me on this, but reading the first patch in bug 1343327 makes me think we should be setting these as environment variables directly from TC. Given the sort of things I'm seeing in that file, it rather seems like this will end up being de-duplicated as a matter of bringing more platforms online. Let's check in with someone more involved with the migration.

Flags: needinfo?(cmanchester)

Nathan Froyd [:froydnj]

Assignee

Updated

8 years ago

Depends on: 1355097

Nathan Froyd [:froydnj]

Assignee

Comment 20

8 years ago

(In reply to Chris Manchester (:chmanchester) from comment #19)
> (In reply to Nathan Froyd [:froydnj] from comment #18)
> > (In reply to Chris Manchester (:chmanchester) from comment #17)
> > > Can't we set environment variables directly from TaskCluster? It looks like
> > > we can from taskcluster/taskgraph/transforms/job/mozharness_test.py
> >
> > We could do that, but we'd have to implement the environment variable in
> > three different places for the (currently supported) three different worker
> > types. Admittedly, we're only running on Linux atm, so we only need it for
> > one worker type, but presumably we'll start testing on more platforms at
> > some point, and it'd be nice to not have to remember to go back and fix the
> > other workers at that point. WDYT?
>
> Someone may correct me on this, but reading the first patch in bug 1343327
> makes me think we should be setting these as environment variables directly
> from TC. Given the sort of things I'm seeing in that file it rather seems
> like this will end up being de-duplicated as a matter of bringing more
> platforms on line. Let's check in with someone more involved with the
> migration.

Do you have someone in mind? I asked in #taskcluster:

9:37 AM <froydnj> is it preferred to set environment variables in taskcluster code directly, or to set them in mozharness or whatever framework is controlling the job?
9:39 AM <@dustin> froydnj: depends on the var, but generally in mozharness
9:39 AM <@dustin> froydnj: especially for gecko tests
9:40 AM <froydnj> dustin: ok, that's what I thought. thanks

which supports the command-line-option approach taken in this patch; it is also the approach taken by things like the --allow-software-gl-layers mozharness option.

My (limited) understanding is that Stylo is also going to be basically x86-64 Linux testing only until it gets turned on in Nightly, so there's not going to be a lot of test bringup on other platforms before then. It'd be *really* nice if things just magically worked when we flipped the mozconfig switch for Stylo on Windows and Mac, rather than having to remember to twiddle things here.

Flags: needinfo?(cmanchester)

Nathan Froyd [:froydnj]

Assignee

Updated

8 years ago

Depends on: 1354772

Chris Manchester (limited bugmail, email directly)

Comment 21

8 years ago

I guess that's fine then. Obviously something's going to need to be twiddled when we want this to work outside Linux; as is, this patch only checks against "linux64-stylo/".

Flags: needinfo?(cmanchester)

Chris Manchester (limited bugmail, email directly)

Updated

8 years ago

Attachment #8855316 - Flags: review+

Bobby Holley (:bholley)

Reporter

Comment 22

8 years ago

All the deps are fixed - is this green now?

Flags: needinfo?(nfroyd)

Nathan Froyd [:froydnj]

Assignee

Comment 23

8 years ago

(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #22)
> All the deps are fixed - is this green now?

Nope, filed bug 1355814, bug 1355813, and bug 1355811 for issues that have come up.

Depends on: 1355811, 1355813, 1355814

Flags: needinfo?(nfroyd)

Bobby Holley (:bholley)

Reporter

Comment 24

8 years ago

(In reply to Nathan Froyd [:froydnj] from comment #23)
> (In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #22)
> > All the deps are fixed - is this green now?
>
> Nope, filed bug 1355814, bug 1355813, and bug 1355811 for issues that have
> come up.

These are all from Manish's font stuff - he's going to dig into them today.

Bobby Holley (:bholley)

Reporter

Comment 25

8 years ago

Manish is going to disable the font metric stuff for now to unblock this.

Flags: needinfo?(manishearth)

Chris Peterson [:cpeterson]

Updated

8 years ago

Depends on: 1356122

Manish Goregaokar [:manishearth]

Comment 26

8 years ago

Disabled

Flags: needinfo?(manishearth)

Manish Goregaokar [:manishearth]

Comment 27

8 years ago

Try push off autoland (with font metrics disabled and parallel traversal enabled): https://treeherder.mozilla.org/#/jobs?repo=try&revision=a6ac812ffbdaf3851b3d9099d0028e5cba48cfb1

Chris Peterson [:cpeterson]

Comment 28

8 years ago

Nathan, fyi: the Stylo team is now asking for a new "linux64-stylo-sequential" platform (bug 1356122) that will run the Stylo tests in sequential mode on central and try. All non-e10s tests (not just Stylo's) are going away in 57 and we need to continue running both parallel and sequential Stylo modes. As long as we have the non-e10s tests running on autoland and inbound for 55-57, it would be nice to be able to run them in sequential mode. linux64-stylo-sequential would only run tests on central, while sequential non-e10s could catch regressions sooner on autoland or inbound.

Flags: needinfo?(nfroyd)

Chris Peterson [:cpeterson]

Updated

8 years ago

Flags: needinfo?(nfroyd)

Pulsebot

Comment 29

8 years ago

Pushed by bholley@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b43d7c9d5a5a turn on parallel Stylo traversal for e10s tests; r=dustin,chmanchester

Iris Hsiao [:ihsiao]

Comment 30

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/b43d7c9d5a5a

Status: NEW → RESOLVED

Closed: 8 years ago

status-firefox55: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla55