Closed Bug 1595515 Opened 5 years ago Closed 5 years ago

Reduce costs of running web-platform-tests scheduling by not running tests which are guaranteed to fail

Categories

(Testing :: web-platform-tests, enhancement)

Severity: normal

Tracking

RESOLVED FIXED
Target Milestone: mozilla76
Tracking Status: firefox76 --- fixed

People

(Reporter: bc, Assigned: jgraham)

References

Details

(Whiteboard: [ci-costs-2020:done], [wptsync upstream])

Attachments

(3 files, 1 obsolete file)

web-platform-tests attempt to run every test in the manifests even if there is no chance that the test will pass. This results in about 20% of the time and money spent on running web platform tests being wasted to at least some degree.

Bug 1594796 Comment 3 mentions some drawbacks to simply turning off these tests. Coming up with a strategy that fits with the needs of the web platform tests and works for all branches is an open question.

"Guaranteed to fail" doesn't accurately characterise any suggested approach. Tests that are expected to fail may in fact pass (or error, timeout, or crash) due to code changes in Gecko, in the test, or in the harness. Tests for features we don't support may pass if they happen to be testing things that are true in the absence of any implementation (for example a test that we don't implement some historical variant of an API).

Thank you for correcting my misunderstanding. I am sure that whoever takes on this task will benefit.

I appreciate that was quite a pedantic correction, but I want to be sure that we have a common — and correct — understanding of the current situation to base decisions on.

This patch adds a --skip-status argument to web platform tests which
can be used to skip tests with the specified expected status. More
than one status can be specified by including additional --skip-status
arguments.

A corresponding --wpt-skip-status argument with the same usage is also
added to the mach try fuzzy command which can be used to set the
values for try runs.

A special value of NONE is used to clear an existing list of expected
statuses to be skipped. When a value of NONE is detected in the list
of skip statuses, all entries in the list up to and including the NONE
will be discarded leaving the subsequent values in the list.

NONE is intended to be used in try runs in order to reset the skip
list so that any skip statuses specified either in
taskcluster/ci/test/web-platform.yml or in
taskcluster/taskgraph/transforms/tests.py can be overridden by the
user.
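The NONE reset described above amounts to keeping only the values that follow the last NONE in the list. A minimal sketch of that logic (the function name apply_none_reset is mine, not from the patch):

```python
def apply_none_reset(skip_statuses):
    """Discard every entry up to and including the last NONE,
    keeping only the values that follow it."""
    result = []
    for status in skip_statuses:
        if status == "NONE":
            result = []  # NONE clears everything accumulated so far
        else:
            result.append(status)
    return result

# Statuses from the in-tree config followed by a user-supplied override:
print(apply_none_reset(["FAIL", "TIMEOUT", "NONE", "ERROR"]))  # ['ERROR']
```

This lets a try pusher append NONE (plus any replacement statuses) to whatever the in-tree configuration already set, without needing to edit the yaml or transform.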

The current version of this patch uses
taskcluster/taskgraph/transforms/tests.py to set

--skip-status=FAIL --skip-status=TIMEOUT --skip-status=ERROR

for the autoland and try repositories.

Assignee: nobody → bob

I think this approach where try is run by default with the FAIL, TIMEOUT and ERROR statuses skipped is not workable. If we did this as outlined in this patch then developers might submit patches which passed on try and autoland but then showed failures when run on mozilla-central. I think for this reason we should not use this approach.

Instead I think we should just set the skip status values only for autoland in the transform and allow mozilla-central and try to run the full set of tests. This would not result in the maximum possible savings but would save about 20% on autoland and would allow the tests to be run completely on mozilla-central, thereby skirting any problems with the wpt sync.

jgraham, ahal: I'm not looking for a full review yet but wanted to get your feedback to see if this approach is acceptable. I expect that the final version will not set the skip status values for try.

Flags: needinfo?(james)
Flags: needinfo?(ahal)

The plan of record at the moment is to start by running tests for features we aren't implementing less frequently (or not at all), instead of using the expected status to determine when to run things. There's an action on me to start gathering the data on which tests this applies to by talking to product and engineering.

Flags: needinfo?(james)

Looking at bug 1572820 comment 7, we've got some of the plumbing in place with this patch.

--skip-implementation-status=not-implementing,backlog corresponds to the --skip-status/--wpt-skip-status, though it uses a comma-delimited list, which does seem better than the repeated-argument form I used here.

If we are going to be implementing something similar for the other frameworks, then instead of having distinct wpt only skip argument for try, we could use the same argument name for try and for the runners for wpt and the other frameworks.

The other area to consider is how to specify these arguments for the different repositories. I couldn't figure out how to do this in taskcluster/ci/test/web-platform.yml and had to resort to using the transform in taskcluster/taskgraph/transforms/tests.py. Does anyone have any suggestions on how to do this using the yaml file instead? The existing extra-options made life difficult when trying to shoe-horn in additional extra-option processing based on by-project.

Also, if we do specify project/repo based extra-options for the mozharness command in the tree especially for try how would we allow the user to override them? Is the use of NONE as a signal to clear preceding values acceptable?

Yeah, we should definitely make sure that we run everything on try (i.e., keep the behaviour on try / central the same, whatever we end up doing). Enabling this on autoland-only seems like the right approach!

Flags: needinfo?(ahal)

Work in progress:

The following tests are supported:

  • marionette
  • mochitests
  • web-platform-tests
  • xpcshell

By default try does not have --skip-implementation-status set. Only autoland has the command line argument set in the test transform.

For demonstration purposes, I used not-implemented,backlog as possible values for implementation-statuses. The appropriate values will need to be determined.

This does not support the "-if" type tags. They appear to be best suited for the "built-in" filters and not the "Instance" filters I used. "built-in" seems to be geared towards fixed tag names and adding the conditional to the "Instance" filters would require plumbing changes to the expression parsing.

Feedback requested.

Flags: needinfo?(jmaher)
Flags: needinfo?(james)
Flags: needinfo?(ahal)

thanks for the details :bc

I would like to support -if syntax, primarily for cases where we have 'windows installer tests' and those will be not_supported on android, linux, osx. That would also support where we cannot realistically support this test running in ccov mode or some other configuration.

2 other things to consider:

  1. change to --skip-statuses (a small semantic change, but allows for more flexibility)
  2. ensure we can support a 'never run' and a 'periodic run' vs default. I think the plan here would do that as we could do:
  • 'never': --skip-statuses default, highly_intermittent, long_runtime
  • 'periodic': --skip-statuses default, not_supported, backlog
  • default: --skip-statuses highly_intermittent, long_runtime, not_supported, backlog
Flags: needinfo?(jmaher)

(more comments on the rest of this later)

--skip-statuses is confusing because it sounds more like result statuses. If you want a better name that isn't tied to implementation status, call it --skip-flags or something.

I've fixed a minor bug with the wpt support and confirmed that wpt's conditional manifest properties do work so we have that at least. I'll continue looking into the manifestparser filters with conditional properties.

(In reply to Bob Clary [:bc:] from comment #9)

For demonstration purposes, I used not-implemented,backlog as possible values for implementation-statuses. The appropriate values will need to be determined.

What would "backlog" mean in this case?

This does not support the "-if" type tags. They appear to be best suited for the "built-in" filters and not the "Instance" filters I used. "built-in" seems to be geared towards fixed tag names and adding the conditional to the "Instance" filters would require plumbing changes to the expression parsing.

Could you expand what you mean here? Why would you need to "support" the skip-if filter from what looks like a completely separate filter? The manifest filters are supposed to act like a chain that modifies the test objects as they go (they're kind of like early versions of the transforms system we have in taskgraph). Consumers (aka test harnesses) then add whichever filters they want prior to calling active_tests. So for example, the mochitest harness might decide to add both the skip_if filter as well as the skip_implementation_status filter (or maybe it just adds one or the other). Unless you are explicitly removing the skip_if filter, I don't understand why skip-if wouldn't be working.

Feedback requested.

What if we piggy-back off of the tagging system and implement --skip-tag? That way we can simply add a tag for whatever crazy purposes we can come up with (one of which being an implementation status). So e.g, in the manifest:

[test_foo.js]
tags = devtools not-implemented
[test_bar.js]
tags = long-runtime backlog

Then run with:

./mach mochitest --skip-tag not-implemented --skip-tag long-runtime
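The --skip-tag idea could be implemented as a manifestparser-style instance filter, roughly as sketched below. This is a simplification: the class name is hypothetical and the dict-based test objects stand in for manifestparser's real test objects.

```python
class skip_tags:
    """Hypothetical manifestparser-style filter: drop any test whose
    manifest `tags` field contains one of the given tags."""

    def __init__(self, *tags):
        self.tags = set(tags)

    def __call__(self, tests, values):
        for test in tests:
            # `tags` in a manifest is a space-delimited string
            test_tags = set(test.get("tags", "").split())
            if not (test_tags & self.tags):
                yield test

tests = [
    {"name": "test_foo.js", "tags": "devtools not-implemented"},
    {"name": "test_bar.js", "tags": "long-runtime backlog"},
    {"name": "test_baz.js", "tags": "devtools"},
]
keep = list(skip_tags("not-implemented", "long-runtime")(tests, {}))
print([t["name"] for t in keep])  # ['test_baz.js']
```

A harness would append an instance of this filter before calling active_tests, alongside the built-in skip_if/run_if/fail_if filters.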

p.s would appreciate if you could push to phabricator for feedback as the hg.m.o diffs aren't that great. It also gives us a place to leave comments on the code.

Flags: needinfo?(ahal)

Alternatively, we could modify the existing --tag attribute to pass its value through manifestparser's expression language. So you could do things like:

./mach mochitest --tag "!not-implemented && !long-runtime"

This is less user friendly, but potentially more powerful. Ftr, I think I prefer --skip-tag.

could we support tag-if? I am thinking of the windows-installer test cases that are currently skipped on !win, maybe they have a win_only tag

Then we would need logic in the taskgraph to pass in --skip-tag win_only on non-Windows tasks, which seems much harder than using skip-if = os != "win" :). I'd push back against tag-if unless there was a really good reason for it.

Edit: Come to think of it.. why would win_only be a conditional tag? It would need to be a constant tag anyway.

as mentioned in comment 10, there are a lot of tests that are skipped because of implementation details; that is confusing the data when trying to figure out what is skipped for other reasons (frequent/perma fail, etc.)

there is a good chance some wpt tests will be implemented on desktop but not yet on android, therefore we need some method to say this is not supported or not yet implemented on a specific platform or config.

Ah, that's a pretty good reason :). And I guess that explains what :bc meant by supporting the -if syntax. In that case, yes.. tag-if should be do-able. However, we may want to tag different things based on different conditions. E.g: not-implemented on Windows, but backlog on Android. So we can't use a single tag-if key as manifestparser doesn't allow duplicate keys (note: this same problem would apply to implementation-status-if).

So we'd need some kind of per tag expression, maybe something like:

[test_foo.js]
tags =
    devtools
    os == "android", long-running
    os == "win", not-implemented

It's definitely going to add a whole bunch of complexity though. For instance, tags currently use a space as a delimiter. We'd have to switch to something else (newlines in example above).

(Honestly, I'd recommend implementing -if in a follow-up.. whether or not we use tags or implementation-status.. it's going to be a whole can of worms)
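The newline-delimited conditional tags sketched above could be parsed along these lines. This is a toy sketch: the helper name is hypothetical and the condition evaluator deliberately handles only a single equality comparison, where a real implementation would reuse manifestparser's expression parser.

```python
def parse_conditional_tags(raw, env):
    """Parse a newline-delimited tags field where each line is either a
    bare tag or '<condition>, <tag>'. Toy evaluator: a condition is a
    single 'key == "value"' comparison against env."""
    tags = set()
    for line in raw.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if "," in line:
            condition, tag = (part.strip() for part in line.rsplit(",", 1))
            key, _, expected = condition.partition("==")
            if env.get(key.strip()) == expected.strip().strip('"'):
                tags.add(tag)
        else:
            tags.add(line)
    return tags

raw = '''
devtools
os == "android", long-running
os == "win", not-implemented
'''
print(sorted(parse_conditional_tags(raw, {"os": "android"})))
# ['devtools', 'long-running']
```

On Android only the unconditional tag and the android-conditioned tag apply; the win-conditioned one is dropped.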

(In reply to Andrew Halberstadt [:ahal] from comment #13)

What would "backlog" mean in this case?

I just used it as an example from jgraham's bug 1572820 comment 3 in order to exercise the code. We need to figure out what values we need, then determine the tests which need the annotations.

This does not support the "-if" type tags. They appear to be best suited for the "built-in" filters and not the "Instance" filters I used. "built-in" seems to be geared towards fixed tag names and adding the conditional to the "Instance" filters would require plumbing changes to the expression parsing.

Could you expand what you mean here? Why would you need to "support" the skip-if filter from what looks like a completely separate filter? The manifest filters are supposed to act like a chain that modifies the test objects as they go (they're kind of like early versions of the transforms system we have in taskgraph). Consumers (aka test harnesses) then add whichever filters they want prior to calling active_tests. So for example, the mochitest harness might decide to add both the skip_if filter as well as the skip_implementation_status filter (or maybe it just adds one or the other). Unless you are explicitly removing the skip_if filter, I don't understand why skip-if wouldn't be working.

I'm not talking about replacing the built-in filters skip_if, run_if or fail_if but am talking about something like implementation-statuses-if. It would be different from the current built-in filters which require the annotation to look like

<tag>-if = <boolean expression>

We need instead something that would allow

<tag>-if = <boolean expression> value

Which isn't supported by manifestparser's expression parser.

Note that the web-platform test's manifests and manifest parser already support conditional properties with multiple values. If we want to go down this route I would suggest we consider standardizing on web platform test's version.
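For comparison, wpt's metadata format already supports conditional, multi-valued properties. An expectation file entry looks roughly like this (a sketch of the existing syntax, not something added by this patch):

```ini
[test_foo.html]
  expected:
    if os == "android": TIMEOUT
    FAIL
```

Conditions are evaluated top to bottom and the first match wins, with the bare value acting as the default.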

What if we piggy-back off of the tagging system and implement --skip-tag? That way we can simply add a tag for whatever crazy purposes we can come up with (one of which being an implementation status). So e.g, in the manifest:

[test_foo.js]
tags = devtools not-implemented
[test_bar.js]
tags = long-runtime backlog

Then run with:

./mach mochitest --skip-tag not-implemented --skip-tag long-runtime

That might cover the use-cases. I'd have to think about it to see if it supported everything we wanted. A prerequisite would be to determine how we would want to annotate tests and how we would want to use the annotations to select tests.

p.s would appreciate if you could push to phabricator for feedback as the hg.m.o diffs aren't that great. It also gives us a place to leave comments on the code.

Will do. I'm doing another try run at the moment and just noticed the problem with the xpcshell tests on android. I was planning on doing a moz-phab submit --wip without a bug number once we were further along so we could share. I'll do that in a bit.

OK, so for wpt this looks like a reasonable approach. A couple of minor feedback points:

  • I'd rather the type conversion stuff go in the manifest rather than in wpttest.py. See the prefs implementation for comparison; this should be very similar. But if we use tags there might be no changes required.
  • I think we should explicitly mark these tests as skipped in the output, just like we do for tests that are disabled. So let's just add them to the list of disabled tests. Later we could consider having some extra information e.g. in the message property to indicate why they were disabled.
Flags: needinfo?(james)

(In reply to James Graham [:jgraham] from comment #22)

  • I'd rather the type conversion stuff go in the manifest rather than in wpttest.py. See the prefs implementation for comparison; this should be very similar. But if we use tags there might be no changes required.

I don't understand. Can you be more explicit in what you mean here?

Attachment #9113287 - Attachment is obsolete: true

See how tags work, the type conversion is in:
https://searchfox.org/mozilla-central/source/testing/web-platform/tests/tools/wptrunner/wptrunner/manifestexpected.py#56-64
https://searchfox.org/mozilla-central/source/testing/web-platform/tests/tools/wptrunner/wptrunner/manifestexpected.py#274-276 (and later in the same file for "reasons").
then for wpttest.py we just read the value and handle the (little-used) reset atom
https://searchfox.org/mozilla-central/source/testing/web-platform/tests/tools/wptrunner/wptrunner/wpttest.py#297-309

If we just use tags directly there are no changes required, of course, but if we take a different path, then that's the closest implementation to copy.

(In reply to James Graham [:jgraham] from comment #22)

Updated the phab requests with the changes.

OK, so for wpt this looks like a reasonable approach. A couple of minor feedback points:

  • I'd rather the type conversion stuff go in the manifest rather than in wpttest.py. See the prefs implementation for comparison; this should be very similar. But if we use tags there might be no changes required.

I was talking with jmaher and I am leaning towards a small set of conditional tags instead of these implementation-statuses. That would give us conditional parsing of the manifests on web-platform-tests and the tests that use manifestparser.

  • I think we should explicitly mark these tests as skipped in the output, just like we do for tests that are disabled. So let's just add them to the list of disabled tests. Later we could consider having some extra information e.g. in the message property to indicate why they were disabled.

I already marked them as disabled and they showed up with the other tests as skipped. I also had a log message listing each test which was skipped and the reason(s) why.

What's the status here? I have enough information to nominate some wpt directories for "never run" on the basis that we have objected to the feature existing. Getting the "will implement eventually, but not yet on the roadmap" list pinned down might take longer due to the changes around product, but the former is the lowest hanging fruit anyway.

jgraham: I have https://phabricator.services.mozilla.com/D57497 which assumes we are using --skip-implementation-statuses=not-implemented,backlog. The next step is to decide which implementation statuses you want to use and I will put up a final version of the patch for review.

I think not-implementing and backlog are good (not-implemented sounds like anything that doesn't have an implementation). I didn't know if you were planning to change the implementation per comment #25; if not I'll do a review of the wpt changes.

(In reply to James Graham [:jgraham] from comment #28)

I think not-implementing and backlog are good (not-implemented sounds like anything that doesn't have an implementation). I didn't know if you were planning to change the implementation per comment #25; if not I'll do a review of the wpt changes.

(In reply to Bob Clary [:bc:] from comment #25)

I was talking with jmaher and I am leaning towards a small set of conditional tags instead of these implementation-statuses. That would give us conditional parsing of the manifests on web-platform-tests and the tests that use manifestparser.

Oh, right. Let me look into this. I'll ping you when I have any questions or something to review.

Status: NEW → ASSIGNED
Whiteboard: [ci-costs-2020:todo]

I started looking into this again and implemented not-implementing-if and backlog-if tags for manifestparser, but really am at a loss on where to go with web-platform-tests. Personally, I think I did not understand the issues when I filed this bug, and it should be closed as invalid and a new one created by someone with a better understanding of the intended implementation and scope, with a new person assigned to carry it forward.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → INVALID

I can take this work. I had the impression your patch was basically doing the correct thing, so I'm hopeful that it isn't hard to get it over the line :)

Assignee: bob → james
Status: RESOLVED → REOPENED
Resolution: INVALID → ---

This adds an implementation-status field to the wpt metadata with 3
possible values: "not-implementing", "backlog" and "implementing" (the
latter being the same as the default value). It also adds a
--skip-implementation-status command line argument to wptrunner that
can be used to skip running tests with the specified implementation
statuses. This is primarily to allow Gecko to avoid running tests (or
run tests less frequently) for things where we don't implement the
spec and don't plan to, either ever or in the current roadmap.
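Under this approach, an annotation would look roughly like this (a sketch; the test name and file placement are assumptions, not taken from the patch):

```ini
[test_foo.html]
  implementation-status: backlog
```

A wptrunner invocation that skips both non-default statuses (the repeated-flag form shown here is an assumption about the argument's exact shape) would then report such tests as disabled/skipped rather than running them:

```ini
--skip-implementation-status not-implementing --skip-implementation-status backlog
```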

Pushed by james@hoppipolla.co.uk:
https://hg.mozilla.org/integration/autoland/rev/c2a6dd63c951 Add support for implementation-status in wpt metadata, r=twisniewski
https://hg.mozilla.org/integration/autoland/rev/8823ff6be13b Mark not-implementing status for some features, r=annevk
https://hg.mozilla.org/integration/autoland/rev/bb0d32984294 Don't run tests for things we aren't implementing in CI, r=jmaher
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/22136 for changes under testing/web-platform/tests
Whiteboard: [ci-costs-2020:todo] → [ci-costs-2020:todo], [wptsync upstream]
Upstream web-platform-tests status checks passed, PR will merge once commit reaches central.
Status: REOPENED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla76
Upstream PR merged by moz-wptsync-bot
Whiteboard: [ci-costs-2020:todo], [wptsync upstream] → [ci-costs-2020:done], [wptsync upstream]