Closed Bug 1595515 Opened 5 years ago Closed 5 years ago

Reduce costs of running web-platform-tests scheduling by not running tests which are guaranteed to fail

Categories

(Testing :: web-platform-tests, enhancement)

Severity: normal

Tracking

RESOLVED FIXED
Target Milestone: mozilla76
Tracking Status: firefox76 --- fixed

People

(Reporter: bc, Assigned: jgraham)

References

Details

(Whiteboard: [ci-costs-2020:done], [wptsync upstream])

Attachments

(3 files, 1 obsolete file)

web-platform-tests attempt to run every test in the manifests even if there is no chance that the test will pass. This results in about 20% of the time and money spent on running web platform tests being wasted to at least some degree.

Bug 1594796 Comment 3 mentions some drawbacks to simply turning off these tests. Coming up with a strategy that fits with the needs of the web platform tests and works for all branches is an open question.

"Guaranteed to fail" doesn't accurately characterise any suggested approach. Tests that are expected to fail may in fact pass (or error, timeout, or crash) due to code changes in Gecko, in the test, or in the harness. Tests for features we don't support may pass if they happen to be testing things that are true in the absence of any implementation (for example a test that we don't implement some historical variant of an API).

Thank you for correcting my misunderstanding. I am sure that whoever takes on this task will benefit.

I appreciate that was quite a pedantic correction, but I want to be sure that we have a common — and correct — understanding of the current situation to base decisions on.

This patch adds a --skip-status argument to web platform tests which
can be used to skip tests with the specified expected status. More
than one status can be specified by including additional --skip-status
arguments.

A corresponding --wpt-skip-status argument with the same usage is also
added to the mach try fuzzy command which can be used to set the
values for try runs.

A special value of NONE is used to clear an existing list of expected
statuses to be skipped. When a value of NONE is detected in the list
of skip statuses, all entries in the list up to and including the NONE
will be discarded leaving the subsequent values in the list.

NONE is intended to be used in try runs in order to reset the skip
list so that any skip statuses specified either in
taskcluster/ci/test/web-platform.yml or in
taskcluster/taskgraph/transforms/tests.py can be overridden by the
user.
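The NONE reset described above amounts to keeping only the values that follow the last NONE in the list. A minimal sketch of that logic (the function name apply_none_reset is mine, not from the patch):

```python
def apply_none_reset(skip_statuses):
    """Discard every entry up to and including the last NONE,
    keeping only the values that follow it."""
    result = []
    for status in skip_statuses:
        if status == "NONE":
            result = []  # NONE clears everything accumulated so far
        else:
            result.append(status)
    return result

# Statuses from the in-tree config followed by a user-supplied override:
print(apply_none_reset(["FAIL", "TIMEOUT", "NONE", "ERROR"]))  # ['ERROR']
```

This lets a try pusher append NONE (plus any replacement statuses) to whatever the in-tree configuration already set, without needing to edit the yaml or transform.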

The current version of this patch uses
taskcluster/taskgraph/transforms/tests.py to set

--skip-status=FAIL --skip-status=TIMEOUT --skip-status=ERROR

for the autoland and try repositories.

Assignee: nobody → bob

I think this approach where try is run by default with the FAIL, TIMEOUT and ERROR statuses skipped is not workable. If we did this as outlined in this patch then developers might submit patches which passed on try and autoland but then showed failures when run on mozilla-central. I think for this reason we should not use this approach.

Instead I think we should just set the skip status values only for autoland in the transform and allow mozilla-central and try to run the full set of tests. This would not result in the maximum possible savings but would save about 20% on autoland and would allow the tests to be run completely on mozilla-central, thereby skirting any problems with the wpt sync.

jgraham, ahal: I'm not looking for a full review yet but wanted to get your feedback to see if this approach is acceptable. I expect that the final version will not set the skip status values for try.

Flags: needinfo?(james)
Flags: needinfo?(ahal)

The plan of record at the moment is to start by running tests for features we aren't implementing less frequently (or not at all), instead of using the expected status to determine when to run things. There's an action on me to start gathering the data on which tests this applies to by talking to product and engineering.

Flags: needinfo?(james)

Looking at bug 1572820 comment 7, we've got some of the plumbing in place with this patch.

--skip-implementation-status=not-implementing,backlog corresponds to the --skip-status/--wpt-skip-status, though it uses a comma-delimited list, which does seem better than the repeated-argument form I used here.

If we are going to be implementing something similar for the other frameworks, then instead of having distinct wpt only skip argument for try, we could use the same argument name for try and for the runners for wpt and the other frameworks.

The other area to consider is how to specify these arguments for the different repositories. I couldn't figure out how to do this in taskcluster/ci/test/web-platform.yml and had to resort to using the transform in taskcluster/taskgraph/transforms/tests.py. Does anyone have any suggestions on how to do this using the yaml file instead? The existing extra-options made life difficult when trying to shoe-horn in additional extra-option processing based on by-project.

Also, if we do specify project/repo based extra-options for the mozharness command in the tree especially for try how would we allow the user to override them? Is the use of NONE as a signal to clear preceding values acceptable?

Yeah, we should definitely make sure that we run everything on try (i.e., keep the behaviour on try / central the same, whatever we end up doing). Enabling this on autoland-only seems like the right approach!

Flags: needinfo?(ahal)

Work in progress:

The following tests are supported:

  • marionette
  • mochitests
  • web-platform-tests
  • xpcshell

By default try does not have --skip-implementation-status set. Only autoland has the command line argument set in the test transform.

For demonstration purposes, I used not-implemented,backlog as possible values for implementation-statuses. The appropriate values will need to be determined.

This does not support the "-if" type tags. They appear to be best suited for the "built-in" filters and not the "Instance" filters I used. "built-in" seems to be geared towards fixed tag names and adding the conditional to the "Instance" filters would require plumbing changes to the expression parsing.

Feedback requested.

Flags: needinfo?(jmaher)
Flags: needinfo?(james)
Flags: needinfo?(ahal)

thanks for the details :bc

I would like to support -if syntax, primarily for cases where we have 'windows installer tests' and those will be not_supported on android, linux, osx. That would also support where we cannot realistically support this test running in ccov mode or some other configuration.

2 other things to consider:

  1. change to --skip-statuses (a small semantic change, but allows for more flexibility)
  2. ensure we can support a 'never run' and a 'periodic run' vs default. I think the plan here would do that as we could do:
  • 'never': --skip-statuses default, highly_intermittent, long_runtime
  • 'periodic': --skip-statuses default, not_supported, backlog
  • default: --skip-statuses highly_intermittent, long_runtime, not_supported, backlog
Flags: needinfo?(jmaher)

(more comments on the rest of this later)

--skip-statuses is confusing because it sounds more like result statuses. If you want a better name that isn't tied to implementation status, call it --skip-flags or something.

I've fixed a minor bug with the wpt support and confirmed that wpt's conditional manifest properties do work so we have that at least. I'll continue looking into the manifestparser filters with conditional properties.

(In reply to Bob Clary [:bc:] from comment #9)

For demonstration purposes, I used not-implemented,backlog as possible values for implementation-statuses. The appropriate values will need to be determined.

What would "backlog" mean in this case?

This does not support the "-if" type tags. They appear to be best suited for the "built-in" filters and not the "Instance" filters I used. "built-in" seems to be geared towards fixed tag names and adding the conditional to the "Instance" filters would require plumbing changes to the expression parsing.

Could you expand what you mean here? Why would you need to "support" the skip-if filter from what looks like a completely separate filter? The manifest filters are supposed to act like a chain that modifies the test objects as they go (they're kind of like early versions of the transforms system we have in taskgraph). Consumers (aka test harnesses) then add whichever filters they want prior to calling active_tests. So for example, the mochitest harness might decide to add both the skip_if filter as well as the skip_implementation_status filter (or maybe it just adds one or the other). Unless you are explicitly removing the skip_if filter, I don't understand why skip-if wouldn't be working.

Feedback requested.

What if we piggy-back off of the tagging system and implement --skip-tag? That way we can simply add a tag for whatever crazy purposes we can come up with (one of which being an implementation status). So e.g, in the manifest:

[test_foo.js]
tags = devtools not-implemented
[test_bar.js]
tags = long-runtime backlog

Then run with:

./mach mochitest --skip-tag not-implemented --skip-tag long-runtime
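The --skip-tag idea could be implemented as a manifestparser-style instance filter, roughly as sketched below. This is a simplification: the class name is hypothetical and the dict-based test objects stand in for manifestparser's real test objects.

```python
class skip_tags:
    """Hypothetical manifestparser-style filter: drop any test whose
    manifest `tags` field contains one of the given tags."""

    def __init__(self, *tags):
        self.tags = set(tags)

    def __call__(self, tests, values):
        for test in tests:
            # `tags` in a manifest is a space-delimited string
            test_tags = set(test.get("tags", "").split())
            if not (test_tags & self.tags):
                yield test

tests = [
    {"name": "test_foo.js", "tags": "devtools not-implemented"},
    {"name": "test_bar.js", "tags": "long-runtime backlog"},
    {"name": "test_baz.js", "tags": "devtools"},
]
keep = list(skip_tags("not-implemented", "long-runtime")(tests, {}))
print([t["name"] for t in keep])  # ['test_baz.js']
```

A harness would append an instance of this filter before calling active_tests, alongside the built-in skip_if/run_if/fail_if filters.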

p.s would appreciate if you could push to phabricator for feedback as the hg.m.o diffs aren't that great. It also gives us a place to leave comments on the code.

Flags: needinfo?(ahal)

Alternatively, we could modify the existing --tag attribute to pass its value through manifestparser's expression language. So you could do things like:

./mach mochitest --tag "!not-implemented && !long-runtime"

This is less user friendly, but potentially more powerful. Ftr, I think I prefer --skip-tag.

could we support tag-if? I am thinking of the windows-installer test cases that are currently skipped on !win, maybe they have a win_only tag

Then we would need logic in the taskgraph to pass in --skip-tag win_only on non-Windows tasks, which seems much harder than using skip-if = os != "win" :). I'd push back against tag-if unless there was a really good reason for it.

Edit: Come to think of it.. why would win_only be a conditional tag? It would need to be a constant tag anyway.

as mentioned in comment 10, there are a lot of tests that are skipped because of implementation details; that is confusing the data when trying to figure out what is skipped for other reasons (frequent/perma fail, etc.)

there is a good chance some wpt tests will be implemented on desktop but not yet on android, therefore we need some method to say this is not supported or not yet implemented on a specific platform or config.

Ah, that's a pretty good reason :). And I guess that explains what :bc meant by supporting the -if syntax. In that case, yes.. tag-if should be do-able. However, we may want to tag different things based on different conditions. E.g: not-implemented on Windows, but backlog on Android. So we can't use a single tag-if key as manifestparser doesn't allow duplicate keys (note: this same problem would apply to implementation-status-if).

So we'd need some kind of per tag expression, maybe something like:

[test_foo.js]
tags =
    devtools
    os == "android", long-running
    os == "win", not-implemented

It's definitely going to add a whole bunch of complexity though. For instance, tags currently use a space as a delimiter. We'd have to switch to something else (newlines in example above).

(Honestly, I'd recommend implementing -if in a follow-up.. whether or not we use tags or implementation-status.. it's going to be a whole can of worms)
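The newline-delimited conditional tags sketched above could be parsed along these lines. This is a toy sketch: the helper name is hypothetical and the condition evaluator deliberately handles only a single equality comparison, where a real implementation would reuse manifestparser's expression parser.

```python
def parse_conditional_tags(raw, env):
    """Parse a newline-delimited tags field where each line is either a
    bare tag or '<condition>, <tag>'. Toy evaluator: a condition is a
    single 'key == "value"' comparison against env."""
    tags = set()
    for line in raw.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if "," in line:
            condition, tag = (part.strip() for part in line.rsplit(",", 1))
            key, _, expected = condition.partition("==")
            if env.get(key.strip()) == expected.strip().strip('"'):
                tags.add(tag)
        else:
            tags.add(line)
    return tags

raw = '''
devtools
os == "android", long-running
os == "win", not-implemented
'''
print(sorted(parse_conditional_tags(raw, {"os": "android"})))
# ['devtools', 'long-running']
```

On Android only the unconditional tag and the android-conditioned tag apply; the win-conditioned one is dropped.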

(In reply to Andrew Halberstadt [:ahal] from comment #13)

What would "backlog" mean in this case?

I just used it as an example from jgraham's bug 1572820 comment 3 in order to exercise the code. We need to figure out what values we need, then determine the tests which need the annotations.

This does not support the "-if" type tags. They appear to be best suited for the "built-in" filters and not the "Instance" filters I used. "built-in" seems to be geared towards fixed tag names and adding the conditional to the "Instance" filters would require plumbing changes to the expression parsing.

Could you expand what you mean here? Why would you need to "support" the skip-if filter from what looks like a completely separate filter? The manifest filters are supposed to act like a chain that modifies the test objects as they go (they're kind of like early versions of the transforms system we have in taskgraph). Consumers (aka test harnesses) then add whichever filters they want prior to calling active_tests. So for example, the mochitest harness might decide to add both the skip_if filter as well as the skip_implementation_status filter (or maybe it just adds one or the other). Unless you are explicitly removing the skip_if filter, I don't understand why skip-if wouldn't be working.

I'm not talking about replacing the built-in filters skip_if, run_if or fail_if but am talking about something like implementation-statuses-if. It would be different from the current built-in filters which require the annotation to look like

<tag>-if = <boolean expression>

We need instead something that would allow

<tag>-if = <boolean expression> value

Which isn't supported by manifestparser's expression parser.

Note that the web-platform test's manifests and manifest parser already support conditional properties with multiple values. If we want to go down this route I would suggest we consider standardizing on web platform test's version.
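For comparison, wpt's metadata format already supports conditional, multi-valued properties. An expectation file entry looks roughly like this (a sketch of the existing syntax, not something added by this patch):

```ini
[test_foo.html]
  expected:
    if os == "android": TIMEOUT
    FAIL
```

Conditions are evaluated top to bottom and the first match wins, with the bare value acting as the default.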

What if we piggy-back off of the tagging system and implement --skip-tag? That way we can simply add a tag for whatever crazy purposes we can come up with (one of which being an implementation status). So e.g, in the manifest:

[test_foo.js]
tags = devtools not-implemented
[test_bar.js]
tags = long-runtime backlog

Then run with:

./mach mochitest --skip-tag not-implemented --skip-tag long-runtime

That might cover the use-cases. I'd have to think about it to see if it supported everything we wanted. A prerequisite would be to determine how we would want to annotate tests and how we would want to use the annotations to select tests.

p.s would appreciate if you could push to phabricator for feedback as the hg.m.o diffs aren't that great. It also gives us a place to leave comments on the code.

Will do. I'm doing another try run at the moment and just noticed the problem with the xpcshell tests on android. I was planning on doing a moz-phab submit --wip without a bug number once we were further along so we could share. I'll do that in a bit.

OK, so for wpt this looks like a reasonable approach. A couple of minor feedback points:

  • I'd rather the type conversion stuff go in the manifest rather than in wpttest.py. See the prefs implementation for comparison; this should be very similar. But if we use tags there might be no changes required.
  • I think we should explicitly mark these tests as skipped in the output, just like we do for tests that are disabled. So let's just add them to the list of disabled tests. Later we could consider having some extra information e.g. in the message property to indicate why they were disabled.
Flags: needinfo?(james)

(In reply to James Graham [:jgraham] from comment #22)

  • I'd rather the type conversion stuff go in the manifest rather than in wpttest.py. See the prefs implementation for comparison; this should be very similar. But if we use tags there might be no changes required.

I don't understand. Can you be more explicit in what you mean here?

Attachment #9113287 - Attachment is obsolete: true

See how tags work, the type conversion is in:
https://searchfox.org/mozilla-central/source/testing/web-platform/tests/tools/wptrunner/wptrunner/manifestexpected.py#56-64
https://searchfox.org/mozilla-central/source/testing/web-platform/tests/tools/wptrunner/wptrunner/manifestexpected.py#274-276 (and later in the same file for "reasons").
then for wpttest.py we just read the value and handle the (little-used) reset atom
https://searchfox.org/mozilla-central/source/testing/web-platform/tests/tools/wptrunner/wptrunner/wpttest.py#297-309

If we just use tags directly there are no changes required, of course, but if we take a different path, then that's the closest implementation to copy.

(In reply to James Graham [:jgraham] from comment #22)

Updated the phab requests with the changes.

OK, so for wpt this looks like a reasonable approach. A couple of minor feedback points:

  • I'd rather the type conversion stuff go in the manifest rather than in wpttest.py. See the prefs implementation for comparison; this should be very similar. But if we use tags there might be no changes required.

I was talking with jmaher and I am leaning towards a small set of conditional tags instead of these implementation-statuses. That would give us conditional parsing of the manifests on web-platform-tests and the tests that use manifestparser.

  • I think we should explicitly mark these tests as skipped in the output, just like we do for tests that are disabled. So let's just add them to the list of disabled tests. Later we could consider having some extra information e.g. in the message property to indicate why they were disabled.

I already marked them as disabled and they showed up with the other tests as skipped. I also had a log message listing each test which was skipped and the reason(s) why.

What's the status here? I have enough information to nominate some wpt directories for "never run" on the basis that we have objected to the feature existing. Getting the "will implement eventually, but not yet on the roadmap" list pinned down might take longer due to the changes around product, but the former is the lowest hanging fruit anyway.

jgraham: I have https://phabricator.services.mozilla.com/D57497 which assumes we are using --skip-implementation-statuses=not-implemented,backlog. The next step is to decide which implementation statuses you want to use and I will put up a final version of the patch for review.

I think not-implementing and backlog are good (not-implemented sounds like anything that doesn't have an implementation). I didn't know if you were planning to change the implementation per comment #25; if not I'll do a review of the wpt changes.

(In reply to James Graham [:jgraham] from comment #28)

I think not-implementing and backlog are good (not-implemented sounds like anything that doesn't have an implementation). I didn't know if you were planning to change the implementation per comment #25; if not I'll do a review of the wpt changes.

(In reply to Bob Clary [:bc:] from comment #25)

I was talking with jmaher and I am leaning towards a small set of conditional tags instead of these implementation-statuses. That would give us conditional parsing of the manifests on web-platform-tests and the tests that use manifestparser.

Oh, right. Let me look into this. I'll ping you when I have any questions or something to review.

Status: NEW → ASSIGNED
Whiteboard: [ci-costs-2020:todo]

I started looking into this again and implemented not-implementing-if and backlog-if tags for manifestparser, but really am at a loss on where to go with web-platform-tests. Personally, I think I did not understand the issues when I filed this bug, and it should be closed as invalid and a new one created by someone with a better understanding of the intended implementation and scope, with a new person assigned to carry it forward.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → INVALID

I can take this work. I had the impression your patch was basically doing the correct thing, so I'm hopeful that it isn't hard to get it over the line :)

Assignee: bob → james
Status: RESOLVED → REOPENED
Resolution: INVALID → ---

This adds an implementation-status field to the wpt metadata with 3
possible values: "not-implementing", "backlog" and "implementing" (the
latter being the same as the default value). It also adds a
--skip-implementation-status command line argument to wptrunner that
can be used to skip running tests with the specified implementation
statuses. This is primarily to allow Gecko to avoid running tests (or
run tests less frequently) for things where we don't implement the
spec and don't plan to, either ever or in the current roadmap.
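Under this approach, an annotation would look roughly like this (a sketch; the test name and file placement are assumptions, not taken from the patch):

```ini
[test_foo.html]
  implementation-status: backlog
```

A wptrunner invocation that skips both non-default statuses (the repeated-flag form shown here is an assumption about the argument's exact shape) would then report such tests as disabled/skipped rather than running them:

```ini
--skip-implementation-status not-implementing --skip-implementation-status backlog
```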

Pushed by james@hoppipolla.co.uk:
https://hg.mozilla.org/integration/autoland/rev/c2a6dd63c951 Add support for implementation-status in wpt metadata, r=twisniewski
https://hg.mozilla.org/integration/autoland/rev/8823ff6be13b Mark not-implementing status for some features, r=annevk
https://hg.mozilla.org/integration/autoland/rev/bb0d32984294 Don't run tests for things we aren't implementing in CI, r=jmaher
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/22136 for changes under testing/web-platform/tests
Whiteboard: [ci-costs-2020:todo] → [ci-costs-2020:todo], [wptsync upstream]
Upstream web-platform-tests status checks passed, PR will merge once commit reaches central.
Status: REOPENED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla76
Upstream PR merged by moz-wptsync-bot
Whiteboard: [ci-costs-2020:todo], [wptsync upstream] → [ci-costs-2020:done], [wptsync upstream]