1586790 - (WPT-Fis) [meta] Make web-platform-tests work with Fission enabled

Sure. We still need to enable WPTs for some missing Fission configurations (such as linux1804-64-asan/opt and windows10-64-qr/debug), but I will file new bugs to verify and enable those tests.

Status: NEW → RESOLVED

Closed: 3 years ago

Flags: needinfo?(cpeterson)

Resolution: --- → FIXED

Neha Kochar [:neha]

Comment 13

•

3 years ago

We should keep this meta open and use it to track the remaining WPTs that are disabled for Fission. We'll need new bugs for those under this meta.

jgraham, can we re-enable all currently disabled WPTs for Fission, and then let the wpt-sync script disable the still failing ones, so that we don't have to individually check each intermittent to see if it can be re-enabled? Can the wpt-sync script or an adjacent script file a new bug under this meta bug for each Fission WPT that gets disabled so we have a consolidated list? Filing a new bug for each Fission disabled WPT should continue for each wpt-sync run so we don't miss tests that get disabled for Fission.

Status: RESOLVED → REOPENED

Flags: needinfo?(james)

Resolution: FIXED → ---

Neha Kochar [:neha]

Updated

•

3 years ago

Depends on: 1694297

James Graham [:jgraham]

Reporter

Comment 14

•

3 years ago

•

Edited

If I understand correctly, it's not just disabled tests you care about. In wpt, there are basically three categories of differences you might care about:

1. Tests that are disabled in fission but not on other configurations. These tests either don't run at all (when whole test files are disabled) or are run but the results are ignored (when specific subtests are disabled; this is rare). The wpt sync never disables tests; it's only done by humans.
2. Tests that have a fixed expectation that's different between fission and non-fission configurations, e.g. expected: FAIL for fission, but expected: PASS for non-fission. These are things which clearly need to be fixed or at least understood.
3. Tests that have an intermittent result of some kind, especially one that differs between fission and non-fission. This is problematic because we don't have a great system for telling which of the intermittent results actually occur in practice. For example, expected: PASS in non-fission and expected: [PASS, FAIL] in fission might have been a one-time failure that the sync added that's now a perma-pass, or it might be a perma-fail. This can also affect cases where there isn't a fission-specific expectation, e.g. expected: [PASS, FAIL] in both fission and non-fission could hide a test that perma-fails in fission and perma-passes in non-fission.
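
To make the three categories concrete, here is a minimal sketch of how a script might classify a single test. The dict layout (`disabled`, `expected`) and the `Difference` names are hypothetical and only for illustration; they are not the format used by the wpt metadata files or by any existing tool.

```python
from enum import Enum


class Difference(Enum):
    NONE = "no difference"
    DISABLED = "disabled only in fission"
    FIXED = "fixed expectation differs"
    INTERMITTENT = "intermittent expectation"


def classify(fission, non_fission):
    """Classify one test from its annotations in each configuration.

    Each argument is a dict with two hypothetical keys:
      "disabled": bool
      "expected": list of allowed statuses, e.g. ["PASS", "FAIL"]
    """
    # Category 1: the test is skipped in fission but runs elsewhere.
    if fission["disabled"] and not non_fission["disabled"]:
        return Difference.DISABLED
    f_exp, n_exp = fission["expected"], non_fission["expected"]
    # Category 3: more than one allowed status means the expectation is
    # intermittent, even when both configurations carry the same list.
    if len(f_exp) > 1 or len(n_exp) > 1:
        return Difference.INTERMITTENT
    # Category 2: a single, fixed status that differs between configurations.
    if f_exp != n_exp:
        return Difference.FIXED
    return Difference.NONE
```

Note that the third category is reported even when the two lists are identical, since a shared [PASS, FAIL] can hide a fission-only perma-fail.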

Now answering the questions:

> can we re-enable all currently disabled WPTs for Fission, and then let the wpt-sync script disable the still failing ones, so that we don't have to individually check each intermittent to see if it can be re-enabled?

It makes sense to go through the expectation ini files and remove any non-perma expectations that differ between fission and non-fission, and re-enable any tests that run on fission, and then run all of that through try with some rebuilds, and then use the mach wpt-update command to update the expectations to the observed results. We can't get the wpt-sync to do this directly but we can do basically the same thing it would do. To make updating the expectations as straightforward as possible, we should try to run all the configurations found on mozilla-central. For comparison [1] is a recent wpt-sync try push; note that it uses --disable-target-task-filter to enable some additional tasks.

> Can the wpt-sync script or an adjacent script file a new bug under this meta bug for each Fission WPT that gets disabled so we have a consolidated list? Filing a new bug for each Fission disabled WPT should continue for each wpt-sync run so we don't miss tests that get disabled for Fission.

Getting the sync to do this would be quite non-trivial. The current way the sync files bugs is by looking at the per-PR results, both on GitHub and in Gecko CI. We don't have fission runs in either of those places yet, and we don't have a mechanism to do something special with failures that only happen in a specific configuration. We also don't have any integration between bug filing and the try pushes we do immediately before landing, so we would miss anything that failed in that case which had passed in the per-PR run; this is fairly common in general.

Instead of tying this to the sync directly I suggest writing a job that will run on central pushes and create an artifact of ini file differences representing regressions between fission and non-fission. This is pretty straightforward and will ensure that we capture all the differences that are annotated in the ini file. It might still miss cases where the expectation is intermittent across configurations but the results are actually different between fission and non-fission. We could capture these by looking at the actual recorded results, but without historical data there are likely to be false-positives from tests that are actually just intermittent.
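
As a rough sketch of what such a job could emit, the following compares the annotated expectations per test and serializes the fission regressions as JSON. The input layout, the `SEVERITY` ordering and the function names are assumptions made for illustration; a real job would read the expectations from the ini metadata instead.

```python
import json

# Assumed ordering from best to worst; not taken from the wpt tooling.
SEVERITY = {"PASS": 0, "OK": 0, "FAIL": 1, "ERROR": 2, "TIMEOUT": 3, "CRASH": 4}


def worst(statuses):
    """Return the severity of the worst status an expectation allows."""
    return max(SEVERITY[status] for status in statuses)


def fission_regressions(expectations):
    """Select tests whose fission expectation is worse than non-fission.

    expectations: {test_id: {"fission": [...], "non-fission": [...]}}
    """
    report = {}
    for test_id, exp in sorted(expectations.items()):
        if worst(exp["fission"]) > worst(exp["non-fission"]):
            report[test_id] = exp
    return report


def render_artifact(expectations):
    """Serialize the regressions as the JSON text of the artifact."""
    return json.dumps(fission_regressions(expectations), indent=2, sort_keys=True)
```

A test annotated [PASS, FAIL] in both configurations is not reported by this comparison, which is exactly the blind spot described above: catching it would need the recorded results, not just the annotations.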

If you have a complete list of fission regressions, the remaining question is how to track those. The wpt-sync uses an external metadata repo to check if there are already bugs filed for a specific test failure. I don't think we want to reuse that here. Also, the fact that the sync files bugs per-PR means that we can keep the volume of bugs filed down to a reasonable level. The problem with auto-filing bugs in general is that it's very hard to script a solution to "are these issues the same bug or a different bug". Humans don't do this perfectly but they are at least better at making informed guesses ;)

Given all of this, I'd prefer if actual bugs were filed by people. Of course we can still figure out some way to ensure that you know when there are fission regressions which are not associated with any bug (e.g. new ones). The main question in doing this is where we want the association between test result and bug to live. This can go in the wpt metadata (I think there's some precedent for that in the fission project, and certainly there is in general). That has the advantage that it's easy for the script that summarizes the regressions to tag each one with a bug number. The big disadvantage is that it means you need to make an actual m-c commit to update the annotations. The other option is that the association lives outside the source tree and we have some way (i.e. a script) to update this with the latest data from mozilla-central. That could be in Bugzilla if we find some way to pack the data into the bugs, but it's not designed for it. It could probably be something like Google Sheets, assuming there's some API we could use to update a sheet.

Does that make sense?

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=026810e69f92c9d345500a8d45e77b13b2c3edfc

Flags: needinfo?(james)

u608768

Updated

•

3 years ago

Depends on: 1694974

Chris Peterson [:cpeterson]

Comment 15

•

3 years ago

•

Edited

I chatted with jgraham and Kashav on Matrix.

wpt-sync is run a few times a week.

jgraham will try writing a mach command to dump a report of wpt annotations for new Fission failures. I am confirming the exact requirements with jgraham in email.

When a new Fission intermittent is reported, someone on the Fission team will manually file a new bug and add the bug # as a comment on the new failure's annotation line in the wpt .ini metadata file. Keeping the comments with the annotations will allow humans to identify known vs new Fission annotations at a glance.
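
A minimal sketch of how a report script could separate known from new Fission annotations using those comments. It assumes the comment is written with `#` on the same line as the annotation and mentions the bug as `Bug NNNNNNN`; that convention, the sample metadata and the bug number 1234567 are all hypothetical.

```python
import re

# Matches "if <condition>:" at the start of an annotation line.
CONDITION = re.compile(r"^\s*if\s+(?P<cond>[^:]*):")
# Matches a trailing comment that mentions a bug number, e.g. "# Bug 1234567".
BUG_COMMENT = re.compile(r"#.*\bbug\s*#?\s*(\d+)", re.IGNORECASE)


def split_fission_annotations(metadata_text):
    """Return (known, new) fission annotations found in a metadata file.

    known: list of (line_number, bug_number) for lines carrying a bug comment
    new:   list of line_number for fission lines without a bug comment
    """
    known, new = [], []
    for number, line in enumerate(metadata_text.splitlines(), start=1):
        match = CONDITION.match(line)
        if not match:
            continue
        cond = match.group("cond")
        # Skip conditions that do not apply to fission runs.
        if "fission" not in cond or "not fission" in cond:
            continue
        bug = BUG_COMMENT.search(line)
        if bug:
            known.append((number, int(bug.group(1))))
        else:
            new.append(number)
    return known, new


SAMPLE = """\
[example.html]
  expected:
    if fission: [PASS, FAIL]  # Bug 1234567
    if not fission: PASS
  [subtest]
    expected:
      if fission: FAIL
"""
```

Calling `split_fission_annotations(SAMPLE)` puts line 3 in the known list with its bug number and line 7 in the new list, which is the at-a-glance split a human reviewer would otherwise make by eye.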

In bug 1694974, Kashav will remove the existing wpt annotations for Fission intermittents and re-run those tests on Try with wpt-sync to generate the new wpt annotations. Kashav (or cpeterson) will then file new bugs for Fission failures (intermittent or perma-fail) that are still reproducible and add bug # comments to their wpt annotations.

Chris Peterson [:cpeterson]

Updated

•

3 years ago

Depends on: 1695806

Chris Peterson [:cpeterson]

Updated

•

3 years ago

Depends on: 1696042

Chris Peterson [:cpeterson]

Updated

•

3 years ago

Depends on: 1694669

Chris Peterson [:cpeterson]

Updated

•

3 years ago

Depends on: 1692852

Chris Peterson [:cpeterson]

Updated

•

3 years ago

Depends on: 1650694

Chris Peterson [:cpeterson]

Updated

•

3 years ago

No longer depends on: 1664886

Chris Peterson [:cpeterson]

Updated

•

3 years ago

Depends on: 1712639

Chris Peterson [:cpeterson]

Updated

•

3 years ago

Depends on: 1712641

Chris Peterson [:cpeterson]

Updated

•

3 years ago

Depends on: 1712644

Chris Peterson [:cpeterson]

Updated

•

3 years ago

Depends on: 1672848

Chris Peterson [:cpeterson]

Updated

•

3 years ago

Depends on: 1709747

Chris Peterson [:cpeterson]

Updated

•

3 years ago

Depends on: 1710280

Chris Peterson [:cpeterson]

Updated

•

3 years ago

Depends on: 1670899

Chris Peterson [:cpeterson]

Updated

•

3 years ago

Depends on: 1693442

Chris Peterson [:cpeterson]

Updated

•

3 years ago

Depends on: 1699307

Chris Peterson [:cpeterson]

Updated

•

3 years ago

Depends on: 1712648

Chris Peterson [:cpeterson]

Updated

•

3 years ago

Depends on: 1712649

Chris Peterson [:cpeterson]

Updated

•

3 years ago

Depends on: 1712652

Chris Peterson [:cpeterson]

Updated

•

3 years ago

Depends on: 1712654

Chris Peterson [:cpeterson]

Updated

•

3 years ago

Depends on: 1712672

BMO Automation

Updated

•

2 years ago

Severity: normal → S3