Closed Bug 1873336 Opened 6 months ago Closed 5 months ago

Intermittent TCw mozilla/tests/webgpu/cts/webgpu/* | Failure while resetting counters OR Reached unreachable code

Categories

(Testing :: General, defect, P5)


Tracking

(firefox-esr115 unaffected, firefox121 unaffected, firefox122 unaffected, firefox123 wontfix, firefox124 wontfix)

RESOLVED INCOMPLETE
Tracking Status
firefox-esr115 --- unaffected
firefox121 --- unaffected
firefox122 --- unaffected
firefox123 --- wontfix
firefox124 --- wontfix

People

(Reporter: intermittent-bug-filer, Unassigned)

References

(Regression)

Details

(Keywords: intermittent-failure, regression)

Filed by: ncsoregi [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=442396690&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/bgTk9adpQbuPAM-lwBf13g/runs/0/artifacts/public/logs/live_backing.log


[task 2024-01-06T10:33:31.860Z] 10:33:31     INFO - 
[task 2024-01-06T10:33:31.860Z] 10:33:31     INFO - TEST-UNEXPECTED-FAIL | /_mozilla/webgpu/cts/webgpu/shader/execution/expression/call/builtin/tanh/cts.https.html?q=webgpu:shader,execution,expression,call,builtin,tanh:f16:* | :inputSource="const";vectorize="_undef_" - assert_unreached: 
[task 2024-01-06T10:33:31.860Z] 10:33:31     INFO -   - EXCEPTION: WebGPU device failed to initialize with Error "requestAdapter returned null"; not retrying
[task 2024-01-06T10:33:31.860Z] 10:33:31     INFO -     assert@https://web-platform.test:8443/_mozilla/webgpu/common/util/util.js:38:11
[task 2024-01-06T10:33:31.860Z] 10:33:31     INFO -     acquire@https://web-platform.test:8443/_mozilla/webgpu/webgpu/util/device_pool.js:42:11
[task 2024-01-06T10:33:31.860Z] 10:33:31     INFO -     
[task 2024-01-06T10:33:31.860Z] 10:33:31     INFO -  Reached unreachable code
[task 2024-01-06T10:33:31.860Z] 10:33:31     INFO - wpt_fn@https://web-platform.test:8443/_mozilla/webgpu/common/runtime/wpt.js:75:25
[task 2024-01-06T10:33:31.861Z] 10:33:31     INFO - 
Flags: needinfo?(egubler)
Keywords: regression
Regressed by: 1822630

Set release status flags based on info from the regressing bug 1822630

:nataliaCs: There seem to be at least two separate sets of symptoms among the 6 failure instances currently listed in the Orange Factor index for the past week (which you mentioned in comment 1):

  1. WebGPU CTS is being run in a Linux 18.04 environment, which, as we have determined since bug 1837027, does not have the proper environment for running WebGPU tests.

https://treeherder.mozilla.org/logviewer?job_id=442396690&repo=mozilla-central&lineNumber=2329
https://treeherder.mozilla.org/logviewer?job_id=442396782&repo=mozilla-central&lineNumber=2335
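When triaging instances, it can help to keep the symptom sets apart by bucketing each log excerpt on a known message. This is a minimal sketch that assumes plain substring matching is good enough. The first marker is quoted from the log in this bug and the second comes from this bug's title. `SYMPTOMS`, `classify_failure`, and `group_by_symptom` are hypothetical helpers, not existing Treeherder or Orange Factor tooling.

```python
from __future__ import annotations

# Hypothetical triage helper; not part of Treeherder or Orange Factor.
# Each symptom is identified by a substring of the failure log.
SYMPTOMS = {
    # Quoted from the TEST-UNEXPECTED-FAIL log in this bug.
    "adapter-unavailable": "requestAdapter returned null",
    # Taken from this bug's title.
    "counter-reset-failure": "Failure while resetting counters",
}


def classify_failure(log_text: str) -> str:
    """Return the first symptom whose marker appears in log_text, else 'unknown'."""
    for name, marker in SYMPTOMS.items():
        if marker in log_text:
            return name
    return "unknown"


def group_by_symptom(logs: dict[str, str]) -> dict[str, list[str]]:
    """Group job IDs by the symptom found in each job's log excerpt."""
    groups: dict[str, list[str]] = {}
    for job_id, text in logs.items():
        groups.setdefault(classify_failure(text), []).append(job_id)
    return groups


if __name__ == "__main__":
    excerpt = 'WebGPU device failed to initialize with Error "requestAdapter returned null"'
    print(group_by_symptom({"442396690": excerpt}))
    # {'adapter-unavailable': ['442396690']}
```

The generic `Reached unreachable code` line is deliberately not used as a marker. In the log above it appears to be the `assert_unreached` wrapper around the real exception, so matching on it could lump distinct causes together.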

It's salient to note that all of these failures are in test-coverage-wpt jobs, which the WebGPU team does not have direct stewardship over. Furthermore, I don't see how either set of symptoms could be related to the changes made in bug 1822630, which (1) changed the layout of WPT test files (which should be mostly internal to WebGPU WPT tests) and (2) changed the way in which those tests were distributed among chunks in CI (which should also not be visible in other test configurations). Also, these offending tests are running at tier 2, while WebGPU CTS is currently marked as running at tier 3. The discrepancy makes this problem visible and raises the question: should we be running tier 3 tests in a tier 2 job at all?

NI'ing :jmaher, who is likely to understand more about both the test-coverage-wpt test and its configuration. CC'ing :jgraham, in case he also has knowledge that gives us leverage to determine the right solution.

Flags: needinfo?(egubler) → needinfo?(jmaher)

Thank you very much for taking the time to check this!

there are some tasks like test-verify and test-coverage that make some assumptions and do the best they can to run properly. They are not run per push and provide ancillary data to CI. In this case test-coverage-wpt is set up to run on linux, but it was never set up to run on linux2204 or to handle specifics like webgpu, canvas, or privatebrowsing.

we have a reference to test-coverage-wpt in the test-sets.yml:
https://searchfox.org/mozilla-central/source/taskcluster/ci/test/test-sets.yml#133

I would be surprised if the test-coverage tools work on 22.04 as-is; checking that is a TODO item. Eventually we will need to migrate all test-coverage tasks to run on 22.04 instead of 18.04.

In addition to that, we need to see if we can create a different job for webgpu, maybe test-coverage-wpt-webgpu or something like that. We should look at test-verify as well.

So there are 2 TODO items:

  1. see if test-coverage / code-coverage tools work on existing linux2204-wayland image
  2. figure out how to split out webgpu and other tagged/subharnesses from wpt
Flags: needinfo?(jmaher)
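The second TODO amounts to partitioning the WPT test list before tasks are generated. This is a minimal sketch that assumes tests are identified by their path inside the test package, as in the failing path quoted later in this bug. `DEDICATED_PREFIXES` and `split_tests` are hypothetical names. The bucket name only echoes the suggested `test-coverage-wpt-webgpu` job and does not refer to any existing task.

```python
from __future__ import annotations

# Hypothetical mapping from a dedicated bucket to the path prefix of its tests.
# The webgpu prefix matches the in-tree layout seen in this bug; other tagged
# subharnesses (canvas, privatebrowsing, ...) would need their own entries,
# which are not shown because their exact paths are not given here.
DEDICATED_PREFIXES = {
    "test-coverage-wpt-webgpu": "tests/web-platform/mozilla/tests/webgpu/",
}


def split_tests(
    paths: list[str], prefixes: dict[str, str] = DEDICATED_PREFIXES
) -> dict[str, list[str]]:
    """Put each test path in the first dedicated bucket whose prefix matches,
    or in the general 'test-coverage-wpt' bucket otherwise."""
    buckets: dict[str, list[str]] = {"test-coverage-wpt": []}
    buckets.update({name: [] for name in prefixes})
    for path in paths:
        for name, prefix in prefixes.items():
            if path.startswith(prefix):
                buckets[name].append(path)
                break
        else:  # no dedicated prefix matched
            buckets["test-coverage-wpt"].append(path)
    return buckets


if __name__ == "__main__":
    print(split_tests([
        "tests/web-platform/mozilla/tests/webgpu/cts/webgpu/api/validation/"
        "render_pipeline/overrides/cts.https.html",
        "tests/web-platform/tests/example/placeholder.html",  # hypothetical path
    ]))
```

Keying the split on path prefixes keeps each bucket disjoint, so a test never runs in both the general job and its dedicated one.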

are the tests still at _mozilla/webgpu?

using this code as a guide to how "subsuites" are found:
https://searchfox.org/mozilla-central/source/taskcluster/gecko_taskgraph/util/chunking.py#32

I ask because one of the failing jobs has this in the path: tests/web-platform/mozilla/tests/webgpu/cts/webgpu/api/validation/render_pipeline/overrides/cts.https.html

:jmaher: That's correct, yes! The path with mozilla instead of _mozilla looks like an error of some kind. I'll investigate. Oh, derp, right: _mozilla in the WPT URL maps to the in-tree mozilla/tests directory. Nothing looks incorrect to me ATM, actually. 😅
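The URL-to-path correspondence described above can be written as a pair of conversions. Normalizing this way would also let a subsuite table keyed on URL-style prefixes such as `_mozilla/webgpu` match on-disk paths. This is a minimal sketch that assumes paths are relative to the web-platform root, with the `tests/web-platform/` test-package prefix stripped. It also assumes upstream tests live under `tests/` and are served from `/`, while Mozilla-only tests live under `mozilla/tests/` and are served from `/_mozilla/`. The function names are hypothetical, and this is not wptrunner's own code.

```python
# Hypothetical helpers; the roots below are assumptions described in the lead-in.
MOZILLA_URL_ROOT = "/_mozilla/"
MOZILLA_SRC_ROOT = "mozilla/tests/"
UPSTREAM_SRC_ROOT = "tests/"


def url_to_source_path(url_path: str) -> str:
    """Map a WPT URL path (query string allowed) to a path under the web-platform root."""
    # Drop the query string, e.g. the "?q=webgpu:..." CTS variant selector.
    path = url_path.split("?", 1)[0]
    if path.startswith(MOZILLA_URL_ROOT):
        return MOZILLA_SRC_ROOT + path[len(MOZILLA_URL_ROOT):]
    return UPSTREAM_SRC_ROOT + path.lstrip("/")


def source_path_to_url(rel_path: str) -> str:
    """Inverse of url_to_source_path (the query string is not restored)."""
    if rel_path.startswith(MOZILLA_SRC_ROOT):
        return MOZILLA_URL_ROOT + rel_path[len(MOZILLA_SRC_ROOT):]
    if rel_path.startswith(UPSTREAM_SRC_ROOT):
        return "/" + rel_path[len(UPSTREAM_SRC_ROOT):]
    raise ValueError(f"not under a known WPT root: {rel_path!r}")


if __name__ == "__main__":
    print(url_to_source_path(
        "/_mozilla/webgpu/cts/webgpu/shader/execution/expression/call/builtin/tanh/"
        "cts.https.html?q=webgpu:shader,execution,expression,call,builtin,tanh:f16:*"
    ))
    # mozilla/tests/webgpu/cts/webgpu/shader/execution/expression/call/builtin/tanh/cts.https.html
```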

:jmaher: For context, bug 1822630 changed the layout of _mozilla/webgpu somewhat, but shouldn't have moved tests outside of it.

Set release status flags based on info from the regressing bug 1822630

Status: NEW → RESOLVED
Closed: 5 months ago
Resolution: --- → INCOMPLETE