Closed Bug 1754888 Opened 3 years ago Closed 3 years ago

Intermittent TVw TinderboxPrint: Per-test run of .../RTCRtpReceiver-getStats.https.html<br/>: FAILURE

Categories

(Core :: WebRTC: Signaling, defect, P5)

defect

Tracking

()

RESOLVED FIXED
99 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox97 --- unaffected
firefox98 --- wontfix
firefox99 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: jgraham)

References

(Regression)

Details

(Keywords: intermittent-failure, regression)

Attachments

(1 file)

Filed by: csabou [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=367484346&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/CxPiO4XNQM2wckAmDeeVfQ/runs/0/artifacts/public/logs/live_backing.log
Reftest URL: https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/CxPiO4XNQM2wckAmDeeVfQ/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1


[task 2022-02-10T22:51:51.655Z] 22:51:51    ERROR - Return code: 1
[task 2022-02-10T22:51:51.656Z] 22:51:51    ERROR - # TBPL FAILURE #
[task 2022-02-10T22:51:51.656Z] 22:51:51  WARNING - setting return code to 2
[task 2022-02-10T22:51:51.656Z] 22:51:51    ERROR - TinderboxPrint: Per-test run of .../RTCRtpReceiver-getStats.https.html<br/>: FAILURE
[task 2022-02-10T22:51:51.656Z] 22:51:51     INFO - Running post-action listener: _package_coverage_data
[task 2022-02-10T22:51:51.657Z] 22:51:51     INFO - Running post-action listener: _resource_record_post_action
[task 2022-02-10T22:51:51.657Z] 22:51:51     INFO - Running post-action listener: process_java_coverage_data
[task 2022-02-10T22:51:51.657Z] 22:51:51     INFO - Running post-action listener: stop_device
[task 2022-02-10T22:51:51.657Z] 22:51:51     INFO - [mozharness: 2022-02-10 22:51:51.657244Z] Finished run-tests step (success)
[task 2022-02-10T22:51:51.658Z] 22:51:51     INFO - Running post-run listener: _resource_record_post_run
[task 2022-02-10T22:51:51.844Z] 22:51:51     INFO - Total resource usage - Wall time: 64s; CPU: 42%; Read bytes: 0; Write bytes: 950349824; Read time: 0; Write time: 182996
[task 2022-02-10T22:51:51.844Z] 22:51:51     INFO - TinderboxPrint: CPU usage<br/>42.3%
[task 2022-02-10T22:51:51.844Z] 22:51:51     INFO - TinderboxPrint: I/O read bytes / time<br/>0 / 0
[task 2022-02-10T22:51:51.844Z] 22:51:51     INFO - TinderboxPrint: I/O write bytes / time<br/>950,349,824 / 182,996
[task 2022-02-10T22:51:51.844Z] 22:51:51     INFO - TinderboxPrint: CPU idle<br/>144.7 (56.7%)
[task 2022-02-10T22:51:51.845Z] 22:51:51     INFO - TinderboxPrint: CPU iowait<br/>2.9 (1.1%)
[task 2022-02-10T22:51:51.845Z] 22:51:51     INFO - TinderboxPrint: CPU system<br/>12.4 (4.9%)
[task 2022-02-10T22:51:51.845Z] 22:51:51     INFO - TinderboxPrint: CPU user<br/>95.0 (37.2%)
[task 2022-02-10T22:51:51.845Z] 22:51:51     INFO - TinderboxPrint: Swap in / out<br/>0 / 0
[task 2022-02-10T22:51:51.845Z] 22:51:51     INFO - pull - Wall time: 0s; CPU: Can't collect data; Read bytes: 0; Write bytes: 0; Read time: 0; Write time: 0
[task 2022-02-10T22:51:51.846Z] 22:51:51     INFO - start-emulator - Wall time: 0s; CPU: Can't collect data; Read bytes: 0; Write bytes: 0; Read time: 0; Write time: 0
[task 2022-02-10T22:51:51.846Z] 22:51:51     INFO - verify-device - Wall time: 0s; CPU: Can't collect data; Read bytes: 0; Write bytes: 0; Read time: 0; Write time: 0
[task 2022-02-10T22:51:51.846Z] 22:51:51     INFO - install - Wall time: 17s; CPU: 25%; Read bytes: 0; Write bytes: 33185792; Read time: 0; Write time: 25284
[task 2022-02-10T22:51:51.847Z] 22:51:51     INFO - run-tests - Wall time: 47s; CPU: 49%; Read bytes: 0; Write bytes: 917164032; Read time: 0; Write time: 157712
[task 2022-02-10T22:51:51.863Z] 22:51:51  WARNING - returning nonzero exit status 2
[task 2022-02-10T22:51:52.178Z] cleanup
[task 2022-02-10T22:51:52.178Z] + cleanup
[task 2022-02-10T22:51:52.178Z] + local rv=2
[task 2022-02-10T22:51:52.178Z] + [[ -s /builds/worker/.xsession-errors ]]
[task 2022-02-10T22:51:52.179Z] + cp /builds/worker/.xsession-errors /builds/worker/artifacts/public/xsession-errors.log
[task 2022-02-10T22:51:52.180Z] + '[' ']'
[task 2022-02-10T22:51:52.180Z] + true
Has Regression Range: --- → yes

I'm pretty sure there is a problem with spurious timeouts in TV itself. We're seeing this all over the tree, and the timeouts do not look legitimate because the test executes just fine on all 10 iterations.

See Also: → 1753827, 1753829, 1753828, 1754000

Set release status flags based on info from the regressing bug 1753938

TVw (test-verify for web-platform tests) appears to be perma-fail. TV and TVg appear to be okay (there are recent passing tasks).

This range looks interesting:

https://treeherder.mozilla.org/jobs?repo=autoland&searchStr=TVw&tochange=a2f550668dc7a8adbcab4aa5b66f67ae6a6f7768&fromchange=d123c8a6e94c5a916f64582f7568421267db71ea

Actually, the passing tasks there were no-ops, running no tests, I think. So a more accurate, but larger range is:

https://treeherder.mozilla.org/jobs?repo=autoland&searchStr=TVw&tochange=a2f550668dc7a8adbcab4aa5b66f67ae6a6f7768&fromchange=4b4089d4c9d8556b3e3b445b7697f2833cc5ba7a

:jgraham - Any idea what's gone wrong here?

Flags: needinfo?(james)

wpt PR 32373 added the ability to stop wpt stability jobs without
completing all the browser-restarting iterations if we run out of
time. This was designed to avoid task timeouts in upstream CI. However
it missed the case (not used upstream) where we perform retries
without restarting the browser. In this case we were always setting
the number of iterations to 1, so it looked like we'd never completed
the full set. Instead we should always set the number of iterations to
the number of test reruns we had since there's no logic to halt early
in this case.

This should stop the verify jobs always returning a timeout failure in
gecko CI.

Assignee: nobody → james
Status: NEW → ASSIGNED

I've made a patch. The tl;dr is that some upstream changes didn't account for our verify setup where we first run the test multiple times without restarts and then run the test multiple times with restarts; it thought that running the test 10x without restarts was running a single iteration, not the expected 10 iterations, so it assumed we'd timed out trying to complete multiple iterations and hadn't verified anything. The fix is just to set the iteration count to the number of reruns in this case.

Flags: needinfo?(james)
Pushed by james@hoppipolla.co.uk: https://hg.mozilla.org/integration/autoland/rev/e076ff6e02d1 When running wpt stability without restart, count all iterations, r=gbrown
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/32966 for changes under testing/web-platform/tests
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 99 Branch
Upstream PR merged by moz-wptsync-bot
See Also: → 1806746
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: