Open Bug 1683253 Opened 9 months ago Updated 4 months ago

Intermittent TEST-UNEXPECTED-TIMEOUT | toolkit/components/extensions/test/xpcshell/test_ext_webRequest_cancelWithReason.js | Test timed out

Categories

(WebExtensions :: General, defect, P5)

Unspecified
Android
defect

Tracking

(Not tracked)

REOPENED

People

(Reporter: intermittent-bug-filer, Unassigned)

Details

(Keywords: intermittent-failure, leave-open, Whiteboard: [stockwell disabled])

Attachments

(1 file)

Filed by: btara [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=324901855&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Z2eBbzcZRYeZx6jvXgZr7w/runs/0/artifacts/public/logs/live_backing.log


[task 2020-12-18T05:22:16.583Z] 05:22:16     INFO -  TEST-START | xpcshell.ini:toolkit/components/extensions/test/xpcshell/test_ext_webRequest_cancelWithReason.js
[task 2020-12-18T05:27:16.586Z] 05:27:16  WARNING -  TEST-UNEXPECTED-TIMEOUT | xpcshell.ini:toolkit/components/extensions/test/xpcshell/test_ext_webRequest_cancelWithReason.js | Test timed out
[task 2020-12-18T05:27:16.586Z] 05:27:16     INFO -  TEST-INFO took 300002ms
[task 2020-12-18T05:27:16.671Z] 05:27:16     INFO -  xpcshell return code: 143
[task 2020-12-18T05:27:16.704Z] 05:27:16  WARNING -  TEST-UNEXPECTED-FAIL | Received SIGINT (control-C), so stopped run. (Use --keep-going to keep running tests after killing one with SIGINT)
[task 2020-12-18T05:27:16.705Z] 05:27:16     INFO -  INFO | Result summary:
[task 2020-12-18T05:27:16.705Z] 05:27:16     INFO -  INFO | Passed: 394
[task 2020-12-18T05:27:16.705Z] 05:27:16  WARNING -  INFO | Failed: 1
[task 2020-12-18T05:27:16.705Z] 05:27:16  WARNING -  One or more unittests failed.
[task 2020-12-18T05:27:16.705Z] 05:27:16     INFO -  INFO | Todo: 4
[task 2020-12-18T05:27:16.705Z] 05:27:16     INFO -  INFO | Retried: 0
[task 2020-12-18T05:27:16.705Z] 05:27:16     INFO -  SUITE-END | took 942s
[task 2020-12-18T05:27:16.706Z] 05:27:16     INFO -  Node moz-http2 server shutting down ...
[task 2020-12-18T05:27:16.789Z] 05:27:16    ERROR - Return code: 1
[task 2020-12-18T05:27:16.789Z] 05:27:16     INFO - TinderboxPrint: xpcshell<br/>394/<em class="testfail">1</em>/4
[task 2020-12-18T05:27:16.790Z] 05:27:16     INFO - ##### xpcshell log ends
[task 2020-12-18T05:27:16.790Z] 05:27:16  WARNING - # TBPL WARNING #
[task 2020-12-18T05:27:16.790Z] 05:27:16  WARNING - setting return code to 1```

Fails frequently on Android, please investigate (first failures on 2020-12-18 UTC morning then almost gone, now back to frequent).

Flags: needinfo?(mixedpuppy)

We had 104 occurrences of this failure in the past week: https://treeherder.mozilla.org/intermittent-failures/bugdetails?startday=2021-01-16&endday=2021-01-23&tree=all&bug=1683253, all of them on android-em-7-0-x86_64 opt.
Most recent one: https://treeherder.mozilla.org/logviewer?job_id=327556072&repo=autoland&lineNumber=2338

[task 2021-01-23T03:45:21.340Z] 03:45:21 INFO - TEST-START | xpcshell.ini:toolkit/components/extensions/test/xpcshell/test_ext_webRequest_cancelWithReason.js
[task 2021-01-23T03:50:21.341Z] 03:50:21 WARNING - TEST-UNEXPECTED-TIMEOUT | xpcshell.ini:toolkit/components/extensions/test/xpcshell/test_ext_webRequest_cancelWithReason.js | Test timed out
[task 2021-01-23T03:50:21.342Z] 03:50:21 INFO - TEST-INFO took 300001ms
[task 2021-01-23T03:50:21.413Z] 03:50:21 INFO - xpcshell return code: 143
[task 2021-01-23T03:50:21.442Z] 03:50:21 WARNING - TEST-UNEXPECTED-FAIL | Received SIGINT (control-C), so stopped run. (Use --keep-going to keep running tests after killing one with SIGINT)
[task 2021-01-23T03:50:21.442Z] 03:50:21 INFO - INFO | Result summary:
[task 2021-01-23T03:50:21.442Z] 03:50:21 INFO - INFO | Passed: 265
[task 2021-01-23T03:50:21.442Z] 03:50:21 WARNING - INFO | Failed: 1
[task 2021-01-23T03:50:21.442Z] 03:50:21 WARNING - One or more unittests failed.
[task 2021-01-23T03:50:21.442Z] 03:50:21 INFO - INFO | Todo: 0
[task 2021-01-23T03:50:21.442Z] 03:50:21 INFO - INFO | Retried: 0
[task 2021-01-23T03:50:21.442Z] 03:50:21 INFO - SUITE-END | took 844s
[task 2021-01-23T03:50:21.443Z] 03:50:21 INFO - Node moz-http2 server shutting down ...
[task 2021-01-23T03:50:21.567Z] 03:50:21 ERROR - Return code: 1
[task 2021-01-23T03:50:21.567Z] 03:50:21 INFO - TinderboxPrint: xpcshell<br/>265/<em class="testfail">1</em>/0
[task 2021-01-23T03:50:21.567Z] 03:50:21 INFO - ##### xpcshell log ends
[task 2021-01-23T03:50:21.567Z] 03:50:21 WARNING - # TBPL WARNING #
[task 2021-01-23T03:50:21.567Z] 03:50:21 WARNING - setting return code to 1
[task 2021-01-23T03:50:21.567Z] 03:50:21 WARNING - The xpcshell suite: xpcshell ran with return status: WARNING
[task 2021-01-23T03:50:21.567Z] 03:50:21 INFO - Running post-action listener: _package_coverage_data
[task 2021-01-23T03:50:21.568Z] 03:50:21 INFO - Running post-action listener: _resource_record_post_action
[task 2021-01-23T03:50:21.568Z] 03:50:21 INFO - Running post-action listener: process_java_coverage_data
[task 2021-01-23T03:50:21.568Z] 03:50:21 INFO - Running post-action listener: stop_device
[task 2021-01-23T03:50:21.948Z] 03:50:21 INFO - Killing logcat pid 1756.
[task 2021-01-23T03:50:21.948Z] 03:50:21 INFO - Killing every process called qemu-system-x86_64
[task 2021-01-23T03:50:21.958Z] 03:50:21 INFO - [mozharness: 2021-01-23 03:50:21.957658Z] Finished run-tests step (success)
[task 2021-01-23T03:50:21.958Z] 03:50:21 INFO - Running post-run listener: _resource_record_post_run
[task 2021-01-23T03:50:22.381Z] 03:50:22 INFO - Total resource usage - Wall time: 881s; CPU: 53.0%; Read bytes: 226567680; Write bytes: 2231271424; Read time: 55747; Write time: 209250
[task 2021-01-23T03:50:22.382Z] 03:50:22 INFO - TinderboxPrint: CPU usage<br/>53.5%
[task 2021-01-23T03:50:22.382Z] 03:50:22 INFO - TinderboxPrint: I/O read bytes / time<br/>226,567,680 / 55,747
[task 2021-01-23T03:50:22.382Z] 03:50:22 INFO - TinderboxPrint: I/O write bytes / time<br/>2,231,271,424 / 209,250
[task 2021-01-23T03:50:22.382Z] 03:50:22 INFO - TinderboxPrint: CPU guest<br/>16,494.9 (16.5%)
[task 2021-01-23T03:50:22.382Z] 03:50:22 INFO - TinderboxPrint: CPU idle<br/>38,636.8 (38.8%)
[task 2021-01-23T03:50:22.382Z] 03:50:22 INFO - TinderboxPrint: CPU system<br/>7,149.8 (7.2%)
[task 2021-01-23T03:50:22.382Z] 03:50:22 INFO - TinderboxPrint: CPU user<br/>36,487.5 (36.6%)
[task 2021-01-23T03:50:22.382Z] 03:50:22 INFO - TinderboxPrint: Swap in / out<br/>0 / 0
[task 2021-01-23T03:50:22.383Z] 03:50:22 INFO - start-emulator - Wall time: 0s; CPU: Can't collect data; Read bytes: 0; Write bytes: 0; Read time: 0; Write time: 0
[task 2021-01-23T03:50:22.388Z] 03:50:22 INFO - verify-device - Wall time: 27s; CPU: 51.0%; Read bytes: 0; Write bytes: 24784896; Read time: 0; Write time: 2095
[task 2021-01-23T03:50:22.388Z] 03:50:22 INFO - install - Wall time: 0s; CPU: Can't collect data; Read bytes: 0; Write bytes: 0; Read time: 0; Write time: 0
[task 2021-01-23T03:50:22.523Z] 03:50:22 INFO - run-tests - Wall time: 855s; CPU: 54.0%; Read bytes: 226567680; Write bytes: 2206486528; Read time: 55747; Write time: 207155
[task 2021-01-23T03:50:25.140Z] 03:50:25 WARNING - returning nonzero exit status 1
[task 2021-01-23T03:50:25.202Z] cleanup
[task 2021-01-23T03:50:25.202Z] + cleanup
[task 2021-01-23T03:50:25.202Z] + local rv=1
[task 2021-01-23T03:50:25.202Z] + [[ -s /builds/worker/.xsession-errors ]]
[task 2021-01-23T03:50:25.202Z] + cp /builds/worker/.xsession-errors /builds/worker/artifacts/public/xsession-errors.log
[task 2021-01-23T03:50:25.204Z] + '[' ']'
[task 2021-01-23T03:50:25.204Z] + true
[task 2021-01-23T03:50:25.204Z] + cleanup_xvfb
[task 2021-01-23T03:50:25.205Z] ++ pidof Xvfb
[task 2021-01-23T03:50:25.210Z] + local xvfb_pid=50
[task 2021-01-23T03:50:25.210Z] + local vnc=false
[task 2021-01-23T03:50:25.210Z] + local interactive=false
[task 2021-01-23T03:50:25.210Z] + '[' -n 50 ']'
[task 2021-01-23T03:50:25.210Z] + [[ false == false ]]
[task 2021-01-23T03:50:25.210Z] + [[ false == false ]]
[task 2021-01-23T03:50:25.210Z] + kill 50
[task 2021-01-23T03:50:25.210Z] + screen -XS xvfb quit
[task 2021-01-23T03:50:25.225Z] + exit 1
[taskcluster 2021-01-23 03:50:26.774Z] === Task Finished ===
[taskcluster 2021-01-23 03:50:28.873Z] Unsuccessful task run with exit code: 1 completed in 1056.667 seconds

Looking at 'similar jobs' view, I see that successes are ~19min, while failures have a 20min limit. At the very least we need to bump the timeout. I'm still trying to understand the specific test case and why it is often the victim of the whole task.

EDIT: This is incorrect. The specific test is timing out. Normally is ~1sec, but in failures it is 5min which fails the job.

Flags: needinfo?(tcampbell)

Bug 1667804 definitely caused this to get quite intermittent. I do see occasional failures beforehand though. We are still doing a few experiments on try to narrow down since toggling prefs isn't having the impact that I was expecting on xpc-shell.

The primary impact of Bug 1667804 will be GC operations appearing at different times and general timing changes. This makes me suspect there is some sort of pre-exisiting race condition in test.

Assignee: nobody → tcampbell
No longer blocks: stencil-backlog
Regressed by: stencil-mvp

With further investigation, I find that without stencil-mvp, but simple disabling the off-thread parsing (which is a performance heuristic and slightly dynamic already), I can trigger failures. To me, this indicates that the test or the code it is testing is brittle and slight changes in gc-timing are breaking it. Disabling off-thread JS parsing removes some extra garbage collector traffic.

The issue is likely not recent, but changes if browser timings have made it appear more often.

Assignee: tcampbell → nobody
No longer regressed by: stencil-mvp

There have been 77 total failures in the last 7 days:
https://treeherder.mozilla.org/intermittent-failures/bugdetails?startday=2021-02-06&endday=2021-02-13&tree=trunk&bug=1683253
Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=329861391&repo=autoland&lineNumber=3004

Affected platforms are:

  • linux1804-64-shippable
  • android-em-7-0-x86_64-shippable
  • android-em-7-0-x86_64 opt & debug
Assignee: nobody → ncsoregi
Status: NEW → ASSIGNED

Update:

There has been a total of 107 failures within the last 7 days on Android 7.0 x86-64 Shippable opt and Android 7.0 x86-64 opt/debug.

Created a disabling patch until further updates.

Try push: https://treeherder.mozilla.org/jobs?repo=try&group_state=expanded&revision=e54bc611dd4bde92f7d34bbfd75de070aea522b7

Pushed by ncsoregi@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/35ad405ddd13
Disabled test_ext_webRequest_cancelWithReason.js on Android 7.0 x86-64. r=intermittent-reviewers,mixedpuppy,jmaher
Status: ASSIGNED → RESOLVED
Closed: 7 months ago
Resolution: --- → FIXED
Target Milestone: --- → 87 Branch
Assignee: ncsoregi → nobody
Status: RESOLVED → REOPENED
Keywords: leave-open
Resolution: FIXED → ---
Whiteboard: [stockwell disable-recommended] → [stockwell disabled]
Target Milestone: 87 Branch → ---
Flags: needinfo?(mixedpuppy)
OS: Unspecified → Android
You need to log in before you can comment on or make changes to this bug.