Open Bug 1644147 Opened 4 years ago Updated 4 months ago

Intermittent TEST-UNEXPECTED-TIMEOUT various wpt core tests | TestRunner hit external timeout (this may indicate a hang) DO NOT USE FOR CLASSIFICATION

Categories

(Testing :: web-platform-tests, defect)

defect

Tracking

(Not tracked)

People

(Reporter: intermittent-bug-filer, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: intermittent-failure, leave-open)

Attachments

(2 files, 1 obsolete file)

Filed by: btara [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=305445220&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/PFtrYeVKR569BpQwJSwK6Q/runs/0/artifacts/public/logs/live_backing.log


[task 2020-06-08T11:35:17.819Z] 11:35:17     INFO - TEST-START | /content-security-policy/embedded-enforcement/blocked-iframe-are-cross-origin.html
[task 2020-06-08T11:35:17.859Z] 11:35:17     INFO - PID 4744 | [CodeCoverage] Requested flush for 3520.
[task 2020-06-08T11:35:19.643Z] 11:35:19     INFO - PID 4744 | [CodeCoverage] flush completed.
[task 2020-06-08T11:35:20.009Z] 11:35:20     INFO - PID 4744 | [CodeCoverage] JS flush completed.
[task 2020-06-08T11:35:20.009Z] 11:35:20     INFO - PID 4744 | [CodeCoverage] Requested flush for 7856.
[task 2020-06-08T11:35:20.009Z] 11:35:20     INFO - PID 4744 | [CodeCoverage] Requested flush for 8524.
[task 2020-06-08T11:35:20.010Z] 11:35:20     INFO - PID 4744 | [CodeCoverage] Requested flush for 2360.
[task 2020-06-08T11:35:20.010Z] 11:35:20     INFO - PID 4744 | [CodeCoverage] Requested flush for 9740.
[task 2020-06-08T11:35:20.010Z] 11:35:20     INFO - PID 4744 | [CodeCoverage] Requested flush for 7316.
[task 2020-06-08T11:35:24.378Z] 11:35:24     INFO - PID 4744 | [CodeCoverage] flush completed.
[task 2020-06-08T11:35:24.397Z] 11:35:24     INFO - PID 4744 | [CodeCoverage] JS flush completed.
[task 2020-06-08T11:35:28.764Z] 11:35:28     INFO - PID 4744 | [CodeCoverage] flush completed.
[task 2020-06-08T11:35:28.789Z] 11:35:28     INFO - PID 4744 | [CodeCoverage] JS flush completed.
[task 2020-06-08T11:35:33.367Z] 11:35:33     INFO - PID 4744 | [CodeCoverage] flush completed.
[task 2020-06-08T11:35:33.406Z] 11:35:33     INFO - PID 4744 | [CodeCoverage] JS flush completed.
[task 2020-06-08T11:35:37.795Z] 11:35:37     INFO - PID 4744 | [CodeCoverage] flush completed.
[task 2020-06-08T11:35:37.856Z] 11:35:37     INFO - PID 4744 | [CodeCoverage] JS flush completed.
[task 2020-06-08T11:35:42.284Z] 11:35:42     INFO - PID 4744 | [CodeCoverage] flush completed.
[task 2020-06-08T11:35:42.286Z] 11:35:42     INFO - PID 4744 | [CodeCoverage] JS flush completed.
[task 2020-06-08T11:35:43.603Z] 11:35:43     INFO - PID 4744 | [CodeCoverage] Requested flush for 3520.
[task 2020-06-08T11:35:45.572Z] 11:35:45     INFO - PID 4744 | [CodeCoverage] flush completed.
[task 2020-06-08T11:35:45.839Z] 11:35:45     INFO - PID 4744 | [CodeCoverage] JS flush completed.
[task 2020-06-08T11:35:45.840Z] 11:35:45     INFO - PID 4744 | [CodeCoverage] Requested flush for 8524.
[task 2020-06-08T11:35:45.840Z] 11:35:45     INFO - PID 4744 | [CodeCoverage] Requested flush for 7856.
[task 2020-06-08T11:35:45.841Z] 11:35:45     INFO - PID 4744 | [CodeCoverage] Requested flush for 2360.
[task 2020-06-08T11:35:45.841Z] 11:35:45     INFO - PID 4744 | [CodeCoverage] Requested flush for 7316.
[task 2020-06-08T11:35:45.841Z] 11:35:45     INFO - PID 4744 | [CodeCoverage] Requested flush for 9740.
[task 2020-06-08T11:35:50.540Z] 11:35:50     INFO - PID 4744 | [CodeCoverage] flush completed.
[task 2020-06-08T11:35:50.565Z] 11:35:50     INFO - PID 4744 | [CodeCoverage] JS flush completed.
[task 2020-06-08T11:35:52.818Z] 11:35:52     INFO - Got timeout in harness
[task 2020-06-08T11:35:52.837Z] 11:35:52     INFO - TEST-UNEXPECTED-TIMEOUT | /content-security-policy/embedded-enforcement/blocked-iframe-are-cross-origin.html | TestRunner hit external timeout (this may indicate a hang)
[task 2020-06-08T11:35:52.837Z] 11:35:52     INFO - TEST-INFO took 35020ms
Summary: Intermittent TEST-UNEXPECTED-TIMEOUT TCw | various core tests | TestRunner hit external timeout (this may indicate a hang) → Intermittent TEST-UNEXPECTED-TIMEOUT various wpt core tests | TestRunner hit external timeout (this may indicate a hang)

There are 44 total failures in the last 7 days on

[task 2021-01-10T14:02:03.467Z] 14:02:03 INFO - TEST-START | /html/cross-origin-opener-policy/popup-same-origin-allow-popups-with-cross-origin.https.html
[task 2021-01-10T14:02:03.470Z] 14:02:03 INFO - Closing window 392
[task 2021-01-10T14:02:03.555Z] 14:02:03 INFO - Closing window 396
[task 2021-01-10T14:02:03.564Z] 14:02:03 INFO - PID 1823 | [Child 1925, Main Thread] WARNING: nsAppShell::Exit() called redundantly: file /builds/worker/checkouts/gecko/widget/cocoa/nsAppShell.mm:696
[task 2021-01-10T14:02:03.566Z] 14:02:03 INFO - PID 1823 | ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2021-01-10T14:02:03.567Z] 14:02:03 INFO - PID 1823 | ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2021-01-10T14:02:03.642Z] 14:02:03 INFO - PID 1823 | [Child 1925, Main Thread] WARNING: NS_ENSURE_TRUE(!mHasOrHasHadOwnerWindow || mOwnerWindow) failed: file /builds/worker/checkouts/gecko/dom/events/DOMEventTargetHelper.cpp:327
[task 2021-01-10T14:02:03.643Z] 14:02:03 INFO - PID 1823 | [Child 1925, Main Thread] WARNING: NS_ENSURE_TRUE(!mHasOrHasHadOwnerWindow || mOwnerWindow) failed: file /builds/worker/checkouts/gecko/dom/events/DOMEventTargetHelper.cpp:327
[task 2021-01-10T14:02:03.655Z] 14:02:03 INFO - PID 1823 | [Child 1930, Main Thread] WARNING: NS_ENSURE_TRUE(mDocShell) failed: file /builds/worker/checkouts/gecko/toolkit/components/browser/nsWebBrowser.cpp:546
[task 2021-01-10T14:02:03.688Z] 14:02:03 INFO - PID 1823 | [Child 1925, Main Thread] WARNING: Extra shutdown CC: 'i < NORMAL_SHUTDOWN_COLLECTIONS', file /builds/worker/checkouts/gecko/xpcom/base/nsCycleCollector.cpp:3364
[task 2021-01-10T14:02:03.693Z] 14:02:03 INFO - PID 1823 | [Child 1930, Main Thread] WARNING: nsAppShell::Exit() called redundantly: file /builds/worker/checkouts/gecko/widget/cocoa/nsAppShell.mm:696
[task 2021-01-10T14:02:03.694Z] 14:02:03 INFO - PID 1823 | ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2021-01-10T14:02:03.696Z] 14:02:03 INFO - PID 1823 | ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2021-01-10T14:02:03.710Z] 14:02:03 INFO - PID 1823 | [Child 1925, Main Thread] WARNING: NS_ENSURE_TRUE(Preferences::InitStaticMembers()) failed: file /builds/worker/checkouts/gecko/modules/libpref/Preferences.cpp:4378
[task 2021-01-10T14:02:03.804Z] 14:02:03 INFO - PID 1823 | [Child 1930, Main Thread] WARNING: Extra shutdown CC: 'i < NORMAL_SHUTDOWN_COLLECTIONS', file /builds/worker/checkouts/gecko/xpcom/base/nsCycleCollector.cpp:3364
[task 2021-01-10T14:02:03.816Z] 14:02:03 INFO - PID 1823 | [Child 1930, Main Thread] WARNING: NS_ENSURE_TRUE(Preferences::InitStaticMembers()) failed: file /builds/worker/checkouts/gecko/modules/libpref/Preferences.cpp:4378
[task 2021-01-10T14:05:13.563Z] 14:05:13 WARNING - Traceback (most recent call last):
[task 2021-01-10T14:05:13.563Z] 14:05:13 WARNING - File "/Users/cltbld/tasks/task_1610284352/build/tests/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 743, in run_func
[task 2021-01-10T14:05:13.563Z] 14:05:13 WARNING - self.result = True, self.func(self.protocol, self.url, self.timeout)
[task 2021-01-10T14:05:13.563Z] 14:05:13 WARNING - File "/Users/cltbld/tasks/task_1610284352/build/tests/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 846, in do_testharness
[task 2021-01-10T14:05:13.563Z] 14:05:13 WARNING - parent_window = protocol.testharness.close_old_windows(protocol)
[task 2021-01-10T14:05:13.563Z] 14:05:13 WARNING - File "/Users/cltbld/tasks/task_1610284352/build/tests/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 202, in close_old_windows
[task 2021-01-10T14:05:13.563Z] 14:05:13 WARNING - runner_handle = self._close_windows()
[task 2021-01-10T14:05:13.563Z] 14:05:13 WARNING - File "/Users/cltbld/tasks/task_1610284352/build/tests/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 193, in _close_windows
[task 2021-01-10T14:05:13.563Z] 14:05:13 WARNING - self.dismiss_alert(lambda: self.marionette.close())
[task 2021-01-10T14:05:13.563Z] 14:05:13 WARNING - File "/Users/cltbld/tasks/task_1610284352/build/tests/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 210, in dismiss_alert
[task 2021-01-10T14:05:13.563Z] 14:05:13 WARNING - f()
[task 2021-01-10T14:05:13.563Z] 14:05:13 WARNING - File "/Users/cltbld/tasks/task_1610284352/build/tests/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 193, in <lambda>
[task 2021-01-10T14:05:13.563Z] 14:05:13 WARNING - self.dismiss_alert(lambda: self.marionette.close())
[task 2021-01-10T14:05:13.563Z] 14:05:13 WARNING - File "/Users/cltbld/tasks/task_1610284352/build/venv/lib/python3.7/site-packages/marionette_driver/marionette.py", line 1363, in close
[task 2021-01-10T14:05:13.563Z] 14:05:13 WARNING - return self._send_message("WebDriver:CloseWindow")
[task 2021-01-10T14:05:13.563Z] 14:05:13 WARNING - File "/Users/cltbld/tasks/task_1610284352/build/venv/lib/python3.7/site-packages/marionette_driver/decorators.py", line 27, in _
[task 2021-01-10T14:05:13.563Z] 14:05:13 WARNING - return func(*args, **kwargs)
[task 2021-01-10T14:05:13.563Z] 14:05:13 WARNING - File "/Users/cltbld/tasks/task_1610284352/build/venv/lib/python3.7/site-packages/marionette_driver/marionette.py", line 621, in _send_message
[task 2021-01-10T14:05:13.563Z] 14:05:13 WARNING - msg = self.client.request(name, params)
[task 2021-01-10T14:05:13.563Z] 14:05:13 WARNING - File "/Users/cltbld/tasks/task_1610284352/build/venv/lib/python3.7/site-packages/marionette_driver/transport.py", line 292, in request
[task 2021-01-10T14:05:13.564Z] 14:05:13 WARNING - return self.receive()
[task 2021-01-10T14:05:13.564Z] 14:05:13 WARNING - File "/Users/cltbld/tasks/task_1610284352/build/venv/lib/python3.7/site-packages/marionette_driver/transport.py", line 178, in receive
[task 2021-01-10T14:05:13.564Z] 14:05:13 WARNING - if len(remaining) == int(length):
[task 2021-01-10T14:05:13.564Z] 14:05:13 WARNING - ValueError: invalid literal for int() with base 10: b',{"error"'
[task 2021-01-10T14:05:13.564Z] 14:05:13 WARNING -
[task 2021-01-10T14:05:18.664Z] 14:05:18 INFO - Got timeout in harness
[task 2021-01-10T14:05:18.667Z] 14:05:18 INFO - TEST-UNEXPECTED-TIMEOUT | /html/cross-origin-opener-policy/popup-same-origin-allow-popups-with-cross-origin.https.html | TestRunner hit external timeout (this may indicate a hang)
[task 2021-01-10T14:05:18.667Z] 14:05:18 INFO - TEST-INFO took 195201ms
[task 2021-01-10T14:06:34.740Z] 14:06:34 INFO - Browser exited with return code -15
[task 2021-01-10T14:06:34.741Z] 14:06:34 INFO - PROCESS LEAKS /var/folders/b6/bh8kzy693dzc536ry_1rbvj8000016/T/tmp2agbp9gr/runtests_leaks_1476.log
[task 2021-01-10T14:06:34.741Z] 14:06:34 WARNING - Firefox didn't exit cleanly, not processing leak logs
[task 2021-01-10T14:06:34.743Z] 14:06:34 INFO - Browser not responding, setting status to CRASH
[task 2021-01-10T14:06:34.743Z] 14:06:34 WARNING - Command left in command_queue during cleanup: 'test_ended', (<wptrunner.wpttest.TestharnessTest /html/cross-origin-opener-policy/popup-same-origin-allow-popups-with-cross-origin.https.html>, (<wptrunner.wpttest.TestharnessResult CRASH>, []))
[task 2021-01-10T14:06:34.744Z] 14:06:34 INFO - Closing logging queue
[task 2021-01-10T14:06:34.744Z] 14:06:34 INFO - queue closed
Jens please assign someone to take a look. Thank you.

Flags: needinfo?(jstutte)

Hmm, the spike seems to be over? I'll trigger a re-triage, anyway.

Severity: normal → --
Flags: needinfo?(jstutte)

A bulk of the recent failures appeared as wpt tests were enabled on tsan as tier 1 in Bug 1682723.
Gankra, could you please take a look?

Flags: needinfo?(a.beingessner)

It looks reasonable to disable:

  • /fetch/http-cache/304-update.any.serviceworker.html
  • /feature-policy/experimental-features/trust-token-redemption-default-feature-policy.tentative.https.sub.html

but everything else is a one-off failure (which matches my experience with wpt timeouts). Perhaps one of the timeout thresholds for these tests needs to be further increased?

Flags: needinfo?(a.beingessner)

Ni Natalia for the disable.

Flags: needinfo?(ncsoregi)
Assignee: nobody → a.beingessner
Flags: needinfo?(ncsoregi)

I tried doubling the 1000-second timeout to 2000 but the failure rate stayed the same, suggesting these are all infinite hangs: https://treeherder.mozilla.org/jobs?repo=try&revision=6399c86fee60b7fdf1d12fef4c9c8895092d13bd&selectedTaskRun=aEpx1OsjTKaVXqKiGsxGFw.0

Pushed by malexandru@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0af0862b9a2c
Disable trust-token-redemption-default-feature-policy.tentative.https.sub.html and 304-update.any.js on linux tsan. r=Gankra
Pushed by malexandru@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0701a467ca0a
Fix lint error in trust-token-redemption-default-feature-policy.tentative.https.sub.html.ini a=lint-fix
Pushed by malexandru@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8ee941705a44
Fix another lint error in testing/web-platform/meta/fetch/http-cache/304-update.any.js.ini a=lint-fix
Pushed by malexandru@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/97d541fff62b
Use "OK" instead of "PASS" for test expectation. a=fix

Yeah we're not sure what's going wrong / how to fix it yet. The timeouts appear to be infinite hangs on completely random tests and we're already using linux-xlarge machines.

So short of adding "TIMEOUT" as a valid result for every wpt test I'm not sure what more can be done. I'm not sure what tools I could use to debug these hangs further.

Flags: needinfo?(a.beingessner)

James any input on this? Should we file separate bugs for each failure to be able to edit the manifests more easily?

Flags: needinfo?(james)
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]

So the TEST-UNEXPECTED-TIMEOUT results seem to be after 45s (typically), which I think is from a 4x timeout multiplier. So it's possible those aren't hangs but are just needing larger test timeout for TSAN. The "no output after 2000s" looks to me like a shutdown hang since it seems to always be happening when all the tests have run.

Somewhat unrelated but we really need to add TSAN to the set of properties the sync is able to set in the metadata. That means adding it to the list at https://searchfox.org/mozilla-central/source/testing/web-platform/tests/tools/wptrunner/wptrunner/browsers/firefox.py#228.

Flags: needinfo?(james)

This affects random tests. https://sql.telemetry.mozilla.org/queries/77721/source?p_bug_id=1644147 yields <20 test+platform combination with 3+ failures but 1k if we look at 1 or more failures.

Attachment #9200592 - Attachment is obsolete: true
Depends on: 1686101
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]

I got a rough build going where wpt has some memory stats, and I'm now reasonably confident that it's not memory exhaustion, but rather something strange with how wpt (re)starts the browser causing hangs. Basically all of the timeouts are right after a (re)start of the browser.

https://treeherder.mozilla.org/jobs?repo=try&revision=2f094d451229c6276534e7a5d7cfefb406540336&selectedTaskRun=W-r1mhztRFObE3d61hLqYA.0

(Ignore Wr jobs, my wip patch was broken for those)

Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]
Whiteboard: [stockwell disable-recommended]
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]

There are 130 total failures in the last 7 days on

  • linux1804-64-ccov opt
  • linux1804-64-tsan opt
  • windows10-64-ccov-qr opt

Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=330152054&repo=mozilla-central&lineNumber=3050

[task 2021-02-16T22:17:53.826Z] 22:17:53 INFO - TEST-START | /dom/ranges/Range-insertNode.html
[task 2021-02-16T22:17:53.828Z] 22:17:53 INFO - Closing window 64
[task 2021-02-16T22:17:55.949Z] 22:17:55 INFO - PID 6957 | ###!!! [Child][MessageChannel] Error: (msgtype=0x3A0114,name=PContent::Msg_AccumulateChildHistograms) Channel closing: too late to send/recv, messages will be lost
[task 2021-02-16T22:17:55.950Z] 22:17:55 INFO - PID 6957 | ###!!! [Child][MessageChannel] Error: (msgtype=0x3A0119,name=PContent::Msg_RecordDiscardedData) Channel closing: too late to send/recv, messages will be lost
[task 2021-02-16T22:18:03.583Z] 22:18:03 INFO - Got timeout in harness
[task 2021-02-16T22:18:03.586Z] 22:18:03 INFO - TEST-UNEXPECTED-TIMEOUT | /feature-policy/reporting/screen-wake-lock-reporting.https.html | TestRunner hit external timeout (this may indicate a hang)
[task 2021-02-16T22:18:03.586Z] 22:18:03 INFO - TEST-INFO took 45005ms
[task 2021-02-16T22:18:05.949Z] 22:18:05 INFO - PID 6957 | ###!!! [Child][MessageChannel] Error: (msgtype=0x3A0114,name=PContent::Msg_AccumulateChildHistograms) Channel closing: too late to send/recv, messages will be lost
[task 2021-02-16T22:18:05.950Z] 22:18:05 INFO - PID 6957 | ###!!! [Child][MessageChannel] Error: (msgtype=0x3A0119,name=PContent::Msg_RecordDiscardedData) Channel closing: too late to send/recv, messages will be lost
[task 2021-02-16T22:18:11.111Z] 22:18:11 INFO - ........................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................
[task 2021-02-16T22:18:11.111Z] 22:18:11 INFO - TEST-OK | /dom/ranges/Range-insertNode.html | took 17299ms

See Also: → 1693639
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]
Component: DOM: Core & HTML → web-platform-tests
Product: Core → Testing

The wpt tsan failures have budged but remain at a high level. Please take another look.

Flags: needinfo?(kwright)

(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #109)

The wpt tsan failures have budged but remain at a high level. Please take another look.

I've watched these for a few days now and it looks like the fix hasn't made a big enough change for what I had hoped. I can't work out any cause other than slow build. Tsan has a tendency to reorder tasks and I'm wondering if try is running into an existing startup hang at higher frequency that I haven't been able to reproduce on my local runs. Is there some way to crash and produce a stack trace when it hits this intermittent?

Flags: needinfo?(kwright)
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]

There are 226 total failures in the last 7 days on

  • linux1804-64-ccov opt
  • linux1804-64-tsan opt

Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=331949312&repo=autoland&lineNumber=2535

[task 2021-03-03T21:10:53.662Z] 21:10:53 INFO - TEST-START | /encoding/streams/decode-attributes.any.html
[task 2021-03-03T21:10:53.670Z] 21:10:53 INFO - Closing window 47
[task 2021-03-03T21:11:38.657Z] 21:11:38 INFO - Got timeout in harness
[task 2021-03-03T21:11:38.660Z] 21:11:38 INFO - TEST-UNEXPECTED-TIMEOUT | /encoding/streams/decode-attributes.any.html | TestRunner hit external timeout (this may indicate a hang)
[task 2021-03-03T21:11:38.660Z] 21:11:38 INFO - TEST-INFO took 45003ms
[task 2021-03-03T21:28:18.700Z] 21:28:18 INFO - Automation Error: mozprocess timed out after 1000 seconds running ['/builds/worker/workspace/build/venv/bin/python', '-u', '/builds/worker/workspace/build/tests/web-platform/runtests.py', '--log-raw=-', '--log-raw=/builds/worker/workspace/build/blobber_upload_dir/wpt_raw.log', '--log-wptreport=/builds/worker/workspace/build/blobber_upload_dir/wptreport.json', '--log-errorsummary=/builds/worker/workspace/build/blobber_upload_dir/wpt_errorsummary.log', '--binary=/builds/worker/workspace/build/application/firefox/firefox', '--symbols-path=None', '--stackwalk-binary=/builds/worker/fetches/minidump_stackwalk/minidump_stackwalk', '--stackfix-dir=/builds/worker/workspace/build/tests/bin', '--no-pause-after-test', '--instrument-to-file=/builds/worker/workspace/build/blobber_upload_dir/wpt_instruments.txt', '--specialpowers-path=/builds/worker/workspace/build/tests/extensions/specialpowers@mozilla.org.xpi', '--processes=1', '--install-fonts', '--test-type=testharness', '--setpref=media.peerconnection.mtransport_process=false', '--setpref=network.process.enabled=false', '--skip-timeout', '--skip-implementation-status=backlog', '--skip-implementation-status=not-implementing', '--test-groups=/builds/worker/fetches/wpt_tests_by_group.json', '--prefs-root=/builds/worker/workspace/build/tests/web-platform/prefs', '--config=/builds/worker/workspace/build/tests/web-platform/wptrunner.ini', '--ca-cert-path=/builds/worker/workspace/build/tests/web-platform/tests/tools/certs/cacert.pem', '--host-key-path=/builds/worker/workspace/build/tests/web-platform/tests/tools/certs/web-platform.test.key', '--host-cert-path=/builds/worker/workspace/build/tests/web-platform/tests/tools/certs/web-platform.test.pem', '--certutil-binary=/builds/worker/workspace/build/tests/bin/certutil', '/infrastructure/expected-fail', '/css/css-syntax', '/css/css-lists/animations', '/css/css-transitions/parsing', '/encoding/streams', '/savedata', '/svg/struct', '/merchant-validation', '/web-bundle/subresource-loading', '/cookies/attributes', '/referrer-policy/4K-1/gen', '/upgrade-insecure-requests/gen/iframe-blank-inherit.meta', '/_mozilla/focus', '/magnetometer', '/permissions', '/webaudio/the-audio-api/the-mediastreamaudiodestinationnode-interface', '/fetch/cors-rfc1918', '/css/css-align', '/subresource-integrity', '/webrtc-priority', '/shadow-dom', '/clipboard-apis/detached-iframe', '/notifications', '/FileAPI/filelist-section', '/css/css-pseudo', '/document-policy/required-policy', '/wasm/jsapi/constructor', '/css/motion', '/css/css-fonts/parsing', '/content-security-policy/media-src', '/html/syntax/parsing', '/workers/semantics/reporting-errors', '/webvtt/parsing/cue-text-parsing', '/gyroscope', '/css/css-color/animation', '/html/infrastructure/safe-passing-of-structured-data', '/bluetooth/characteristic/writeValue']
[task 2021-03-03T21:28:18.701Z] 21:28:18 ERROR - timed out after 1000 seconds of no output
[task 2021-03-03T21:28:18.701Z] 21:28:18 ERROR - Return code: -15
[task 2021-03-03T21:28:18.702Z] 21:28:18 ERROR - Got 1 unexpected statuses
[task 2021-03-03T21:28:18.702Z] 21:28:18 ERROR - No suite end message was emitted by this harness.
[task 2021-03-03T21:28:18.702Z] 21:28:18 ERROR - # TBPL FAILURE #
[task 2021-03-03T21:28:18.703Z] 21:28:18 WARNING - setting return code to 2

Kris are you still working on this?

Flags: needinfo?(kwright)
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]

We are currently completely stumped on how to make this more stable after trying several different approaches with no success.

Flags: needinfo?(kwright)
Whiteboard: [stockwell disable-recommended]
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]

Unassigning myself as I've stopped looking into this, sheriffs should do whatever they think is best here

Assignee: a.beingessner → nobody

James, can you prioritize this very frequent failure? Or should wpt on tsan get turned off for now?

Flags: needinfo?(choller)
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]

Alexis, can you take a look here? It looks like frequency has increased again on WPT, there might be some new tests causing this. If we can't figure out the root cause, then we should disable tests in chunks to if we can find a working subset and then figure out which tests in the failing subset are responsible.

Flags: needinfo?(choller) → needinfo?(a.beingessner)
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]

Moving this to Kris. She will take a look and as a first measure, try to disable parts of WPT that are responsible for most of the failures on TSan. We will then try to identify single tests within those chunks that cause the issue. Unfortunately, so far it is not even clear if these are true hangs or just the machine becoming really slow. We also cannot reproduce any of this locally, so we might have to get a loaner.

Flags: needinfo?(a.beingessner) → needinfo?(kwright)

Test failures on autoland and central since start of March - has a long tail and doesn't include bug 1612600 because this doesn't mention a test name in the failure line.

Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]

There are 408 total failures in the last 7 days, with an increase since March 26th: https://treeherder.mozilla.org/intermittent-failures/bugdetails?startday=2021-03-21&endday=2021-03-28&tree=trunk&bug=1644147 on

[task 2021-03-28T13:49:47.998Z] 13:49:47 INFO - TEST-START | /custom-elements/reactions/Document.html
[task 2021-03-28T13:49:48.001Z] 13:49:48 INFO - Closing window 49
[task 2021-03-28T13:50:32.993Z] 13:50:32 INFO - Got timeout in harness
[task 2021-03-28T13:50:32.995Z] 13:50:32 INFO - TEST-UNEXPECTED-TIMEOUT | /custom-elements/reactions/Document.html | TestRunner hit external timeout (this may indicate a hang)
[task 2021-03-28T13:50:32.995Z] 13:50:32 INFO - TEST-INFO took 45002ms
[task 2021-03-28T14:07:13.025Z] 14:07:13 INFO - Automation Error: mozprocess timed out after 1000 seconds running ['/builds/worker/workspace/build/venv/bin/python', '-u', '/builds/worker/workspace/build/tests/web-platform/runtests.py', '--log-raw=-', '--log-raw=/builds/worker/workspace/build/blobber_upload_dir/wpt_raw.log', '--log-wptreport=/builds/worker/workspace/build/blobber_upload_dir/wptreport.json', '--log-errorsummary=/builds/worker/workspace/build/blobber_upload_dir/wpt_errorsummary.log', '--binary=/builds/worker/workspace/build/application/firefox/firefox', '--symbols-path=None', '--stackwalk-binary=/builds/worker/fetches/minidump_stackwalk/minidump_stackwalk', '--stackfix-dir=/builds/worker/workspace/build/tests/bin', '--no-pause-after-test', '--instrument-to-file=/builds/worker/workspace/build/blobber_upload_dir/wpt_instruments.txt', '--specialpowers-path=/builds/worker/workspace/build/tests/extensions/specialpowers@mozilla.org.xpi', '--processes=1', '--install-fonts', '--test-type=testharness', '--setpref=media.peerconnection.mtransport_process=false', '--setpref=network.process.enabled=false', '--skip-timeout', '--skip-implementation-status=backlog', '--skip-implementation-status=not-implementing', '--test-groups=/builds/worker/fetches/wpt_tests_by_group.json', '--prefs-root=/builds/worker/workspace/build/tests/web-platform/prefs', '--config=/builds/worker/workspace/build/tests/web-platform/wptrunner.ini', '--ca-cert-path=/builds/worker/workspace/build/tests/web-platform/tests/tools/certs/cacert.pem', '--host-key-path=/builds/worker/workspace/build/tests/web-platform/tests/tools/certs/web-platform.test.key', '--host-cert-path=/builds/worker/workspace/build/tests/web-platform/tests/tools/certs/web-platform.test.pem', '--certutil-binary=/builds/worker/workspace/build/tests/bin/certutil', '/content-security-policy/script-src-attr-elem', '/upgrade-insecure-requests/gen/worker-classic-data.meta', '/mathml/relations/html5-tree', '/is-input-pending', '/content-security-policy/default-src', '/fetch/api/cors', '/mixed-content/gen/top.meta', '/referrer-policy/gen/srcdoc-inherit.http-rp', '/web-animations', '/shape-detection', '/css', '/websockets/constructor', '/WebIDL/ecmascript-binding/es-exceptions', '/css/css-text/overflow-wrap', '/fetch/stale-while-revalidate', '/editing/other', '/selection', '/css/css-exclusions', '/css/css-will-change', '/xhr', '/webxr/gamepads-module', '/css/css-shapes/animation', '/html/rendering/widgets', '/svg/interact/parsing', '/bluetooth/server/getPrimaryServices', '/pointerevents/pointerlock', '/custom-elements/reactions', '/html/cross-origin-embedder-policy/credentialless', '/html/webappapis/system-state-and-capabilities', '/css/css-overflow/parsing', '/webaudio/the-audio-api/the-audiobuffersourcenode-interface', '/css/css-box/animation', '/touch-events', '/css/css-grid/subgrid', '/webcodecs', '/battery-status', '/origin-policy/ids']
[task 2021-03-28T14:07:13.027Z] 14:07:13 ERROR - timed out after 1000 seconds of no output
[task 2021-03-28T14:07:13.027Z] 14:07:13 ERROR - Return code: -15
[task 2021-03-28T14:07:13.028Z] 14:07:13 ERROR - Got 1 unexpected statuses
[task 2021-03-28T14:07:13.028Z] 14:07:13 ERROR - No suite end message was emitted by this harness.
[task 2021-03-28T14:07:13.028Z] 14:07:13 ERROR - # TBPL FAILURE #
[task 2021-03-28T14:07:13.028Z] 14:07:13 WARNING - setting return code to 2

Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]
See Also: → 1701693

I went through and turned off high-volume failing tests on TSan in an effort to reduce intermittent volume (Bug 1701693). We can turn them back on if we figure out what's going on. Since these tests crashed more frequently than others, and a lot of the lower-failure tests do not seem to follow a particular pattern nor do any directories seem to be more problematic than others, I figure turning them off might reduce our failure volume to something more manageable.

However, if this doesn't work and just causes other tests to fail at the same volume then we will need to try a different avenue of investigation.

Flags: needinfo?(kwright)
Whiteboard: [stockwell disable-recommended]

The tests disable represented 273 out of 1531 failures based on comment 145 (with test failures from 721 still active tests). The wpt test framework will need to be adjusted.

(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #157)

The tests disable represented 273 out of 1531 failures based on comment 145 (with test failures from 721 still active tests). The wpt test framework will need to be adjusted.

Reopened bug 1701693 to turn off more tests. I'm running into a similar timeout on TSan reftest pushes that fires off even if I set an unrealistically large timeout threshold so I have reason to believe there's an actual startup hang happening. TSan does tend to exacerbate existing hangs, so I'll be investigating this further.

Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]

We have a fix for this now that we will test today on try (we have a patch that fixes a problem in TSan that is likely the cause for the hangs here).

Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]

Update:

There have been 52 failures within the last 7 days:

  • 3 failures on Android 7.0 x86-64 opt
  • 1 failure on Android 7.0 x86-64 Shippable opt
  • 1 failure on Linux 18.04 shippable opt
  • 7 failures on Linux 18.04 x64 debug/opt
  • 15 failures on Linux 18.04 x64 asan opt
  • 2 failures on Linux 18.04 x64 WebRender opt
  • 1 failure on Linux 18.04 x64 WebRender Shippable opt
  • 2 failures on Linux 18.04 x64 tsan opt
  • 1 failure on Windows 10 x64 opt
  • 9 failures on Windows 10 x64 CCov WebRender opt
  • 6 failures on Windows 10 x64 WebRender opt
  • 4 failures on Windows 10 x64 WebRender Shippable opt

Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=339130700&repo=autoland&lineNumber=10704

[task 2021-05-08T06:41:04.430Z] 06:41:04     INFO - PID 19760 | [Child 19967: Unnamed thread 61200049ffc0]: I/signaling [|WebrtcVideoSessionConduit] VideoStreamFactory.cpp:205: CreateEncoderStreams Input frame 640x480, RID  scaling to 640x480
[task 2021-05-08T06:41:04.430Z] 06:41:04     INFO - PID 19760 | [Child 19967: Socket Thread]: D/mtransport Trickle candidate is redundant for stream 'PC:{383b9ab9-d62d-4b87-a1bd-1540de8e345d} 1620456064172843 (id=8589934613 url=https://web-platform.test:8443/webrtc-encoded-transf transport-id=transport_0' because it is completed:
[task 2021-05-08T06:41:04.431Z] 06:41:04     INFO - PID 19760 | [Child 19967: Main Thread]: E/signaling [main|PeerConnectionImpl] PeerConnectionImpl.cpp:2771: GetStats: Found no pipelines matching selector.
[task 2021-05-08T06:41:04.431Z] 06:41:04     INFO - PID 19760 | [Child 19967: Main Thread]: E/signaling [main|PeerConnectionImpl] PeerConnectionImpl.cpp:2771: GetStats: Found no pipelines matching selector.
[task 2021-05-08T06:41:04.452Z] 06:41:04     INFO - PID 19760 | [Child 19967: Main Thread]: E/signaling [main|PeerConnectionImpl] PeerConnectionImpl.cpp:2771: GetStats: Found no pipelines matching selector.
[task 2021-05-08T06:41:04.455Z] 06:41:04     INFO - PID 19760 | [Child 19967: Main Thread]: I/signaling [main|PeerConnectionImpl] PeerConnectionImpl.cpp:2102: CloseInt: Closing PeerConnectionImpl {e675a90d-53d6-4937-b636-95cadd22198f}; ending call
[task 2021-05-08T06:41:04.456Z] 06:41:04     INFO - PID 19760 | [Child 19967: Main Thread]: I/jsep [{e675a90d-53d6-4937-b636-95cadd22198f} 1620456064171447 (id=8589934613 url=https://web-platform.test:8443/webrtc-encoded-transf]: stable -> closed
[task 2021-05-08T06:41:04.465Z] 06:41:04     INFO - PID 19760 | [Child 19967: Main Thread]: E/signaling [main|PeerConnectionImpl] PeerConnectionImpl.cpp:2771: GetStats: Found no pipelines matching selector.
[task 2021-05-08T06:41:04.465Z] 06:41:04     INFO - PID 19760 | [Child 19967: Main Thread]: I/signaling [main|PeerConnectionImpl] PeerConnectionImpl.cpp:2102: CloseInt: Closing PeerConnectionImpl {383b9ab9-d62d-4b87-a1bd-1540de8e345d}; ending call
[task 2021-05-08T06:41:04.466Z] 06:41:04     INFO - PID 19760 | [Child 19967: Main Thread]: I/jsep [{383b9ab9-d62d-4b87-a1bd-1540de8e345d} 1620456064172843 (id=8589934613 url=https://web-platform.test:8443/webrtc-encoded-transf]: stable -> closed
[task 2021-05-08T06:41:46.907Z] 06:41:46     INFO - Got timeout in harness
[task 2021-05-08T06:41:46.909Z] 06:41:46     INFO - TEST-UNEXPECTED-TIMEOUT | /webrtc-encoded-transform/RTCPeerConnection-insertable-streams-legacy.https.html | TestRunner hit external timeout (this may indicate a hang)
[task 2021-05-08T06:41:46.909Z] 06:41:46     INFO - TEST-INFO took 45003ms
[task 2021-05-08T06:42:57.996Z] 06:42:57     INFO - Browser exited with return code -15
[task 2021-05-08T06:42:57.996Z] 06:42:57     INFO - PROCESS LEAKS None
[task 2021-05-08T06:42:58.042Z] 06:42:58     INFO - Browser not responding, setting status to CRASH
Whiteboard: [stockwell disable-recommended] → [stockwell disable-recommended][stockwell needswork:owner]
Whiteboard: [stockwell disable-recommended][stockwell needswork:owner] → [stockwell needswork:owner]
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]
Flags: needinfo?(james)

Hi James, recent failures here are mostly on android, can you please take a look?

There are 124 total failures in the last 7 days on

  • android-em-7-0-x86_64 opt and debug
  • android-em-7-0-x86_64-qr debug and opt
  • linux1804-64 opt and debug
  • linux1804-64-asan opt
  • windows10-64 opt
  • windows10-64-ccov-qr opt
  • windows10-64-qr opt

Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=339534992&repo=mozilla-central&lineNumber=3143

[task 2021-05-12T18:34:07.325Z] 18:34:07 INFO - TEST-START | /html/cross-origin-opener-policy/iframe-popup-same-origin-to-unsafe-none.https.html
[task 2021-05-12T18:34:07.335Z] 18:34:07 INFO - Closing window 484
[task 2021-05-12T18:37:17.340Z] 18:37:17 WARNING - Traceback (most recent call last):
[task 2021-05-12T18:37:17.340Z] 18:37:17 WARNING - File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 791, in run_func
[task 2021-05-12T18:37:17.340Z] 18:37:17 WARNING - self.result = True, self.func(self.protocol, self.url, self.timeout)
[task 2021-05-12T18:37:17.340Z] 18:37:17 WARNING - File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 912, in do_testharness
[task 2021-05-12T18:37:17.340Z] 18:37:17 WARNING - protocol.marionette.navigate(url)
[task 2021-05-12T18:37:17.340Z] 18:37:17 WARNING - File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/marionette_driver/marionette.py", line 1516, in navigate
[task 2021-05-12T18:37:17.340Z] 18:37:17 WARNING - self._send_message("WebDriver:Navigate", {"url": url})
[task 2021-05-12T18:37:17.340Z] 18:37:17 WARNING - File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/marionette_driver/decorators.py", line 27, in _
[task 2021-05-12T18:37:17.340Z] 18:37:17 WARNING - return func(*args, **kwargs)
[task 2021-05-12T18:37:17.340Z] 18:37:17 WARNING - File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/marionette_driver/marionette.py", line 621, in _send_message
[task 2021-05-12T18:37:17.340Z] 18:37:17 WARNING - msg = self.client.request(name, params)
[task 2021-05-12T18:37:17.340Z] 18:37:17 WARNING - File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/marionette_driver/transport.py", line 292, in request
[task 2021-05-12T18:37:17.340Z] 18:37:17 WARNING - return self.receive()
[task 2021-05-12T18:37:17.340Z] 18:37:17 WARNING - File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/marionette_driver/transport.py", line 178, in receive
[task 2021-05-12T18:37:17.340Z] 18:37:17 WARNING - if len(remaining) == int(length):
[task 2021-05-12T18:37:17.340Z] 18:37:17 WARNING - ValueError: invalid literal for int() with base 10: b',{"error"'
[task 2021-05-12T18:37:17.340Z] 18:37:17 WARNING -
[task 2021-05-12T18:37:22.327Z] 18:37:22 INFO - Got timeout in harness
[task 2021-05-12T18:37:22.472Z] 18:37:22 INFO - TEST-UNEXPECTED-TIMEOUT | /html/cross-origin-opener-policy/iframe-popup-same-origin-to-unsafe-none.https.html | TestRunner hit external timeout (this may indicate a hang)
[task 2021-05-12T18:37:22.472Z] 18:37:22 INFO - TEST-INFO took 195146ms
[task 2021-05-12T18:37:22.898Z] 18:37:22 INFO - STDOUT: cleanup aborted: Unable to remount device
[task 2021-05-12T18:37:23.045Z] 18:37:23 INFO - STDOUT: cleanup aborted: Unable to remount device
[task 2021-05-12T18:37:33.053Z] 18:37:33 WARNING - Forcibly terminating runner process
[task 2021-05-12T18:37:33.076Z] 18:37:33 INFO - Setting up ssl
[task 2021-05-12T18:37:33.096Z] 18:37:33 INFO - certutil | b''
[task 2021-05-12T18:37:33.118Z] 18:37:33 INFO - certutil | b''
[task 2021-05-12T18:37:33.140Z] 18:37:33 INFO - certutil | b'\nCertificate Nickname Trust Attributes\n SSL,S/MIME,JAR/XPI\n\nweb-platform-tests CT,, \n'
[task 2021-05-12T18:37:34.161Z] 18:37:34 INFO - adb Granting important runtime permissions to org.mozilla.geckoview.test
[task 2021-05-12T18:37:35.597Z] 18:37:35 INFO - adb launch_application: am start -W -n org.mozilla.geckoview.test/org.mozilla.geckoview.test.TestRunnerActivity -a android.intent.action.MAIN --es env0 MOZ_CRASHREPORTER=1 --es env1 MOZ_CRASHREPORTER_NO_REPORT=1 --es env2 MOZ_CRASHREPORTER_SHUTDOWN=1 --es env3 MOZ_HIDE_RESULTS_TABLE=1 --es env4 MOZ_IN_AUTOMATION=1 --es env5 MOZ_LOG=signaling:3,mtransport:4,DataChannel:4,jsep:4 --es env6 R_LOG_LEVEL=6 --es env7 R_LOG_DESTINATION=stderr --es env8 R_LOG_VERBOSE=1 --es env9 MOZ_PROCESS_LOG=/tmp/tmpscw8q_fepidlog --es env10 MOZ_DISABLE_NONLOCAL_CONNECTIONS=1 --es env11 STYLO_THREADS=1 --es env12 MOZ_WEBRENDER=1 --es arg0 -no-remote --es arg1 -profile --es arg2 /data/local/tmp/test_root/profile --es arg3 --marionette --es arg4 about:blank --ez use_multiprocess True
[task 2021-05-12T18:37:36.576Z] 18:37:36 INFO - Starting runner

Flags: needinfo?(james)

I had a quick look and as it looks like it's related to the Marionette client's transport layer:

[task 2021-05-12T18:37:17.340Z] 18:37:17  WARNING -   File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/marionette_driver/transport.py", line 292, in request
[task 2021-05-12T18:37:17.340Z] 18:37:17  WARNING -     return self.receive()
[task 2021-05-12T18:37:17.340Z] 18:37:17  WARNING -   File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/marionette_driver/transport.py", line 178, in receive
[task 2021-05-12T18:37:17.340Z] 18:37:17  WARNING -     if len(remaining) == int(length):
[task 2021-05-12T18:37:17.340Z] 18:37:17  WARNING - ValueError: invalid literal for int() with base 10: b',{"error"'

As such the switch from Python 2.7 to 3.6 might have caused this failure to appear.

There are 91 total failures in the last 7 days on

[task 2021-05-22T17:11:41.173Z] 17:11:41 INFO - TEST-START | /html/cross-origin-opener-policy/popup-same-origin-non-initial-about-blank.https.html
[task 2021-05-22T17:11:41.269Z] 17:11:41 INFO - Closing window f84b64ba-fb2a-41d4-8e34-8f44c494c18b
[task 2021-05-22T17:12:31.228Z] 17:12:31 WARNING - Traceback (most recent call last):
[task 2021-05-22T17:12:31.228Z] 17:12:31 WARNING - File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 790, in run_func
[task 2021-05-22T17:12:31.228Z] 17:12:31 WARNING - self.result = True, self.func(self.protocol, self.url, self.timeout)
[task 2021-05-22T17:12:31.228Z] 17:12:31 WARNING - File "/builds/worker/workspace/build/tests/web-platform/tests/tools/wptrunner/wptrunner/executors/executormarionette.py", line 911, in do_testharness
[task 2021-05-22T17:12:31.228Z] 17:12:31 WARNING - protocol.marionette.navigate(url)
[task 2021-05-22T17:12:31.228Z] 17:12:31 WARNING - File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/marionette_driver/marionette.py", line 1516, in navigate
[task 2021-05-22T17:12:31.228Z] 17:12:31 WARNING - self._send_message("WebDriver:Navigate", {"url": url})
[task 2021-05-22T17:12:31.228Z] 17:12:31 WARNING - File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/marionette_driver/decorators.py", line 27, in _
[task 2021-05-22T17:12:31.228Z] 17:12:31 WARNING - return func(*args, **kwargs)
[task 2021-05-22T17:12:31.228Z] 17:12:31 WARNING - File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/marionette_driver/marionette.py", line 621, in _send_message
[task 2021-05-22T17:12:31.228Z] 17:12:31 WARNING - msg = self.client.request(name, params)
[task 2021-05-22T17:12:31.228Z] 17:12:31 WARNING - File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/marionette_driver/transport.py", line 292, in request
[task 2021-05-22T17:12:31.228Z] 17:12:31 WARNING - return self.receive()
[task 2021-05-22T17:12:31.228Z] 17:12:31 WARNING - File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/marionette_driver/transport.py", line 178, in receive
[task 2021-05-22T17:12:31.228Z] 17:12:31 WARNING - if len(remaining) == int(length):
[task 2021-05-22T17:12:31.228Z] 17:12:31 WARNING - ValueError: invalid literal for int() with base 10: b',{"error"'
[task 2021-05-22T17:12:31.228Z] 17:12:31 WARNING -
[task 2021-05-22T17:12:36.175Z] 17:12:36 INFO - Got timeout in harness
[task 2021-05-22T17:12:36.273Z] 17:12:36 INFO - TEST-UNEXPECTED-TIMEOUT | /html/cross-origin-opener-policy/popup-same-origin-non-initial-about-blank.https.html | TestRunner hit external timeout (this may indicate a hang)
[task 2021-05-22T17:12:36.273Z] 17:12:36 INFO - TEST-INFO took 55100ms
[task 2021-05-22T17:12:36.639Z] 17:12:36 INFO - STDOUT: cleanup aborted: Unable to remount device
[task 2021-05-22T17:12:36.748Z] 17:12:36 INFO - STDOUT: cleanup aborted: Unable to remount device
[task 2021-05-22T17:12:36.750Z] 17:12:36 INFO - Browser not responding, setting status to CRASH
[task 2021-05-22T17:12:36.751Z] 17:12:36 WARNING - Command left in command_queue during cleanup: 'test_ended', (<wptrunner.wpttest.TestharnessTest /html/cross-origin-opener-policy/popup-same-origin-non-initial-about-blank.https.html>, (<wptrunner.wpttest.TestharnessResult CRASH>, []))
[task 2021-05-22T17:12:36.752Z] 17:12:36 INFO - Closing logging queue
[task 2021-05-22T17:12:36.752Z] 17:12:36 INFO - queue closed
[task 2021-05-22T17:12:36.779Z] 17:12:36 INFO - Setting up ssl
[task 2021-05-22T17:12:36.795Z] 17:12:36 INFO - certutil | b''
[task 2021-05-22T17:12:36.818Z] 17:12:36 INFO - certutil | b''
[task 2021-05-22T17:12:36.838Z] 17:12:36 INFO - certutil | b'\nCertificate Nickname Trust Attributes\n SSL,S/MIME,JAR/XPI\n\nweb-platform-tests CT,, \n'
[task 2021-05-22T17:12:37.678Z] 17:12:37 INFO - adb Granting important runtime permissions to org.mozilla.geckoview.test
[task 2021-05-22T17:12:39.105Z] 17:12:39 INFO - adb launch_application: am start -W -n org.mozilla.geckoview.test/org.mozilla.geckoview.test.TestRunnerActivity -a android.intent.action.MAIN --es env0 MOZ_CRASHREPORTER=1 --es env1 MOZ_CRASHREPORTER_NO_REPORT=1 --es env2 MOZ_CRASHREPORTER_SHUTDOWN=1 --es env3 MOZ_HIDE_RESULTS_TABLE=1 --es env4 MOZ_IN_AUTOMATION=1 --es env5 MOZ_LOG=signaling:3,mtransport:4,DataChannel:4,jsep:4 --es env6 R_LOG_LEVEL=6 --es env7 R_LOG_DESTINATION=stderr --es env8 R_LOG_VERBOSE=1 --es env9 MOZ_PROCESS_LOG=/tmp/tmpxu7c1591pidlog --es env10 MOZ_DISABLE_NONLOCAL_CONNECTIONS=1 --es env11 STYLO_THREADS=1 --es env12 MOZ_WEBRENDER=0 --es arg0 -no-remote --es arg1 -profile --es arg2 /data/local/tmp/test_root/profile --es arg3 --marionette --es arg4 about:blank --ez use_multiprocess True
[task 2021-05-22T17:12:39.983Z] 17:12:39 INFO - Starting runner

Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]
Flags: needinfo?(james)

James, could you please have a look why that can happen in Marionette client? It's only visible for wpt tests:

[task 2021-05-22T17:12:31.228Z] 17:12:31 WARNING - File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/marionette_driver/transport.py", line 178, in receive
[task 2021-05-22T17:12:31.228Z] 17:12:31 WARNING - if len(remaining) == int(length):
[task 2021-05-22T17:12:31.228Z] 17:12:31 WARNING - ValueError: invalid literal for int() with base 10: b',{"error"'
Flags: needinfo?(james)
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]

There are 104 total failures in the last 7 days on

  • android-em-7-0-x86_64 debug and opt
  • android-em-7-0-x86_64-qr debug
  • windows10-32 opt
  • windows10-64 debug and opt
  • windows10-64-ccov-qr opt

Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=341190155&repo=autoland&lineNumber=18749

[task 2021-05-30T01:23:09.600Z] 01:23:09 INFO - TEST-START | /encoding/legacy-mb-tchinese/big5/big5-encode-form-x-x-big5.html?14001-last
[task 2021-05-30T01:23:09.601Z] 01:23:09 INFO - Closing window d1f3d2dd-b80e-4dd5-9716-80cf0c6d7608
[task 2021-05-30T01:23:10.009Z] 01:23:10 INFO - PID 8312 | [Child 10052, Main Thread] WARNING: early callback, or time went backwards: '!aAllowIdleDispatch', file /builds/worker/checkouts/gecko/xpcom/threads/IdleTaskRunner.cpp:179
[task 2021-05-30T01:23:10.039Z] 01:23:10 INFO - PID 8312 | [Child 10052, Main Thread] WARNING: early callback, or time went backwards: '!aAllowIdleDispatch', file /builds/worker/checkouts/gecko/xpcom/threads/IdleTaskRunner.cpp:179
[task 2021-05-30T01:23:10.073Z] 01:23:10 INFO - PID 8312 | [Child 10052, Main Thread] WARNING: early callback, or time went backwards: '!aAllowIdleDispatch', file /builds/worker/checkouts/gecko/xpcom/threads/IdleTaskRunner.cpp:179
[task 2021-05-30T01:23:10.092Z] 01:23:10 INFO - PID 8312 | [Child 10052, Main Thread] WARNING: early callback, or time went backwards: '!aAllowIdleDispatch', file /builds/worker/checkouts/gecko/xpcom/threads/IdleTaskRunner.cpp:179
[task 2021-05-30T01:23:10.110Z] 01:23:10 INFO - PID 8312 | [Child 10052, Main Thread] WARNING: early callback, or time went backwards: '!aAllowIdleDispatch', file /builds/worker/checkouts/gecko/xpcom/threads/IdleTaskRunner.cpp:179
[task 2021-05-30T01:23:10.124Z] 01:23:10 INFO - PID 8312 | [Child 10052, Main Thread] WARNING: early callback, or time went backwards: '!aAllowIdleDispatch', file /builds/worker/checkouts/gecko/xpcom/threads/IdleTaskRunner.cpp:179
[task 2021-05-30T01:23:10.145Z] 01:23:10 INFO - PID 8312 | [Child 10052, Main Thread] WARNING: early callback, or time went backwards: '!aAllowIdleDispatch', file /builds/worker/checkouts/gecko/xpcom/threads/IdleTaskRunner.cpp:179
[task 2021-05-30T01:23:10.161Z] 01:23:10 INFO - PID 8312 | [Child 10052, Main Thread] WARNING: early callback, or time went backwards: '!aAllowIdleDispatch', file /builds/worker/checkouts/gecko/xpcom/threads/IdleTaskRunner.cpp:179
[task 2021-05-30T01:23:10.179Z] 01:23:10 INFO - PID 8312 | [Child 10052, Main Thread] WARNING: early callback, or time went backwards: '!aAllowIdleDispatch', file /builds/worker/checkouts/gecko/xpcom/threads/IdleTaskRunner.cpp:179
[task 2021-05-30T01:23:10.197Z] 01:23:10 INFO - PID 8312 | [Child 10052, Main Thread] WARNING: early callback, or time went backwards: '!aAllowIdleDispatch', file /builds/worker/checkouts/gecko/xpcom/threads/IdleTaskRunner.cpp:179
[task 2021-05-30T01:23:10.216Z] 01:23:10 INFO - PID 8312 | [Child 10052, Main Thread] WARNING: early callback, or time went backwards: '!aAllowIdleDispatch', file /builds/worker/checkouts/gecko/xpcom/threads/IdleTaskRunner.cpp:179
[task 2021-05-30T01:23:10.234Z] 01:23:10 INFO - PID 8312 | [Child 10052, Main Thread] WARNING: early callback, or time went backwards: '!aAllowIdleDispatch', file /builds/worker/checkouts/gecko/xpcom/threads/IdleTaskRunner.cpp:179
[task 2021-05-30T01:23:10.289Z] 01:23:10 INFO - PID 8312 | [Child 6392, Main Thread] WARNING: early callback, or time went backwards: '!aAllowIdleDispatch', file /builds/worker/checkouts/gecko/xpcom/threads/IdleTaskRunner.cpp:179
[task 2021-05-30T01:23:10.852Z] 01:23:10 INFO - PID 8312 | [Child 10052, Main Thread] WARNING: early callback, or time went backwards: '!aAllowIdleDispatch', file /builds/worker/checkouts/gecko/xpcom/threads/IdleTaskRunner.cpp:179
[task 2021-05-30T01:23:10.866Z] 01:23:10 INFO - PID 8312 | [Child 10052, Main Thread] WARNING: early callback, or time went backwards: '!aAllowIdleDispatch', file /builds/worker/checkouts/gecko/xpcom/threads/IdleTaskRunner.cpp:179
[task 2021-05-30T01:23:10.920Z] 01:23:10 INFO - PID 8312 | [Child 10052, Main Thread] WARNING: early callback, or time went backwards: '!aAllowIdleDispatch', file /builds/worker/checkouts/gecko/xpcom/threads/IdleTaskRunner.cpp:179
[task 2021-05-30T01:23:10.946Z] 01:23:10 INFO - PID 8312 | [Child 10052, Main Thread] WARNING: early callback, or time went backwards: '!aAllowIdleDispatch', file /builds/worker/checkouts/gecko/xpcom/threads/IdleTaskRunner.cpp:179
[task 2021-05-30T01:23:10.956Z] 01:23:10 INFO - PID 8312 | [Child 10052, Main Thread] WARNING: early callback, or time went backwards: '!aAllowIdleDispatch', file /builds/worker/checkouts/gecko/xpcom/threads/IdleTaskRunner.cpp:179
[task 2021-05-30T01:23:10.972Z] 01:23:10 INFO - PID 8312 | [Child 10052, Main Thread] WARNING: early callback, or time went backwards: '!aAllowIdleDispatch', file /builds/worker/checkouts/gecko/xpcom/threads/IdleTaskRunner.cpp:179
[task 2021-05-30T01:23:11.463Z] 01:23:11 INFO - Got timeout in harness
[task 2021-05-30T01:23:11.465Z] 01:23:11 INFO - TEST-UNEXPECTED-TIMEOUT | /html/cross-origin-opener-policy/no-https.html | TestRunner hit external timeout (this may indicate a hang)
[task 2021-05-30T01:23:11.466Z] 01:23:11 INFO - TEST-INFO took 195004ms
[task 2021-05-30T01:23:11.777Z] 01:23:11 INFO - ..........................................................................................................................................................................................................................................................................................................................................................................
[task 2021-05-30T01:23:11.777Z] 01:23:11 INFO - TEST-OK | /encoding/legacy-mb-tchinese/big5/big5-encode-form-x-x-big5.html?14001-last | took 2178ms

Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]
Depends on: 1716963

This bug seems to be being used as a bit of a dumping ground for any wpt intermittent timeout. The initial TSAN issue was mostly a single problem, but now I don't know that there's any specific relationship between all the different failures. It would be more helpful to have one bug per test or group of tests.

Flags: needinfo?(james)
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]
Summary: Intermittent TEST-UNEXPECTED-TIMEOUT various wpt core tests | TestRunner hit external timeout (this may indicate a hang) → Intermittent TEST-UNEXPECTED-TIMEOUT various wpt core tests | TestRunner hit external timeout (this may indicate a hang) DO NOT USE FOR CLASSIFICATIOn
Summary: Intermittent TEST-UNEXPECTED-TIMEOUT various wpt core tests | TestRunner hit external timeout (this may indicate a hang) DO NOT USE FOR CLASSIFICATIOn → Intermittent TEST-UNEXPECTED-TIMEOUT various wpt core tests | TestRunner hit external timeout (this may indicate a hang) DO NOT USE FOR CLASSIFICATION

(In reply to James Graham [:jgraham] from comment #218)

This bug seems to be being used as a bit of a dumping ground for any wpt intermittent timeout. The initial TSAN issue was mostly a single problem, but now I don't know that there's any specific relationship between all the different failures. It would be more helpful to have one bug per test or group of tests.

We will file individual bugs from now. Thank you.

Whiteboard: [stockwell disable-recommended]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: