Closed Bug 1845961 Opened 1 year ago Closed 1 year ago

High freq Windows wpt timeouts | Task aborted - max run time exceeded

Categories

(Testing :: web-platform-tests, defect, P5)

defect

Tracking

(firefox119 fixed)

RESOLVED FIXED
119 Branch
Tracking Status
firefox119 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: jmaher, NeedInfo)

References

Details

(Keywords: intermittent-failure, intermittent-testcase, Whiteboard: [stockwell disable-recommended], [wptsync upstream])

Attachments

(1 file)

There is a high number of wpt test that fail with timeout.
Happens on central and beta-sim, autoland doesn't seem to be affected

Failure log #1

Failure log #2

TEST-START | /websockets/binaryType-wrong-value.any.html?wpt_flags=h2
[task 2023-07-28T11:41:22.066Z] 11:41:22     INFO - Closing window ce0570f1-4680-4bb7-97e3-962d3188575f
[task 2023-07-28T11:41:22.174Z] 11:41:22     INFO - PID 9572 | [Child 2172, Main Thread] WARNING: NS_ENSURE_TRUE(!mHasOrHasHadOwnerWindow || mOwnerWindow) failed: file /builds/worker/checkouts/gecko/dom/events/DOMEventTargetHelper.cpp:302
[task 2023-07-28T11:41:22.221Z] 11:41:22     INFO - PID 9572 | [Child 2172, Main Thread] WARNING: IPC message 'PContent::Msg_CleanupPendingLoadState' discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:508
[task 2023-07-28T11:41:22.229Z] 11:41:22     INFO - PID 9572 | [Child 2172, Main Thread] WARNING: IPC message 'PContent::Msg_CleanupPendingLoadState' discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:508
[task 2023-07-28T11:41:22.229Z] 11:41:22     INFO - PID 9572 | [Child 2172, Main Thread] WARNING: IPC message 'PContent::Msg_CleanupPendingLoadState' discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:508
[task 2023-07-28T11:41:22.230Z] 11:41:22     INFO - PID 9572 | [Child 2172, Main Thread] WARNING: IPC message 'PContent::Msg_CleanupPendingLoadState' discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:508
[task 2023-07-28T11:41:22.231Z] 11:41:22     INFO - PID 9572 | [Child 2172, Main Thread] WARNING: IPC message 'PContent::Msg_CleanupPendingLoadState' discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:508
[task 2023-07-28T11:41:22.347Z] 11:41:22  WARNING - wptserve (646560d6) Unexpected Error - 
[task 2023-07-28T11:41:22.347Z] 11:41:22  WARNING - Max outbound streams is 100, 0 open
[task 2023-07-28T11:41:22.362Z] 11:41:22     INFO - Test harness output was not a valid structured log message
[task 2023-07-28T11:41:22.362Z] 11:41:22     INFO - Exception in thread Thread-1577:
[task 2023-07-28T11:41:22.362Z] 11:41:22     INFO - Traceback (most recent call last):
[task 2023-07-28T11:41:22.362Z] 11:41:22     INFO -   File "Z:\task_169053280710932\build\tests\web-platform\tests\tools\third_party\h2\h2\connection.py", line 224, in process_input
[task 2023-07-28T11:41:22.363Z] 11:41:22     INFO -     func, target_state = self._transitions[(self.state, input_)]
[task 2023-07-28T11:41:22.364Z] 11:41:22     INFO - KeyError: (<ConnectionState.CLOSED: 3>, <ConnectionInputs.SEND_HEADERS: 0>)
[task 2023-07-28T11:41:22.364Z] 11:41:22     INFO - During handling of the above exception, another exception occurred:
[task 2023-07-28T11:41:22.364Z] 11:41:22     INFO - Traceback (most recent call last):
[task 2023-07-28T11:41:22.364Z] 11:41:22     INFO -   File "c:\mozilla-build\python3\lib\threading.py", line 973, in _bootstrap_inner
[task 2023-07-28T11:41:22.364Z] 11:41:22     INFO -     self.run()
[task 2023-07-28T11:41:22.364Z] 11:41:22     INFO -   File "c:\mozilla-build\python3\lib\threading.py", line 910, in run
[task 2023-07-28T11:41:22.365Z] 11:41:22     INFO -     self._target(*self._args, **self._kwargs)
[task 2023-07-28T11:41:22.366Z] 11:41:22     INFO -   File "Z:\task_169053280710932\build\tests\web-platform\tests\tools\wptserve\wptserve\server.py", line 626, in _stream_thread
[task 2023-07-28T11:41:22.368Z] 11:41:22     INFO -     self.finish_handling(request, response, req_handler)
[task 2023-07-28T11:41:22.368Z] 11:41:22     INFO -   File "Z:\task_169053280710932\build\tests\web-platform\tests\tools\wptserve\wptserve\server.py", line 301, in finish_handling
[task 2023-07-28T11:41:22.370Z] 11:41:22     INFO -     response.write()
[task 2023-07-28T11:41:22.370Z] 11:41:22     INFO -   File "Z:\task_169053280710932\build\tests\web-platform\tests\tools\wptserve\wptserve\response.py", line 239, in write
[task 2023-07-28T11:41:22.370Z] 11:41:22     INFO -     self.write_status_headers()
[task 2023-07-28T11:41:22.371Z] 11:41:22     INFO -   File "Z:\task_169053280710932\build\tests\web-platform\tests\tools\wptserve\wptserve\response.py", line 430, in write_status_headers
[task 2023-07-28T11:41:22.372Z] 11:41:22     INFO -     self.writer.write_headers(self.headers, *self.status)
[task 2023-07-28T11:41:22.372Z] 11:41:22     INFO -   File "Z:\task_169053280710932\build\tests\web-platform\tests\tools\wptserve\wptserve\response.py", line 494, in write_headers
[task 2023-07-28T11:41:22.373Z] 11:41:22     INFO -     connection.send_headers(
[task 2023-07-28T11:41:22.374Z] 11:41:22     INFO -   File "Z:\task_169053280710932\build\tests\web-platform\tests\tools\third_party\h2\h2\connection.py", line 766, in send_headers
[task 2023-07-28T11:41:22.375Z] 11:41:22     INFO -     self.state_machine.process_input(ConnectionInputs.SEND_HEADERS)
[task 2023-07-28T11:41:22.375Z] 11:41:22     INFO -   File "Z:\task_169053280710932\build\tests\web-platform\tests\tools\third_party\h2\h2\connection.py", line 228, in process_input
INFO -     raise ProtocolError(
[task 2023-07-28T11:41:22.377Z] 11:41:22     INFO - h2.exceptions.ProtocolError: Invalid input ConnectionInputs.SEND_HEADERS in state ConnectionState.CLOSED
[task 2023-07-28T11:41:22.383Z] 11:41:22     INFO - Exception in thread Thread-864:
[task 2023-07-28T11:41:22.383Z] 11:41:22     INFO - Traceback (most recent call last):
[task 2023-07-28T11:41:22.383Z] 11:41:22     INFO -   File "Z:\task_169053280710932\build\tests\web-platform\tests\tools\third_party\h2\h2\connection.py", line 224, in process_input
[task 2023-07-28T11:41:22.384Z] 11:41:22  WARNING - wptserve (646560d6 - 519) Stream Reset, Thread Closing
[task 2023-07-28T11:41:22.384Z] 11:41:22     INFO - Test harness output was not a valid structured log message
[task 2023-07-28T11:41:22.384Z] 11:41:22     INFO -     func, target_state = self._transitions[(self.state, input_)]
[task 2023-07-28T11:41:22.384Z] 11:41:22     INFO - KeyError: (<ConnectionState.CLOSED: 3>, <ConnectionInputs.SEND_DATA: 2>)
[task 2023-07-28T11:41:22.384Z] 11:41:22     INFO - During handling of the above exception, another exception occurred:
[task 2023-07-28T11:41:22.384Z] 11:41:22     INFO - Traceback (most recent call last):
[task 2023-07-28T11:41:22.384Z] 11:41:22     INFO -   File "c:\mozilla-build\python3\lib\threading.py", line 973, in _bootstrap_inner
[task 2023-07-28T11:41:22.385Z] 11:41:22     INFO -     self.run()
[task 2023-07-28T11:41:22.385Z] 11:41:22     INFO -   File "c:\mozilla-build\python3\lib\threading.py", line 910, in run
[task 2023-07-28T11:41:22.385Z] 11:41:22     INFO -     self._target(*self._args, **self._kwargs)
[task 2023-07-28T11:41:22.385Z] 11:41:22     INFO -   File "Z:\task_169053280710932\build\tests\web-platform\tests\tools\wptserve\wptserve\server.py", line 546, in _stream_ws_sub_thread
[task 2023-07-28T11:41:22.385Z] 11:41:22     INFO -     connection.end_stream(stream_id)
[task 2023-07-28T11:41:22.385Z] 11:41:22     INFO -   File "Z:\task_169053280710932\build\tests\web-platform\tests\tools\third_party\h2\h2\connection.py", line 882, in end_stream
[task 2023-07-28T11:41:22.386Z] 11:41:22     INFO -     self.state_machine.process_input(ConnectionInputs.SEND_DATA)
[task 2023-07-28T11:41:22.386Z] 11:41:22     INFO -   File "Z:\task_169053280710932\build\tests\web-platform\tests\tools\third_party\h2\h2\connection.py", line 228, in process_input
[task 2023-07-28T11:41:22.386Z] 11:41:22     INFO -     raise ProtocolError(
[task 2023-07-28T11:41:22.386Z] 11:41:22     INFO - h2.exceptions.ProtocolError: Invalid input ConnectionInputs.SEND_DATA in state ConnectionState.CLOSED
<....>
WARNING - wptserve (7fe40bbd) Unexpected Error - 
[task 2023-07-28T11:41:31.827Z] 11:41:31  WARNING - Invalid HTTP/2 preamble.
[task 2023-07-28T11:41:31.827Z] 11:41:31     INFO - PID 9572 | [Parent 3552, Main Thread] WARNING: IPC message 'PBrowser::Msg_UpdateNativeWindowHandle' discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:508
[task 2023-07-28T11:41:31.827Z] 11:41:31  WARNING - wptserve (8b46d079) Unexpected Error - 
[task 2023-07-28T11:41:31.827Z] 11:41:31  WARNING - Invalid HTTP/2 preamble.
[task 2023-07-28T11:41:31.827Z] 11:41:31     INFO - PID 9572 | [Parent 3552, Main Thread] WARNING: IPC message 'PBrowser::Msg_Destroy' discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:508
[task 2023-07-28T11:41:31.827Z] 11:41:31  WARNING - wptserve (404e0366) Unexpected Error - 
[task 2023-07-28T11:41:31.827Z] 11:41:31  WARNING - Invalid HTTP/2 preamble.
[task 2023-07-28T11:41:31.827Z] 11:41:31     INFO - PID 9572 | [Parent 3552, Main Thread] WARNING: No build ID mismatch: file /builds/worker/checkouts/gecko/dom/base/nsFrameLoader.cpp:3777
[task 2023-07-28T11:41:31.827Z] 11:41:31  WARNING - wptserve (14109c2d) Unexpected Error - 
[task 2023-07-28T11:41:31.831Z] 11:41:31  WARNING - Invalid HTTP/2 preamble.
[task 2023-07-28T11:41:31.832Z] 11:41:31     INFO - PID 9572 | [Parent 3552, Main Thread] WARNING: IPC message 'PBrowser::Msg_UpdateNativeWindowHandle' discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:508
[task 2023-07-28T11:41:31.832Z] 11:41:31     INFO - IOError on command, setting status to CRASH
[task 2023-07-28T11:41:31.832Z] 11:41:31     INFO - PID 9572 | [Parent 3552, Main Thread] WARNING: IPC message 'PBrowser::Msg_Destroy' discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:508
[task 2023-07-28T11:41:31.832Z] 11:41:31  WARNING - wptserve (1f458e9c) Unexpected Error - 
[task 2023-07-28T11:41:31.832Z] 11:41:31  WARNING - Invalid HTTP/2 preamble.
[task 2023-07-28T11:41:31.832Z] 11:41:31     INFO - PID 9572 | [Parent 3552, ProcessHangMon] WARNING: IPC message 'PProcessHangMonitor::Msg_RequestContentJSInterrupt' discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:508
[task 2023-07-28T11:41:31.839Z] 11:41:31  WARNING - wptserve (2c70c7c2) Unexpected Error - 
[task 2023-07-28T11:41:31.839Z] 11:41:31  WARNING - Invalid HTTP/2 preamble.
[task 2023-07-28T11:41:31.839Z] 11:41:31     INFO - PID 9572 | [Parent 3552, Main Thread] WARNING: No build ID mismatch: file /builds/worker/checkouts/gecko/dom/base/nsFrameLoader.cpp:3777
[task 2023-07-28T11:41:31.839Z] 11:41:31  WARNING - wptserve (8093a41a) Unexpected Error - 
[task 2023-07-28T11:41:31.839Z] 11:41:31  WARNING - Invalid HTTP/2 preamble.
[task 2023-07-28T11:41:31.843Z] 11:41:31     INFO - PID 9572 | A content process crashed and MOZ_CRASHREPORTER_SHUTDOWN is set, shutting down
[task 2023-07-28T11:41:31.844Z] 11:41:31     INFO - TEST-UNEXPECTED-CRASH | /websockets/binaryType-wrong-value.any.html?wpt_flags=h2 | expected TIMEOUT
[task 2023-07-28T11:41:31.844Z] 11:41:31     INFO - TEST-INFO expected TIMEOUT | took 9779ms
[task 2023-07-28T11:41:31.844Z] 11:41:31  WARNING - wptserve (a3f0eec6) Unexpected Error - 
Flags: needinfo?(james)
Summary: High freq wpt timeouts → High freq wpt timeouts | Task aborted - max run time exceeded
Summary: High freq wpt timeouts | Task aborted - max run time exceeded → High freq Windows wpt timeouts | Task aborted - max run time exceeded

This has been happening for a while now but it's been hiding under meta Bug 1809667.
Unfortunately an increase in number of chunks from 16->20 for that platform won't work this time around as seen in this try push. wpt11 runs are borderline to the max run time with them being around 118-120 mins.

Before this happening again (because it happened also in Bug 1821918) these runs were around 112-116 mins long as seen in this range. Something in this merge made them run overboard.

The common manifests between those two chunk iterations are:

    /IndexedDB
    /media-source/dedicated-worker
    /pointerevents
    /shadow-dom/focus
    /uievents/legacy
    /upgrade-insecure-requests/gen
    /webaudio/the-audio-api/the-channelmergernode-interface
    /webxr/dom-overlay

When comparing the manifests from the green runs before those central changes and the ones from the max run time I get the same above list of manifests. So adding these ones was what made them run overboard.
I don't understand why these won't chunk when increasing from 16 to 20 chunks.

Joel, any ideas what to do in this case as increasing the number of chunks will not solve this. Thank you.

Flags: needinfo?(jmaher)

good news this is windows debug, and I think we can find tests that have long runtimes (i.e. expected TIMEOUT) and skip them on windows/debug.

I see 61 tests timing out in a random log I clicked on from the Intermittent Failure view in treeherder, these are all 30-180 seconds each, that alone accounts for almost 60 minutes of runtime.

Flags: needinfo?(jmaher)

How do we decide which manifest to skip on windows debug? Should we take just a few of them?
/pointerevents/ and /soft-navigation-heuristics folders seem to have the most tests expected to timeout. Who calls the shots here?

Flags: needinfo?(jmaher)

I am testing a patch on try now, those two directories you mentioned are the biggest wins; stay tuned for an update!

Flags: needinfo?(jmaher)
Assignee: nobody → jmaher
Status: NEW → ASSIGNED
Pushed by jmaher@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ce73f600c073 Skip timeout/crash for normal wpt runs, keep running on periodic backlog runs. r=aryx
Failed to create upstream wpt PR due to merge conflicts. This requires fixup from a wpt sync admin.
Flags: needinfo?(james) → needinfo?(aborovova)
Whiteboard: [stockwell disable-recommended] → [stockwell disable-recommended], [wptsync upstream error]
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 119 Branch
Flags: needinfo?(aborovova) → needinfo?(james)
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/41798 for changes under testing/web-platform/tests
Whiteboard: [stockwell disable-recommended], [wptsync upstream error] → [stockwell disable-recommended], [wptsync upstream]
Upstream PR was closed without merging
Upstream PR was closed without merging
Upstream PR merged by moz-wptsync-bot
Pushed by wptsync@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f4d8d5f80cd5 [wpt PR 41798] - [Gecko Bug 1845961] Skip timeout/crash for normal wpt runs, keep running on periodic backlog runs., a=testonly
See Also: → 1926731
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: