Open Bug 1878934 Opened 9 months ago Updated 1 day ago

Intermittent /webdriver/tests/bidi/network/continue_response/invalid.py | single tracking bug

Categories

(Remote Protocol :: WebDriver BiDi, defect, P5)

defect

Tracking

(Not tracked)

REOPENED

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: intermittent-failure, intermittent-testcase)

Filed by: pstanciu [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=446151296&repo=try
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/CCPBOFudRNKgrwPZnaC2xw/runs/0/artifacts/public/logs/live_backing.log


[task 2024-02-06T14:34:57.284Z] 14:34:57     INFO - PID 16135 | 1707230097282	WebDriver BiDi	DEBUG	Connection bfbc4810-2bbe-4ae2-b53c-70b33e710985 attached to session c0ea3582-54e4-44cc-812c-dd991d7c729e
[task 2024-02-06T14:34:57.285Z] 14:34:57     INFO - PID 16135 | 1707230097283	webdriver::server	DEBUG	-> GET /session/c0ea3582-54e4-44cc-812c-dd991d7c729e/window/rect
[task 2024-02-06T14:34:57.285Z] 14:34:57     INFO - PID 16135 | 1707230097283	Marionette	DEBUG	0 -> [0,2131,"WebDriver:GetWindowRect",{}]
[task 2024-02-06T14:34:57.288Z] 14:34:57     INFO - TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/bidi/network/continue_response/invalid.py | expected OK
[task 2024-02-06T14:34:57.288Z] 14:34:57     INFO - TEST-INFO took 30003ms
[task 2024-02-06T14:34:57.298Z] 14:34:57     INFO - PID 16135 | 1707230097285	Marionette	DEBUG	0 <- [1,2131,null,{"x":100,"y":100,"width":800,"height":600}]
[task 2024-02-06T14:34:57.299Z] 14:34:57     INFO - PID 16135 | 1707230097285	webdriver::server	DEBUG	<- 200 OK {"value":{"x":100,"y":100,"width":800,"height":600}}
[task 2024-02-06T14:34:57.555Z] 14:34:57     INFO - Test harness output was not a valid structured log message
[task 2024-02-06T14:34:57.555Z] 14:34:57     INFO - Exception ignored in: <coroutine object WebSocketCommonProtocol.close_connection at 0x7f520e0dd740>
[task 2024-02-06T14:34:57.556Z] 14:34:57     INFO - Traceback (most recent call last):
[task 2024-02-06T14:34:57.556Z] 14:34:57     INFO -   File "/builds/worker/workspace/build/tests/web-platform/tests/tools/third_party/websockets/src/websockets/legacy/protocol.py", line 1329, in close_connection
[task 2024-02-06T14:34:57.556Z] 14:34:57     INFO -     await self.close_transport()
[task 2024-02-06T14:34:57.556Z] 14:34:57     INFO -   File "/builds/worker/workspace/build/tests/web-platform/tests/tools/third_party/websockets/src/websockets/legacy/protocol.py", line 1345, in close_transport
[task 2024-02-06T14:34:57.557Z] 14:34:57     INFO -     self.transport.close()
[task 2024-02-06T14:34:57.557Z] 14:34:57     INFO -   File "/usr/lib/python3.8/asyncio/selector_events.py", line 680, in close
[task 2024-02-06T14:34:57.557Z] 14:34:57     INFO -     self._loop.call_soon(self._call_connection_lost, None)
[task 2024-02-06T14:34:57.557Z] 14:34:57     INFO -   File "/usr/lib/python3.8/asyncio/base_events.py", line 711, in call_soon
[task 2024-02-06T14:34:57.558Z] 14:34:57     INFO -     self._check_closed()
[task 2024-02-06T14:34:57.558Z] 14:34:57     INFO -   File "/usr/lib/python3.8/asyncio/base_events.py", line 504, in _check_closed
[task 2024-02-06T14:34:57.558Z] 14:34:57     INFO -     raise RuntimeError('Event loop is closed')
[task 2024-02-06T14:34:57.558Z] 14:34:57     INFO - RuntimeError: Event loop is closed
[task 2024-02-06T14:34:57.638Z] 14:34:57     INFO - Closing logging queue
[task 2024-02-06T14:34:57.638Z] 14:34:57     INFO - queue closed
[task 2024-02-06T14:34:57.750Z] 14:34:57     INFO - PID 17017 | 1707230097648	geckodriver	INFO	Listening on 127.0.0.1:37177
[task 2024-02-06T14:34:57.750Z] 14:34:57     INFO - Starting runner
[task 2024-02-06T14:34:58.012Z] 14:34:58     INFO - TEST-START | /webdriver/tests/bidi/network/continue_response/request.py

Running all the tests in this test module just took all the time, and no delay is noticeable for startup. Most likely the additionally added tests from bug 1874206 caused this problem. Also [Chrome times out on these](https://wpt.fyi/results/webdriver/tests/bidi/network/continue_response/invalid.py?label=master&label=experimental&product=firefox&product=firefox_android&product=chrome&product=chrome_android&product=safari&aligned&view=subtest.

Julian, I assume that we have to extend the timeout for this particular test. For that lets file a new bug that we can also mark as regression if that's the case. Thanks.

Flags: needinfo?(jdescottes)
Depends on: 1879006

Yes that sounds correct

Flags: needinfo?(jdescottes)

With bug 1879006 fixed this should no longer be an issue.

Status: NEW → RESOLVED
Closed: 9 months ago
Resolution: --- → INCOMPLETE
Summary: Intermittent TEST-UNEXPECTED-TIMEOUT /webdriver/tests/bidi/network/continue_response/invalid.py | single tracking bug → Intermittent /webdriver/tests/bidi/network/continue_response/invalid.py | single tracking bug
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---

(In reply to Treeherder Bug Filer from comment #4)

New failure instance: https://treeherder.mozilla.org/logviewer?job_id=446424012&repo=autoland

That's a different issue. Here the test did even not start. Maybe some hang in wpt-runner.

I think the new failure is similar to Bug 1875679

Oh, actually this is Android and we should check the adb logcat! So here is the real problem:

02-07 23:00:00.186 20174 20189 I Gecko   : 1707346800186	RemoteAgent	DEBUG	WebDriverBiDiConnection 75b7d506-8d69-4e5d-b1dc-8ba92ab04479 <- {"type":"success","id":17,"result":{"realm":"774c871c-b71f-4ad1-8017-2d6da254b507","type":"exception","exceptionDetails":{"columnNumber":22,"exception":{"type":"error"},"lineNumber":8,"stackTrace":{"callFrames":[{"columnNumber":22,"functionName":"","lineNumber":8,"url":"https://web-platform.test:8443/webdriver/tests/bidi/network/support/empty.html"}]},"text":"AbortError: The operation was aborted. "}}}
[..]
02-07 23:00:02.639 20342 20342 E hostapd : Could not read interface wlan1 flags: No such device

Here we most likely have a network error which first caused an AbortError for the fetch fixture and then shows a lot of wlan issues until Firefox gets killed. This would explain why we have missing entries in the normal error log file. Overall I expect this to happen kinda rarely.

As it looks like no further failures happened since the timeout has been extended.

Status: REOPENED → RESOLVED
Closed: 9 months ago9 months ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---

(In reply to Treeherder Bug Filer from comment #10)

New failure instance: https://treeherder.mozilla.org/logviewer?job_id=446328882&repo=autoland

That's a failure from an old job on Tuesday last week.

Status: REOPENED → RESOLVED
Closed: 9 months ago9 months ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---

Lets try closing again - so that the bot is not re-opening the bug.

Status: REOPENED → RESOLVED
Closed: 9 months ago9 months ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---

(In reply to Treeherder Bug Filer from comment #14)

New failure instance: https://treeherder.mozilla.org/logviewer?job_id=456446413&repo=mozilla-beta

This is actually bug 1774392. I've re-classified.

Status: REOPENED → RESOLVED
Closed: 9 months ago7 months ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---

(In reply to Treeherder Bug Filer from comment #16)

New failure instance: https://treeherder.mozilla.org/logviewer?job_id=456401480&repo=try

The stack looks more like bug 1894213. I'll re-classify if that turns out to be true.

There are quite a lot of issues for beta. Julian, do you know what might have caused that?

Flags: needinfo?(jdescottes)

This started spiking on the 20th, and early beta was removed on the 19th, so I'm guessing it has something to do with something flagged behind EARLY_BETA_OR_EARLIER, but what exactly I don't know.

Those tests are creating a lot of blocked requests, and it seems the root cause of the issue is

Parent 2793, Main Thread] ###!!! ABORT: failed to re-open freezeable shm: Too many open files: file /builds/worker/checkouts/gecko/ipc/chromium/src/base/shared_memory_posix.cc:345

I suspect the two are related. Maybe we could try to either: resume the blocked request after the invalid check, OR use dedicated tabs (I assume closing the tab will force to terminate the request?)

Flags: needinfo?(jdescottes)

Using dedicated tabs for each test sound like a good idea. Given that these tests are for invalid cases it sounds strange if we would do more than just the check.

Depends on: 1910555

Should now be fixed by the landing of bug 1910555. Marking as incomplete given that it is a single tracking bug.

Status: REOPENED → RESOLVED
Closed: 7 months ago3 months ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---

(In reply to Treeherder Bug Filer from comment #28)

New failure instance: https://treeherder.mozilla.org/logviewer?job_id=468359397&repo=mozilla-beta

This failure is from July 29th, so before we landed the fix. However considering that this failure is on beta, and the fix only landed in nightly, I think we can wait until 130 becomes beta to close it again.

Beta is now 130, this should no longer occur.

Status: REOPENED → RESOLVED
Closed: 3 months ago3 months ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
You need to log in before you can comment on or make changes to this bug.