High frequency OSX websocket related failures
Categories
(Core :: Networking: WebSockets, defect, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr102 | --- | unaffected |
firefox106 | --- | unaffected |
firefox107 | --- | unaffected |
firefox108 | --- | wontfix |
People
(Reporter: imoraru, Unassigned)
References
Details
(Keywords: intermittent-failure, Whiteboard: [fixed by bug 1797816 comment 15][necko-triaged][necko-monitor])
We started to notice that there are frequent multiple failures related to websocket.
Here are some examples:
- Retriggers and backfills for mochitest plain
- Retriggers for bc
- Retriggers for mochites-chrome
- Retriggers and backfills for dt
- Retriggers for another type of websocket failure for mochitest plain
Added some jobs here, Oct 18 retriggers and backfills they seem to go deeper than this.
Reporter | ||
Comment 1•2 years ago
•
|
||
Few examples of failures:
- TEST-UNEXPECTED-FAIL | dom/security/test/csp/test_upgrade_insecure.html | upgrade all requests on toplevel https (result: websocket-unexpected-error) - didn't expect -1, but got it
- TEST-UNEXPECTED-FAIL | dom/security/test/sec-fetch/test_websocket.html | sanity: wssSocket onerror
- TEST-UNEXPECTED-FAIL | dom/security/test/https-only/browser_websocket_exceptions.js | sanity: websocket loaded - "onerror" == "onopen" - got "onerror", expected "onopen" (operator ==)
- TEST-UNEXPECTED-FAIL | toolkit/components/antitracking/test/browser/browser_staticPartition_websocket.js | Websocket endpoint accepts connections. -
- TEST-UNEXPECTED-FAIL | dom/websocket/tests/test_event_listener_leaks.html | websocket error
And I think this might also be related to the issue: TEST-UNEXPECTED-FAIL | toolkit/content/tests/chrome/test_about_networking.html | Test timed out. - - TEST-UNEXPECTED-FAIL | devtools/client/netmonitor/test/browser_net-ws-filter-freetext.js | Test timed out -
- TEST-UNEXPECTED-FAIL | dom/security/test/mixedcontentblocker/test_bug803225.html | resource with wss protocol did not load
All the failures seem OSX related...changing the bug summary accordingly.
Updated•2 years ago
|
Reporter | ||
Updated•2 years ago
|
Reporter | ||
Comment 2•2 years ago
|
||
Hi Frederik! Can you please take a look in to this?
Thank you!
![]() |
||
Updated•2 years ago
|
![]() |
||
Updated•2 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 5•2 years ago
•
|
||
I tried to debug this a bit by running websocket tests with log enabled in this try run.
The log shows that after the ssl tunnel established, the websocket server returns some weird data.
[task 2022-11-03T18:58:31.332Z] 18:58:31 INFO - GECKO(31929) | [Parent 31929: Socket Thread]: V/nsHttp nsHttpConnection::OnSocketWritable 128fa0300 ReadSegments returned [rv=0 read=509 sock-cond=0 again=1]
[task 2022-11-03T18:58:31.332Z] 18:58:31 INFO - GECKO(31929) | [Parent 31929: Socket Thread]: V/nsHttp writing transaction request stream
[task 2022-11-03T18:58:31.332Z] 18:58:31 INFO - GECKO(31929) | [Parent 31929: Socket Thread]: V/nsHttp nsHttpTransaction::ReadSegments 13a681900
[task 2022-11-03T18:58:31.333Z] 18:58:31 INFO - GECKO(31929) | [Parent 31929: Socket Thread]: V/nsHttp nsHttpConnection::OnSocketWritable 128fa0300 ReadSegments returned [rv=0 read=0 sock-cond=0 again=1]
[task 2022-11-03T18:58:31.333Z] 18:58:31 INFO - GECKO(31929) | [Parent 31929: Socket Thread]: E/nsHttp nsHttpTransaction::OnSocketStatus [this=13a681900 status=804b000a progress=0]
[task 2022-11-03T18:58:31.334Z] 18:58:31 INFO - GECKO(31929) | [Parent 31929: Socket Thread]: V/nsHttp nsHttpConnection::ResumeRecv [this=128fa0300]
[task 2022-11-03T18:58:31.334Z] 18:58:31 INFO - GECKO(31929) | [Parent 31929: Main Thread]: D/nsWebSocket WebSocketChannel::GetInterface() 128f5d500
[task 2022-11-03T18:58:31.335Z] 18:58:31 INFO - GECKO(31929) | [Parent 31929: Socket Thread]: V/nsHttp nsHttpConnection::OnSocketReadable [this=128fa0300]
[task 2022-11-03T18:58:31.335Z] 18:58:31 INFO - GECKO(31929) | [Parent 31929: Socket Thread]: V/nsHttp nsHttpTransaction::WriteSegments 13a681900
[task 2022-11-03T18:58:31.336Z] 18:58:31 INFO - GECKO(31929) | [Parent 31929: Socket Thread]: V/nsHttp nsHttpConnectionMgr::ShouldThrottle trans=13a681900
[task 2022-11-03T18:58:31.336Z] 18:58:31 INFO - GECKO(31929) | [Parent 31929: Main Thread]: D/nsWebSocket WebSocketChannelParent::GetInterface() 128fedd60
[task 2022-11-03T18:58:31.337Z] 18:58:31 INFO - GECKO(31929) | [Parent 31929: Socket Thread]: V/nsHttp nsHttpConnection::OnSocketReadable 128fa0300 trans->ws rv=0 n=0 socketin=80470002
[task 2022-11-03T18:58:31.337Z] 18:58:31 INFO - GECKO(31929) | [Parent 31929: Socket Thread]: V/nsHttp nsHttpConnection::CloseTransaction[this=128fa0300 trans=13a681900 reason=80470002]
[task 2022-11-03T18:58:31.338Z] 18:58:31 INFO - GECKO(31929) | [Parent 31929: Socket Thread]: V/nsHttp closing associated mTransaction
[task 2022-11-03T18:58:31.338Z] 18:58:31 INFO - GECKO(31929) | [Parent 31929: Socket Thread]: V/nsHttp nsHttpTransaction::Close [this=13a681900 reason=0]
[task 2022-11-03T18:58:31.339Z] 18:58:31 INFO - GECKO(31929) | [Parent 31929: Socket Thread]: V/nsHttp nsHttpConnectionMgr::RemoveActiveTransaction t=13a681900 tabid=d(1) thr=0
[task 2022-11-03T18:58:31.339Z] 18:58:31 INFO - GECKO(31929) | [Parent 31929: Socket Thread]: V/nsHttp Active transactions -[0,0,0,0]
[task 2022-11-03T18:58:31.340Z] 18:58:31 INFO - GECKO(31929) | [Parent 31929: Socket Thread]: V/nsHttp nsHttpTransaction::ParseHead [count=1]
[task 2022-11-03T18:58:31.340Z] 18:58:31 INFO - GECKO(31929) | [Parent 31929: Socket Thread]: V/nsHttp nsHttpResponseHead::ParseVersion [version=]
[task 2022-11-03T18:58:31.340Z] 18:58:31 INFO - GECKO(31929) | [Parent 31929: Socket Thread]: V/nsHttp looks like a HTTP/0.9 response
[task 2022-11-03T18:58:31.341Z] 18:58:31 INFO - GECKO(31929) | [Parent 31929: Socket Thread]: V/nsHttp response status line needs default reason phrase
[task 2022-11-03T18:58:31.341Z] 18:58:31 INFO - GECKO(31929) | [Parent 31929: Socket Thread]: E/nsHttp Have status line [version=9 status=200 statusText=OK]
[task 2022-11-03T18:58:31.342Z] 18:58:31 INFO - GECKO(31929) | [Parent 31929: Socket Thread]: V/nsHttp nsHttpTransaction::Close 13a681900 0 Byte 0.9 Response
The normal case should be that we receive 101 response from the server as below:
0:08.04 GECKO(40178) [Parent 40178: Socket Thread]: V/nsHttp nsHttpConnection::OnSocketWritable 141e4cd00 ReadSegments returned [rv=0 read=509 sock-cond=0 again=1]
0:08.04 GECKO(40178) [Parent 40178: Socket Thread]: V/nsHttp writing transaction request stream
0:08.04 GECKO(40178) [Parent 40178: Socket Thread]: V/nsHttp nsHttpTransaction::ReadSegments 1387abb00
0:08.04 GECKO(40178) [Parent 40178: Socket Thread]: V/nsHttp nsHttpConnection::OnSocketWritable 141e4cd00 ReadSegments returned [rv=0 read=0 sock-cond=0 again=1]
0:08.04 GECKO(40178) [Parent 40178: Socket Thread]: E/nsHttp nsHttpTransaction::OnSocketStatus [this=1387abb00 status=804b000a progress=0]
0:08.04 GECKO(40178) [Parent 40178: Socket Thread]: V/nsHttp nsHttpConnection::ResumeRecv [this=141e4cd00]
0:08.04 GECKO(40178) [Parent 40178: Main Thread]: D/nsWebSocket WebSocketChannel::GetInterface() 141d1ca00
0:08.04 GECKO(40178) [Parent 40178: Main Thread]: D/nsWebSocket WebSocketChannelParent::GetInterface() 138fe9d00
0:08.04 GECKO(40178) [Parent 40178: Socket Thread]: V/nsHttp nsHttpConnection::OnSocketReadable [this=141e4cd00]
0:08.04 GECKO(40178) [Parent 40178: Socket Thread]: V/nsHttp nsHttpTransaction::WriteSegments 1387abb00
0:08.04 GECKO(40178) [Parent 40178: Socket Thread]: V/nsHttp nsHttpConnectionMgr::ShouldThrottle trans=1387abb00
0:08.04 GECKO(40178) [Parent 40178: Socket Thread]: E/nsHttp nsHttpTransaction::OnSocketStatus [this=1387abb00 status=804b0006 progress=225]
0:08.04 GECKO(40178) [Parent 40178: Socket Thread]: V/nsHttp nsHttpTransaction::WritePipeSegment 1387abb00 written=175
0:08.04 GECKO(40178) [Parent 40178: Socket Thread]: E/nsHttp nsHttpTransaction::ProcessData [this=1387abb00 count=175]
0:08.04 GECKO(40178) [Parent 40178: Socket Thread]: V/nsHttp nsHttpTransaction::ParseHead [count=175]
0:08.04 GECKO(40178) [Parent 40178: Socket Thread]: E/nsHttp nsHttpTransaction::ParseLine [HTTP/1.1 101 Switching Protocols]
0:08.04 GECKO(40178) [Parent 40178: Socket Thread]: V/nsHttp nsHttpResponseHead::ParseVersion [version=HTTP/1.1 101 Switching Protocols]
0:08.04 GECKO(40178) [Parent 40178: Socket Thread]: E/nsHttp Have status line [version=11 status=101 statusText=Switching Protocols]
Note that in both cases, Firefox received 509 bytes, but in the failed case, the data was obviously wrong.
Still not sure which part is wrong.
Comment 6•2 years ago
|
||
Looking at Bug 1784824 - Intermittent toolkit/content/tests/chrome/test_about_networking.html | single tracking bug which seems to also be failing because of the websocket server,
I see this in the log:
[task 2022-07-20T08:57:35.143Z] 08:57:35 INFO - TEST-START | toolkit/content/tests/chrome/test_about_networking.html
[task 2022-07-20T08:57:36.215Z] 08:57:36 INFO - could not open connection to the real server
[task 2022-07-20T08:58:00.178Z] 08:58:00 INFO - GECKO(1983) | 2022-07-20 08:58:00.177 firefox[1983:28845] Persistent UI failed to open file file:///Users/cltbld/Library/Saved%20Application%20State/org.mozilla.nightly.savedState/window_1.data: No such file or directory (2)
[task 2022-07-20T09:02:36.197Z] 09:02:36 INFO - TEST-INFO | started process screencapture
[task 2022-07-20T09:02:36.326Z] 09:02:36 INFO - TEST-INFO | screencapture: exit 0
[task 2022-07-20T09:02:36.326Z] 09:02:36 INFO - TEST-UNEXPECTED-FAIL | toolkit/content/tests/chrome/test_about_networking.html | Test timed out. -
That string is coming from here
I assume that just the websocket server isn't up, and ssltunnel is returning an error?
I'll take a look if we can check exactly what the WSserver is doing.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
![]() |
||
Comment 11•2 years ago
|
||
The issue has also been observed with test tasks which got added to pushes from October and failed for these. Either the machine environment or an artifact used for test execution changed.
![]() |
||
Comment 12•2 years ago
|
||
Only one failure in the last ~12 hours. Resolved by fixing the reboots in bug 1797816 comment 15.
Please close the bug if no further work is planned.
Comment 13•2 years ago
|
||
Thanks! Glad to see this resolved. Most of the bugs in the see also list also seem to show reduced failure rates.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 16•2 years ago
|
||
We are still seeing a lot of failures on OSX websocket tests. Could you take a look?
Should we reopen bug 1797816 or is this a different problem?
Comment 17•2 years ago
|
||
The latest intermittent failures robot posts are from 3 months ago. Is it still a problem?
The configuration issue has been resolved (unless there are workers that haven't been updated - which could happen).
Michelle, as an FYI. In Bug 1797816 we had a g-w config issue that caused the hosts to not exit and reboot after running one job (they're run forever). Not sure if r8 009 is still quarantined, but may want to note in inventory.
Comment 18•2 years ago
|
||
We're still seeing some websocket tests intermittently failing only on OSX:
https://treeherder.mozilla.org/intermittent-failures/bugdetails?bug=1777102&startday=2023-02-13&endday=2023-02-19&tree=all
https://treeherder.mozilla.org/intermittent-failures/bugdetails?bug=1801420&startday=2023-02-13&endday=2023-02-19&tree=all
No failures since the 16th as far as I can tell. Will ping back next week if we see any more happening.
Comment 19•2 years ago
|
||
-009 isn't quarantined anymore and looking at recent jobs it is working properly
Let's hope we get no more failures.
Updated•2 years ago
|
Comment 20•2 years ago
|
||
https://bugzilla.mozilla.org/show_bug.cgi?id=1777102#c30
https://bugzilla.mozilla.org/show_bug.cgi?id=1801420#c14
There are still some OSX failures.
Comment 21•2 years ago
|
||
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
For more information, please visit BugBot documentation.
Description
•