Closed Bug 1727995 Opened 2 years ago Closed 1 year ago

Websocket cannot send data larger than 500 kilobytes

Categories

(Core :: Networking, defect, P2)

Firefox 91
defect

Tracking

VERIFIED FIXED
101 Branch
Webcompat Priority P1
Tracking Status
firefox-esr91 101+ verified
firefox99 + wontfix
firefox100 + verified
firefox101 + verified

People

(Reporter: sdaniele3, Assigned: kershaw)

References

(Depends on 1 open bug, Blocks 2 open bugs, Regression)

Details

(Keywords: regression, Whiteboard: [necko-triaged])

Attachments

(6 files)

Attached file page.zip

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0

Steps to reproduce:

See attached zip file for small sample.

The "websocket.js" file will need to be modified. A server that accepts a websocket request should be configured in the "SERVER_URI" variable. Providing a working websocket server is a little complex and is out of scope for this request. It can be done with various python bits.

Once "websocket.js" has been updated you can serve the folder via python -m http.server and navigate to localhost.

Click on either of the buttons on the web page in order to create a websocket connection, load some binary data (via some random files I found on my computer), and send it over the socket.

Actual results:

When trying to send a Blob that is over 500 kilobytes Firefox hangs and never sends the data. If the Blob is smaller the data is sent right away.

Expected results:

Data should be sent. WebSocket supports like gigabytes, right?


I tried using mozregression to figure out if this broke at some point, but I ended up with a regression back in 2018, which is weird, and I don't know if it's helpful.

Last Known Good: 2018-11-06
First Known Bad: 2018-11-07
Pushlog: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=6e842238034cd847ede178b4e65ea07704e4ffe6&tochange=5836a60614764631436bf5030c5baa34c676c7a2

mozregression couldn't find builds to get a smaller range. As this was almost three years ago I feel like it may be irrelevant, and I find it surprising nobody has noticed a 500k limit on WebSocket over this time.

The Bugbug bot thinks this bug should belong to the 'Core::Networking' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → Networking
Product: Firefox → Core

I am not able to reproduce the issue with Windows 10 + Firefox 91.0.2. This could be limited to Linux.

(In reply to sdaniele3 from comment #2)

I am not able to reproduce the issue with Windows 10 + Firefox 91.0.2. This could be limited to Linux.

I am also not able to reproduce with macOS + Firefox Nightly. Could you try to capture a log? Please add nsWebSocket:5 to the MOZ_LOG environment variable. Thanks.

Flags: needinfo?(sdaniele3)
Attached file 400k.log.moz_log
Flags: needinfo?(sdaniele3)
Attached file 500k.log.moz_log

Here are two captures. One attempting the 400k send (success) and another with the 500k send (failure).

The failed log shows that the websocket channel was closed before sending out the data, but I can't find out the reason from the log.

[Parent 155564: Socket Thread]: D/nsWebSocket WebSocketChannel::OnOutputStreamReady: Try to send 583189 of data
[Parent 155564: Socket Thread]: D/nsWebSocket WebSocketChannel::OnOutputStreamReady: write 0 rv 80470007
[Parent 155564: Socket Thread]: D/nsWebSocket WebSocketChannel::OnOutputStreamReady() 7fea14c0b000
[Parent 155564: Socket Thread]: D/nsWebSocket WebSocketChannel::OnOutputStreamReady: Try to send 583189 of data
[Parent 155564: Socket Thread]: D/nsWebSocket WebSocketChannel::OnOutputStreamReady: write 0 rv 80470007
[Parent 155564: Main Thread]: D/nsWebSocket WebSocketChannel::Close() 7fea17021000
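
For readers unfamiliar with the `rv` value in the log above: it is an nsresult printed in hexadecimal. The sketch below decodes it, assuming the bit layout described in Gecko's nsError.h (bit 31 is the failure flag, bits 16 to 28 hold the module number plus an offset of 0x45, and the low 16 bits hold the error code). The helper name `decodeNsresult` is hypothetical and not part of any Firefox API. Decoded this way, 80470007 is a failure in module 2 (the base module) with code 7, which should correspond to NS_BASE_STREAM_WOULD_BLOCK: the output stream accepted no bytes ("write 0") and the write is expected to be retried later.

```javascript
// Assumed layout (from Gecko's nsError.h): bit 31 = failure flag,
// bits 16..28 = module number + 0x45, bits 0..15 = error code.
const NS_ERROR_MODULE_BASE_OFFSET = 0x45;

// Hypothetical helper for illustration only.
function decodeNsresult(hex) {
  const rv = parseInt(hex, 16);
  return {
    failed: (rv >>> 31) === 1,
    module: ((rv >>> 16) - NS_ERROR_MODULE_BASE_OFFSET) & 0x1fff,
    code: rv & 0xffff,
  };
}

// The value logged by WebSocketChannel::OnOutputStreamReady above.
console.log(decodeNsresult("80470007")); // { failed: true, module: 2, code: 7 }
```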

Could you try to get the log again with setting MOZ_LOG to MOZ_LOG=timestamp,rotate:200,nsHttp:5,nsSocketTransport:5,nsWebSocket:5?
Thanks.

Flags: needinfo?(sdaniele3)
Attached file 400k.log.request-2.zip
Attached file 500k.log.request-2.zip
Flags: needinfo?(sdaniele3)

I have attached new logs. With the additional flags I get more files with output so they're attached as zips.

Procedure:
Open Firefox in a fresh-ish profile (it was created a few days ago, pretty much for this ticket)
Navigate to localhost:8000, which serves my attached page.zip
Press the button
Wait a few seconds
Close the browser

Apparently, the code here is totally broken. As a temporary workaround, I'd suggest disabling websocket over http2 by turning off network.http.spdy.websockets.

Changing severity to S3, because websocket over http2 is not really common. Users can disable this as a workaround.

Severity: -- → S3
Priority: -- → P2
Whiteboard: [necko-triaged]

Just confirming that network.http.spdy.websockets=false bypasses the issue.
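
The workaround confirmed above can also be kept across restarts with a `user.js` file in the Firefox profile directory instead of toggling the pref by hand in about:config. This is a minimal sketch, assuming the pref names mentioned in this bug: the pref is called network.http.spdy.websockets at this point in the thread and network.http.http2.websockets in later comments, so only the name that exists in a given build is expected to have an effect.

```js
// user.js (placed in the Firefox profile directory; read at startup)
// Assumption: one of these two names matches the build in use.
user_pref("network.http.spdy.websockets", false);
user_pref("network.http.http2.websockets", false);
```

With the pref off, WebSocket connections are expected to be negotiated over HTTP/1.1 rather than HTTP/2.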

Assignee: nobody → kershaw

Since we are close to the end of this cycle, I'll land the patch at the next one.

Flags: needinfo?(kershaw)
Flags: needinfo?(dd.mozilla)

I am still working on a patch to disable some wpt tests.

Flags: needinfo?(kershaw)
Flags: needinfo?(dd.mozilla)

There's a r+ patch which didn't land and no activity in this bug for 2 weeks.
:kershaw, could you have a look please?
For more information, please visit auto_nag documentation.

Flags: needinfo?(kershaw)
Flags: needinfo?(dd.mozilla)

We'll reconsider the decision about disabling websocket over h2.

Flags: needinfo?(kershaw)
Flags: needinfo?(dd.mozilla)

Some steps to reproduce this bug locally:

  1. Adjust the value here to a much smaller value (e.g., 52).
  2. Run this web-platform test locally by running ./mach test testing/web-platform/tests/websockets/Send-65K-data.any.js --headless.
  3. You should see the test stuck at TEST_START: /websockets/Send-65K-data.any.worker.html?wpt_flags=h2.

Hello!

I'm on 96.0.3 (64-bit) and experiencing the issue in our app. The app can't send data larger than or equal to 512 KB over a WebSocket (the socket gets stuck). I can trigger the issue by calling socket.send(new Uint8Array(512 * 1024)); it appears as sent in the network tab, but any subsequent send(...) calls do not appear there.
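
The trigger described above can be sketched as a small helper that sends one large binary message followed by a small marker message, so that a stalled socket shows up as the marker never reaching the server. The helper name `sendLargeThenSmall` and the marker string are hypothetical; the 512 * 1024 size comes from this comment, and `SERVER_URI` in the usage note is the variable from the attached websocket.js.

```javascript
// Hypothetical repro helper: sends a zero-filled binary payload, then a
// small text marker, over an already-open WebSocket-like object.
function sendLargeThenSmall(socket, sizeInBytes = 512 * 1024) {
  const payload = new Uint8Array(sizeInBytes); // zero-filled by default
  socket.send(payload);
  // On an affected build, this second message is the one reported as
  // never showing up after the large send.
  socket.send("after-large-payload");
  return payload.byteLength;
}

// Usage in a browser (assumes a reachable WebSocket server at SERVER_URI):
//   const socket = new WebSocket(SERVER_URI);
//   socket.addEventListener("open", () => sendLargeThenSmall(socket));
```

Whether the stall occurs is expected to depend on the connection being negotiated over HTTP/2, which is why turning the pref off changes the outcome.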

Workaround with disabling network.http.spdy.websockets=false solves the issue for me.

  • Kershaw, could you give an update, what's the decision on disabling this flag by default? Should we expect it happening in the coming releases?
  • Is there anything I can do to help debugging the issue?

Thanks!

[Tracking Requested - why for this release]:
impacting a major product
(ask me if you want to know which one)

Does setting network.http.spdy.websockets (or network.http.http2.websockets) to false make the product work?
If yes, we might need to consider disabling websocket over http/2.

Flags: needinfo?(sledru)

comment 20 seems to say exactly that: "Workaround with disabling network.http.spdy.websockets=false solves the issue for me."

(In reply to Randell Jesup [:jesup] (needinfo me) from comment #23)

comment 20 seems to say exactly that: "Workaround with disabling network.http.spdy.websockets=false solves the issue for me."

Right, but I am not sure comment 21 and 20 are about the same thing.

I added you to the thread

Flags: needinfo?(sledru)
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: regression
Regressed by: 1434137

Dragana, given the webcompat aspect, should we move this to S2?

Webcompat Priority: --- → ?
Flags: needinfo?(dd.mozilla)

Set release status flags based on info from the regressing bug 1434137

Changing severity to S2 due to the new web-compat aspect of the issue.

Severity: S3 → S2
Flags: needinfo?(dd.mozilla)

Based on Comment 21 (and the corresponding internal mailing thread), setting this as a WebCompat P1 for now.

Webcompat Priority: ? → P1
Pushed by kjang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/75e58c0ca309
Disable websocket over h2, r=necko-reviewers,dragana

Backed out for causing http2-websocket.sub.h2.* failures:
https://hg.mozilla.org/integration/autoland/rev/2b820f90b73dabdb37049cea0e4b9b2d94aa49d9

Push which ran failed tasks: https://treeherder.mozilla.org/jobs?repo=autoland&group_state=expanded&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry&revision=567e7a511a30141b4a634ddb5d90f510acfc695e&selectedTaskRun=fOhQWlzdTyClyaKNXNnwmQ.0
Failure log: https://treeherder.mozilla.org/logviewer?job_id=375807591&repo=autoland

TEST-UNEXPECTED-FAIL | /infrastructure/server/http2-websocket.sub.h2.any.worker.html | WSS over h2 - assert_true: expected true got false
TEST-UNEXPECTED-FAIL | /infrastructure/server/http2-websocket.sub.h2.any.html | WSS over h2 - assert_true: expected true got false

Flags: needinfo?(kershaw)

There is also this failure
INFO - TEST-UNEXPECTED-FAIL | /websockets/extended-payload-length.html?wpt_flags=h2 | Application data is 125 byte which means any 'Extended payload length' field isn't used at all. - assert_unreached: close event should not fire Reached unreachable code
Failure log: https://treeherder.mozilla.org/logviewer?job_id=375816829&repo=autoland&lineNumber=19672

Depends on: 1766618
Pushed by kjang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/eaed92709627
Disable websocket over h2, r=necko-reviewers,dragana
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 101 Branch
Regressions: 1766683

Note that the patch landed in this bug is just a temporary workaround. In that patch, we disabled websocket over http2 due to this bug.
With this workaround, websocket connections can fall back to http/1.1 and users should not notice this. This gives us some time to fix this bug properly. Once we have a real fix for this bug, we'll re-enable websocket over http2.

Blocks: h2-proxy
Flags: qe-verify+

Are we planning to uplift this to mozilla-release for 100.0.1 also?

Flags: needinfo?(kershaw)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #37)

Are we planning to uplift this to mozilla-release for 100.0.1 also?

We also plan to disable this via Normandy, so I think we don't have to uplift this.
Thanks.

Flags: needinfo?(kershaw)

(In reply to Kershaw Chang [:kershaw] from comment #38)

We also plan to disable this via Normandy, so I think we don't have to uplift this.

What happens if we ship a 100.0.1 release without this change? AFAICT, the Normandy recipe doesn't apply there. Also, waiting for a Normandy rollout isn't as instantaneous as having the pref change set by default from the start.

Flags: needinfo?(kershaw)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #39)

(In reply to Kershaw Chang [:kershaw] from comment #38)

We also plan to disable this via Normandy, so I think we don't have to uplift this.

What happens if we ship a 100.0.1 release without this change? AFAICT, the Normandy recipe doesn't apply there. Also, waiting for a Normandy rollout isn't as instantaneous as having the pref change set by default from the start.

This is a good point. Using Normandy here is just to be safe: we do a 25% rollout first and ramp up to 100% a week later. However, this change has already been on nightly and beta for a while and we didn't get any bug reports, so this change should be safe.

I'll request a release uplift. Thanks.

Flags: needinfo?(kershaw)

Comment on attachment 9242416 [details]
Bug 1727995 - Disable websocket over h2, r=#necko

Beta/Release Uplift Approval Request

  • User impact if declined: Websocket over h2 is not fully functional.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This is already verified on nightly and beta.
  • String changes made/needed: N/A
  • Is Android affected?: Yes
Attachment #9242416 - Flags: approval-mozilla-release?

Comment on attachment 9242416 [details]
Bug 1727995 - Disable websocket over h2, r=#necko

Approved for 100.0.1

Attachment #9242416 - Flags: approval-mozilla-release? → approval-mozilla-release+
QA Whiteboard: [qa-triaged]

I'm not sure exactly how I should verify the fix for this bug. I opened the attached file on localhost in both Firefox 91 and the latest Firefox Nightly, but I couldn't see any differences between the two builds, or I'm not looking where I should.

Here's a link comparing the 2 builds on Ubuntu 18.04 x64: https://imgur.com/a/8BV7aLm
I didn't notice any hang on either build, but I'm not sure where to look to tell whether a blob is sent or not.
Could you please help me with some extra details?
Thanks.

Flags: needinfo?(kershaw)

The attached file will require some modifications to work, as well as a web server that serves websockets (not provided).

(In reply to Hani Yacoub from comment #44)

I'm not sure exactly how I should verify the fix for this bug. I opened the attached file on localhost in both Firefox 91 and the latest Firefox Nightly, but I couldn't see any differences between the two builds, or I'm not looking where I should.

Here's a link comparing the 2 builds on Ubuntu 18.04 x64: https://imgur.com/a/8BV7aLm
I didn't notice any hang on either build, but I'm not sure where to look to tell whether a blob is sent or not.
Could you please help me with some extra details?
Thanks.

Sorry, I can't find a public websocket server for testing this easily.

Flags: needinfo?(kershaw)

Comment on attachment 9242416 [details]
Bug 1727995 - Disable websocket over h2, r=#necko

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: Websocket over h2 is not fully functional.
  • User impact if declined: Firefox could not work well for some websites that support websocket over h2.
  • Fix Landed on Version: Nightly 101
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This is already verified on nightly and beta.
Attachment #9242416 - Flags: approval-mozilla-esr91?

sdaniele3, could you please check if the issue is fixed on your end on Firefox Nightly and Beta?
You can download the builds from here: https://www.mozilla.org/en-US/firefox/channel/desktop/
Thanks.

Flags: needinfo?(sdaniele3)

Comment on attachment 9242416 [details]
Bug 1727995 - Disable websocket over h2, r=#necko

Approved for 91.10esr.

Attachment #9242416 - Flags: approval-mozilla-esr91? → approval-mozilla-esr91+

sdaniele3, I'm coming back with the last request here, could you please check if the issue is fixed on Firefox 100.0.1 too so we could close this bug as verified?
You can download the build from here: https://archive.mozilla.org/pub/firefox/candidates/100.0.1-candidates/build1/
Thanks.

Flags: needinfo?(sdaniele3)

Thank you. I'll change the status of this bug to verified.

Status: RESOLVED → VERIFIED
QA Whiteboard: [qa-triaged]
Flags: qe-verify+
Regressions: 1770280

sdaniele3, actually, could you please do this last verification on the latest ESR build as well, just to make sure that we don't have any issues there either?
https://archive.mozilla.org/pub/firefox/candidates/91.10.0esr-candidates/build1/

Thank you!

Flags: needinfo?(sdaniele3)
Flags: needinfo?(sdaniele3)

Thank you!
Updating the tracking flag accordingly.

(In reply to Anton Fisher from comment #20)

Hello!

I'm on 96.0.3 (64-bit) and experiencing the issue in our app. The app can't send data larger than or equal to 512 KB over a WebSocket (the socket gets stuck). I can trigger the issue by calling socket.send(new Uint8Array(512 * 1024)); it appears as sent in the network tab, but any subsequent send(...) calls do not appear there.

Workaround with disabling network.http.spdy.websockets=false solves the issue for me.

  • Kershaw, could you give an update, what's the decision on disabling this flag by default? Should we expect it happening in the coming releases?
  • Is there anything I can do to help debugging the issue?

Thanks!

Hi Anton,

We've recently improved our code to support WebSocket over HTTP/2.
Could you help us verify whether this issue is fixed by enabling the pref network.http.http2.websockets on your side?

Thanks.

Flags: needinfo?(a.fschr)

The reporter already replied via email.

Flags: needinfo?(a.fschr)