Version 55 breaks sending large amounts of data over web sockets.

RESOLVED FIXED in Firefox 59

Status

()

defect
P1
normal
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: bill.Zickmantel, Assigned: michal)

Tracking

(4 keywords)

55 Branch
mozilla59
Points:
---
Bug Flags:
in-testsuite ?

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox55- wontfix, firefox56+ wontfix, firefox57 wontfix, firefox58 wontfix, firefox59 fixed)

Details

(Whiteboard: [necko-active])

Attachments

(3 attachments, 2 obsolete attachments)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36

Steps to reproduce:

We have an application that uploads files using binary web sockets. This was working fine in Version 54 and previous, but is broken on all platforms (32 & 64 bit versions) in Version 55. I tried 56 Beta, but it had the same problem.


Actual results:

Browser locked up completely. Had to be killed. I submitted at least one crash report.


Expected results:

File should have uploaded like it use to.
[Tracking Requested - why for this release]: A hang regression from Firefox 55, affecting both 32 and 64 bit as per the reporter.
Component: Untriaged → Networking: WebSockets
Product: Firefox → Core
If you have a chance, can you use the mozregression tool to determine the regression window? https://mozilla.github.io/mozregression/
Flags: needinfo?(bill.Zickmantel)
Keywords: site-compat
> I submitted at least one crash report.
Could you share the link ? 
you will have it in about:crashes

Thanks

Patrick, ni so that you are aware
Flags: needinfo?(mcmanus)
Tracking this for 56, leaving the ? flag for 55 for julien as 55 release owner.
https://crash-stats.mozilla.com/report/index/691eb04e-a9df-4112-9ce5-086a90170817

Hope this is what you are looking for.
Flags: needinfo?(bill.Zickmantel)
OK, this seems to be a duplicate of bug 1377277. I will keep both open for now until we know more.
Dragana, this might be interesting to you too.

Bill, would you have a test case to reproduce it?
Flags: needinfo?(dd.mozilla)
Not sure the crash above is related. It apparently involves Dropbox, like Bug 1330991.
(You can find DropboxExt.18.0.dll in the Modules tab)
Usually by this hangs it is interesting to look at the socketThread, (not always, but in this case yes).

I see webSocket, and workers on the socketThread. Michal, Ben can you please take a look?

Bill, if you could make a log that would be good, but I need an info from Michal and Ben which logging they would need.
Flags: needinfo?(michal.novotny)
Flags: needinfo?(dd.mozilla)
Flags: needinfo?(bkelly)
Flags: needinfo?(mcmanus)
I think the RefPtr<WorkerRunnable> is spurious here.  The crash reporter often reports incorrect template specialization types.  The actual type of the ref-counted member is nsCOMPtr<nsIOutputStreamCallback> here.

Is there some assumption that this callback will be invoked, though?  It seems we are overwriting a previous callback with a new one.  Maybe that original callback not getting invoked is creating the hang condition.
Flags: needinfo?(bkelly)
(In reply to Dragana Damjanovic [:dragana] from comment #9)
> Bill, if you could make a log that would be good, but I need an info from
> Michal and Ben which logging they would need.

nsWebSocket:5
Flags: needinfo?(michal.novotny)
Crash may not be related. It happened a while after the hang. The hang behavior is consistent in version 55 and 56 Beta. 54 works fine.

To reproduce you need to open a binary web socket to a server and send large amounts of data (i.e. a file). Dies after sending a packet of file data. Packet size is 1MB. We use binary web sockets for most of our client to server communication, but the payload is much smaller than 1MB. All of these other web socket data transfers seem to work fine.

I don't have a test case to give you, as this is functionality built into a large system. 

The following is essentially the core of what should simulate the problem:

// sockets are opened like this
var webSocket = new WebSocket(url);
webSocket.binaryType = "arraybuffer"; 

// there is other code between these two snippets, including an onopen handler etc.

// data is an ArrayBuffer of 1MB of file data
// header is an object
let headerAsString: string = JSON.stringify( header );
var binaryData = new Blob([headerAsString, data]);
	
if (webSocket.readyState == WebSocket.OPEN) {
    webSocket.send(binaryData);
}
Michal, can you look at this?
Flags: needinfo?(michal.novotny)
I cannot reproduce it. I modified test from bug 1332524 to send data from browser to WS server. I sent 1000 1MB packets without a problem.
Flags: needinfo?(michal.novotny)
Jason, can we find an owner for this so it doesn't drop off the radar? (I see Michal and Dragana are actively investigating...). Thanks!
Flags: needinfo?(jduell.mcbugs)
Whiteboard: [necko-backlog]
According to comment 14, seem more information is needed to move forward.

Bill, it helps a lot if you can get the log using about:networking.
https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging

Add nsWebSocket:5 into the log modules, so that would be:
timestamp,sync,nsHttp:5,nsSocketTransport:5,nsStreamPump:5,nsHostResolver:5,nsWebSocket:5
Flags: needinfo?(bill.Zickmantel)
Flags: needinfo?(bill.Zickmantel)
We have worked around this problem by reducing our data size to 512K + header from 1MB + header. Worked in 55.0.3.
Mark 56 won't fix as we are one week from RC. Let's see if we can fix it in 57.
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P1
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: P1 → P3
Michal--take a look at the log in comment 18 and see if there's anything useful here.
Flags: needinfo?(jduell.mcbugs) → needinfo?(michal.novotny)
Priority: P3 → P2
Whiteboard: [necko-backlog] → [necko-next]
The amount we want to send is obviously wrong :-/

2017-09-05 17:24:48.209000 UTC - [Main Thread]: D/nsWebSocket WebSocketChannel::SendBinaryStream() 000001C76F91D000
2017-09-05 17:24:48.209000 UTC - [Socket Thread]: D/nsWebSocket WebSocketChannel::OnOutputStreamReady: Try to send 3857049061 of hdr/copybreak
Assignee: nobody → michal.novotny
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Flags: needinfo?(michal.novotny)
Priority: P2 → P1
Whiteboard: [necko-next] → [necko-active]
See Also: → 1415081
Marking this wontfix for Firefox 57.
Posted patch fix (obsolete) — Splinter Review
To reproduce the problem a blob must be sent when e10s is enabled (see bug 1415081), that's why I wasn't able to reproduce it. Bug 1415081 fixed the failure in ConvertStreamToString() but there is yet another bug that we don't delete current outgoing message when PrimeNewOutgoingMessage() fails and we try to send the invalid message in OnOutputStreamReady().
Attachment #8927444 - Flags: review?(mcmanus)
Comment on attachment 8927444 [details] [diff] [review]
fix

Review of attachment 8927444 [details] [diff] [review]:
-----------------------------------------------------------------

Is it worth using MakeScopeExit() to avoid these sorts of problems in the future?
Attachment #8927444 - Flags: review?(mcmanus) → review+
Comment on attachment 8927444 [details] [diff] [review]
fix

Review of attachment 8927444 [details] [diff] [review]:
-----------------------------------------------------------------

I was going to just nod at MakeScopeExit() and r+ this anyhow out of concern for backport.. but it looks like it should backport fine and Nathan's right that its an idiom that should be used more and applies perfectly to this case.. it even nicely self documents when the free no longer needs to be done which is all implicit and confusing right now.
Attachment #8927444 - Flags: review+
Posted patch patch v2 (obsolete) — Splinter Review
Attachment #8927444 - Attachment is obsolete: true
Attachment #8927947 - Flags: review?(mcmanus)
Comment on attachment 8927947 [details] [diff] [review]
patch v2

Review of attachment 8927947 [details] [diff] [review]:
-----------------------------------------------------------------

rather than making this implicit through rv, can you make it expicit the way onStartGuard in httpchannel does? I think that's the more understandable version..
Attachment #8927947 - Flags: review?(mcmanus)
Posted patch patch v3Splinter Review
Attachment #8927947 - Attachment is obsolete: true
Attachment #8930880 - Flags: review?(mcmanus)
Comment on attachment 8930880 [details] [diff] [review]
patch v3

Review of attachment 8930880 [details] [diff] [review]:
-----------------------------------------------------------------

I apologize for the delay here. I misread the patch and went down a bit of a rabbit hole - its fine afaict.
Attachment #8930880 - Flags: review?(mcmanus) → review+
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d0f561d1bf62
Delete current outgoing message when PrimeNewOutgoingMessage fails. r=mcmanus
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/d0f561d1bf62
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Is this something we can land an automated test for? Also, please request Beta approval on this when you're comfortable doing so.
Flags: needinfo?(michal.novotny)
Flags: in-testsuite?
Too late for Beta58. Mark 58 won't fix.
(In reply to Ryan VanderMeulen [:RyanVM] from comment #35)
> Is this something we can land an automated test for?

Probably not, after fixing bug 1415081, there is not a way how to make WebSocketChannel::PrimeNewOutgoingMessage() fail.
Flags: needinfo?(michal.novotny)
You need to log in before you can comment on or make changes to this bug.