Created attachment 8828608 [details] webworkers_lost_data.zip

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.79 Safari/537.36 Edge/14.14393

Steps to reproduce:

Tested on the 50.1.0 release and the 53.0a1 nightly running on Windows 10.

1) Install node.js on the local machine
2) Extract the attached zip file and run "node wsserver.js"
3) Launch Firefox and open http://localhost:12345/wsclient.html
4) Observe the console log messages
5) Open http://localhost:12345/wsclient_worker.html
6) Observe the console log messages

Actual results:

In http://localhost:12345/wsclient.html:
Connected to localhost:12345
Received 10000 packets for a total of 10000000 bytes
Connection closed

In http://localhost:12345/wsclient_worker.html:
Connected to localhost:12345
Byte 411000: Expected 120, got 24
Received 412 packets for a total of 411000 bytes
Connection closed

(The packet offset of the data corruption changes randomly.)

Expected results:

Both wsclient.html (without web workers) and wsclient_worker.html (with web workers) should have received 10000 packets and 10000000 bytes.

This appears to repro most often when a lot of data is being sent quickly. You can adjust the configuration in wsserver.js to see the effects of a larger packet delay, etc. With the current wsserver.js configuration, this repros every time in my testing.
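For reference, the integrity check in the worker client boils down to comparing each received byte against the value the server is known to have sent at that absolute stream offset. The sketch below is a hypothetical reconstruction (the actual client code is in the attached zip; the repeating byte pattern and the name verifyPayload are assumptions, not the real implementation), showing how a "Byte N: Expected X, got Y" message like the one above can be produced:

```javascript
// Hypothetical sketch of the client-side integrity check.
// Assumes the server fills the stream with bytes that cycle with the
// absolute offset, so corruption or reordering is detectable per byte.
function verifyPayload(buf, startOffset) {
  for (let i = 0; i < buf.length; i++) {
    const expected = (startOffset + i) % 256; // assumed server pattern
    if (buf[i] !== expected) {
      // Matches the shape of the "Byte 411000: Expected 120, got 24" log
      return { ok: false, offset: startOffset + i, expected, got: buf[i] };
    }
  }
  return { ok: true };
}
```

The worker client would call this on every received binary message, passing the running byte count as startOffset, and stop at the first mismatch.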
Hmm, this is probably in the DOM queueing somewhere, but Michal, do you want to see if anything looks amiss and route appropriately?
Assignee: nobody → michal.novotny
@Michal, did you get a chance to triage this bug? This affects us in real traffic as well.
Flags: needinfo?(mcmanus) → needinfo?(michal.novotny)
I can't reproduce it. I also tried increasing packetLength to 10000, but it always succeeds on my computer.
@Michal, thanks for looking at this. If it helps at all, I recently tried this on Ubuntu and was also unable to reproduce. We have only reproduced this with Firefox on Windows in our testing. Is that what you tried?
I tested it on Fedora. Unfortunately, I don't have a Windows development machine.
@Michal, FYI, I just tried the latest Firefox nightly build on Windows, and this bug appears to be fixed. We are still hitting it on the latest release build, though.
Quick update: it looks like it is also fixed in 54.0b13, so the fix should ship next week. Thanks, Michal and all.
@Malcolm, were you able to find the patch that fixed this issue?
@Antony, no. I took a bit of a look but haven't found any changes that are clearly related.
This is still broken in the Firefox 54.0 release build running on Windows 10, unfortunately. The 55.0b1 beta is working for us though.
@Michal, or anyone else familiar with the Firefox release process:

Strangely enough, betas installed from here do *not* repro:
https://download-installer.cdn.mozilla.net/pub/firefox/releases/54.0b1/win32/en-US/Firefox%20Setup%20Stub%2054.0b1.exe

Betas installed from here *do* repro:
https://ftp.mozilla.org/pub/firefox/releases/54.0b1/win32/en-US/Firefox%20Setup%2054.0b1.exe

Release builds installed from either download-installer.cdn.mozilla.net or ftp.mozilla.org/pub/firefox/releases *do* repro. This is true going back to 50.0 or earlier.

Would this give any hint about where the problem lies? Thanks!
Ah, my apologies for incorrect information in the last message; I see that "Firefox Setup Stub *.exe" always updates to 55.0b1 automatically, so that was not comparing the same versions.

Now, 54.0b13 repros this bug when installed from either
https://download-installer.cdn.mozilla.net/pub/firefox/releases/54.0b13/win32/en-US/Firefox%20Setup%2054.0b13.exe (no "Stub") or
https://ftp.mozilla.org/pub/firefox/releases/54.0b13/win32/en-US/Firefox%20Setup%2054.0b13.exe

However, I had 54.0b13 installed over a week ago from Firefox%20Setup%20Stub%2054.0b13.exe, when it was the newest version, without a repro of the bug.

FYI, we are not getting a repro of the bug with 55.0b1 installed from any of the sources.
@Malcolm, is it possible to run the mozregression tool to narrow down which patch fixed this issue? See http://mozilla.github.io/mozregression/ for details. Thanks!
Hi Shian-Yow,

Thanks for the link! I wasn't aware of that tool; it is really useful. I ran it twice to make sure, with the same result:

2017-06-15T10:59:46: INFO : Narrowed inbound regression window from [a8bb5c8e, bf0f8972] (3 revisions) to [5182b2c4, bf0f8972] (2 revisions) (~1 steps left)
2017-06-15T10:59:46: DEBUG : Starting merge handling...
2017-06-15T10:59:46: DEBUG : Using url: https://hg.mozilla.org/mozilla-central/json-pushes?changeset=bf0f897261395ed2488f8fef72cf5353da1e105e&full=1
2017-06-15T10:59:47: DEBUG : Found commit message: Bug 1350814 - Replace use of int64_t for microseconds by TimeUnit in AudioSink. r=kikuo MozReview-Commit-ID: 3diOpJu7g8i
2017-06-15T10:59:47: INFO : The bisection is done.
2017-06-15T10:59:47: INFO : Stopped

From looking over that changeset (https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a8bb5c8e0e80e87e21ad63997f8881991b5347cf&tochange=bf0f897261395ed2488f8fef72cf5353da1e105e), it looks very possible it was fixed by this commit:

Shih-Chiang Chien <email@example.com> Mon, 20 Mar 2017 17:29:07 +0800 (2 months ago)
Bug 1320744 - Part 2, Allow ChannelEventQueue to perform flush on multiple threads. r=mayhemer MozReview-Commit-ID: Egu2mvwFTUF

Thanks for your help,
Malcolm
Thanks for this info, Malcolm! Normally mozregression can keep bisecting to find the exact commit. Your bisection seems to have stopped at the merge commit in the mozilla-central tree, and didn't bisect further into the mozilla-inbound tree.

I tried on my Ubuntu machine with the following command, manually taking the last 2 revisions from your result, and it kept bisecting deeper into the mozilla-inbound tree. Maybe you can try whether it works on Windows.

$ mozregression --find-fix -b 5182b2c4 -g bf0f8972
 **********
You should use a config file. Please use the --write-config command line flag to help you create one.
 **********
0:00.39 WARNING: You are using mozregression version 2.3.12, however version 2.3.13 is available.
0:00.39 WARNING: You should consider upgrading via the 'pip install --upgrade mozregression' command.
0:00.39 INFO: 5182b2c4 is not a release, assuming it's a hash...
0:00.39 INFO: bf0f8972 is not a release, assuming it's a hash...
0:00.39 INFO: Getting mozilla-inbound builds between bf0f8972 and 5182b2c4
0:04.90 INFO: Testing good and bad builds to ensure that they are really good and bad...
0:04.90 INFO: Downloading build from: https://queue.taskcluster.net/v1/task/P-2HG-lyTASX_gUECkIOwQ/runs/0/artifacts/public%2Fbuild%2Ftarget.tar.bz2
===== Downloaded 100% =====
0:29.11 INFO: Running mozilla-inbound build built on 2017-03-28 11:36:30.966000, revision 490b80ca
0:41.16 INFO: Launching /tmp/tmp6s74iY/firefox/firefox
0:41.16 INFO: Application command: /tmp/tmp6s74iY/firefox/firefox -profile /tmp/tmpcgvrCm.mozrunner
0:41.17 INFO: application_buildid: 20170328110610
0:41.17 INFO: application_changeset: 490b80ca23d23f750fdf5517651d6c1759b8afbc
0:41.17 INFO: application_name: Firefox
0:41.17 INFO: application_repository: https://hg.mozilla.org/integration/mozilla-inbound
0:41.17 INFO: application_version: 55.0a1
SC, according to comment 14, this bug is possibly fixed by your patch in bug 1320744. Do you think it's relevant?
It is possible that my modification to ChannelEventQueue changed the behavior of WebSocket and accidentally fixed this bug. However, it would be better to bisect further into the inbound revisions as confirmation. Malcolm, can you do one more verification run following comment #15, since you have the test environment set up nicely? :)
Hi Shian-Yow and Shih-Chiang,

Thanks for your help! I ran the tool with those options, and the bisection into mozilla-inbound resulted in:

2:59.83 INFO: Narrowed inbound regression window from [4fd1e9b7, 490b80ca] (3 revisions) to [13eb5319, 490b80ca] (2 revisions) (~1 steps left)
2:59.84 INFO: No more inbound revisions, bisection finished.
2:59.84 INFO: First good revision: 490b80ca23d23f750fdf5517651d6c1759b8afbc
2:59.84 INFO: Last bad revision: 13eb53195041d2adcd3f7ddf467480b1a94d1f53
2:59.84 INFO: Pushlog: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=13eb53195041d2adcd3f7ddf467480b1a94d1f53&tochange=490b80ca23d23f750fdf5517651d6c1759b8afbc

This appears to cover mostly the same range of commits. I'll attach the full mozregression output to the bug for your reference.
Thanks Malcolm, I've confirmed that Bug 1320744 part 2 fixed this issue. Before that patch, an assertion is triggered, which looks like a ChannelEvent out-of-order issue:

>Assertion failure: enqueue == true || mEventQueue.IsEmpty() (Should always enqueue if ChannelEventQueue not empty), at /mozilla-central/obj-firefox.noindex/dist/include/mozilla/net/ChannelEventQueue.h:129
>#01: mozilla::net::WebSocketChannelChild::RecvOnBinaryMessageAvailable(nsCString const&)[/mozilla-central/obj-firefox.noindex/dist/NightlyDebug.app/Contents/MacOS/XUL +0x59259a]
>#02: mozilla::net::PWebSocketChild::OnMessageReceived(IPC::Message const&)[/mozilla-central/obj-firefox.noindex/dist/NightlyDebug.app/Contents/MacOS/XUL +0x8d5afb]
>#03: mozilla::dom::PContentChild::OnMessageReceived(IPC::Message const&)[/mozilla-central/obj-firefox.noindex/dist/NightlyDebug.app/Contents/MacOS/XUL +0xa1fadf]
>#04: mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const&)[/mozilla-central/obj-firefox.noindex/dist/NightlyDebug.app/Contents/MacOS/XUL +0x67e8cb]
>#05: mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message&&)[/mozilla-central/obj-firefox.noindex/dist/NightlyDebug.app/Contents/MacOS/XUL +0x67c7c4]
>#06: mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::MessageChannel::MessageTask&)[/mozilla-central/obj-firefox.noindex/dist/NightlyDebug.app/Contents/MacOS/XUL +0x67d924]
>#07: mozilla::ipc::MessageChannel::MessageTask::Run()[/mozilla-central/obj-firefox.noindex/dist/NightlyDebug.app/Contents/MacOS/XUL +0x67e0ac]
>#08: nsThread::ProcessNextEvent(bool, bool*)[/mozilla-central/obj-firefox.noindex/dist/NightlyDebug.app/Contents/MacOS/XUL +0x120a5b]
>#09: NS_ProcessPendingEvents(nsIThread*, unsigned int)[/mozilla-central/obj-firefox.noindex/dist/NightlyDebug.app/Contents/MacOS/XUL +0x11df73]
>#10: nsBaseAppShell::NativeEventCallback()[/mozilla-central/obj-firefox.noindex/dist/NightlyDebug.app/Contents/MacOS/XUL +0x2a06591]
>#11: nsAppShell::ProcessGeckoEvents(void*)[/mozilla-central/obj-firefox.noindex/dist/NightlyDebug.app/Contents/MacOS/XUL +0x2a6c89e]
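For anyone reading along, the failed assertion (enqueue == true || mEventQueue.IsEmpty()) encodes an ordering invariant: once any event is pending in the queue, every later event must also be enqueued rather than dispatched directly, or it overtakes the pending ones and arrives out of order. Here is a minimal JavaScript sketch of that invariant; Mozilla's real ChannelEventQueue is C++, so the names and structure below are illustrative only, not the actual implementation:

```javascript
// Illustrative sketch of the ordering invariant behind the assertion above.
// Not Mozilla's actual ChannelEventQueue; all names here are made up.
class EventQueue {
  constructor() {
    this.pending = [];      // events held while delivery is suspended
    this.suspended = true;  // e.g. waiting for earlier events on another thread
    this.delivered = [];    // final delivery order, for inspection
  }
  runOrEnqueue(event) {
    // The invariant: if anything is already pending, this event MUST be
    // enqueued too; dispatching it directly would reorder the stream.
    if (this.suspended || this.pending.length > 0) {
      this.pending.push(event);
    } else {
      this.delivered.push(event);
    }
  }
  resume() {
    this.suspended = false;
    // Flush in FIFO order so earlier events are delivered first.
    while (this.pending.length > 0) {
      this.delivered.push(this.pending.shift());
    }
  }
}
```

In this sketch, an event arriving between resume() starting and the queue draining still goes through runOrEnqueue, so it lands behind the pending events instead of in front of them; that is the property the multi-threaded flush in Bug 1320744 part 2 had to preserve.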
Status: UNCONFIRMED → RESOLVED
Last Resolved: 8 months ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1320744