Downloading a large file on mega.nz sometimes spawns a very large number of threads, causing Firefox to become unresponsive.
Categories: Core :: Networking, defect, P2
People: Reporter: valyel; Assigned: valentin
References: Blocks 1 open bug; Regression
Details: Keywords: regression; Whiteboard: [necko-triaged]
Attachments (7 files, 1 obsolete file):
- 2.74 KB, image/png
- 184.98 KB, image/png
- 41.22 KB, application/json
- 82 bytes, application/octet-stream
- 54.05 KB, image/png
- 47 bytes, text/x-phabricator-request (RyanVM: approval-mozilla-esr78+)
- 47 bytes, text/x-phabricator-request (jcristau: approval-mozilla-beta-; RyanVM: approval-mozilla-esr78+)
User Agent: Mozilla/5.0 (Windows NT 6.3; Win64; x64; rv:81.0) Gecko/20100101 Firefox/81.0
Steps to reproduce:
Downloaded a large file (~2GB) from mega.nz. After mega.nz finishes buffering the data and starts saving it, Firefox begins spawning a large number of threads and becomes unresponsive. This happens somewhat frequently (~10% of the time).
Actual results:
See attached file.
Firefox continues spawning threads and becomes unresponsive. Eventually the spawning stops, and after some time the thread count declines. Takes 2-5 minutes.
Expected results:
Store the file and finish.
resmon screenshot
Comment 2•4 years ago
Hi,
I cannot reproduce on Windows 10 Pro with Firefox release 81.0 (64-bit), beta 82.0b4, or nightly 83.0a1 (2020-09-29) (64-bit). I downloaded a sample file from https://www.thinkbroadband.com/download and saved the 1GB file (the biggest available; I couldn't find any other sample website with 2GB files), then made a copy of the file and saved the two of them in one zip archive to make a 2GB file. I created an account on https://mega.nz/, uploaded the 2GB file, and then downloaded it, but it finished successfully.
I am setting a component for this in order to get the dev team involved. Please attach your about:support information.
Best,
Clara
Comment 3•4 years ago
Also, please confirm if this happened both using "standard download" and "download as zip" options.
Best,
Clara
I believe this happens with a standard download. I download files hosted by someone else, which are usually zipped.
This is not consistent. I have not been able to recreate it since creating the bug report, though I have had this happen ~5 times in the past.
Size of the file may not be relevant. I have only seen this occur with larger files though.
Just reproduced this again. File was 1.32GB. Thread count begins to increase once the "saving" is done (the progress bar for the download reaches 100%). This time went up to 14000 threads, before declining. Had a YouTube video playing in a different tab.
Comment 7•4 years ago
This sounds pretty bad...
Does this happen on any site other than mega? Mega does some... strange... things with blobs and download verification.
Can you try to use the devtools profiler or debugger, or using about:memory's "measure" button (which reports JS/DOM workers) to see if these are website worker threads or something? It's hard to know what to do to narrow this down without clear steps to reproduce and/or the identities of the threads...
Here is the profiler output. Using anything else is unfortunately difficult, as the scheduler is losing its mind trying to manage quite a few thousand threads.
I can't say with certainty whether this happens with other sites, but as far as I can remember, it does not.
Let me know if you want more information on this.
Comment 9•4 years ago
(In reply to valyel from comment #8)
> Let me know if you want more information on this.
Unfortunately the devtools profiler info does not appear to be readable. This is a tooling issue on the Firefox side, for which I can only apologize.
I'm not convinced our standard profiling tools (and attempting to profile all those thousands of threads...) aren't going to just hang like the debugger did. So I've written up a custom bit of JS that should hopefully tell us a bit more.
- open about:config or about:preferences (it has to be a "privileged" about page for the below script to work), and open the devtools web console in them
- reproduce the issue
- run the following script in the web console. It'll create a JS object whose property names are thread names, and whose property values are the number of such threads that are running.
threadMap = await (async function() {
  let { threads } = await ChromeUtils.requestProcInfo();
  let m = {};
  for (let thread of threads) {
    // Strip the trailing "#N" instance suffix so threads from the same pool group together.
    let normalizedName = (thread.name || "").replace(/#\d+$/, "");
    if (!m.hasOwnProperty(normalizedName)) {
      m[normalizedName] = 0;
    }
    m[normalizedName]++;
  }
  return m;
})();
- then run (separately, in the same console):
copy(JSON.stringify(threadMap, null, 2))
This should copy a JSON version of that object to the clipboard, which you can paste here.
You can test the script out without step 2 (i.e. reproducing the downloading issue) to get an idea of what kind of data it gathers when things are "normal".
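As an aside not in the original comment, the normalization step in that script can be illustrated with a small self-contained snippet; the thread names below are made up for illustration and are not real output from this bug:

```javascript
// Group thread names by stripping the trailing "#N" instance suffix -
// the same normalization the script above applies with replace(/#\d+$/, ""),
// plus a trim() so "StreamTrans #42" and "StreamTrans #43" share one key.
function countThreads(names) {
  const counts = {};
  for (const name of names) {
    const key = name.replace(/#\d+$/, "").trim();
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// Made-up sample input:
const sample = ["StreamTrans #42", "StreamTrans #43", "DNS Resolver #1", "Socket Thread"];
const counts = countThreads(sample);
// counts: StreamTrans -> 2, DNS Resolver -> 1, Socket Thread -> 1
```

In the pathological case from this bug, the dominant key would simply have a very large count.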
Reporter
Comment 10•4 years ago
It doesn't seem like any of the threads have names assigned. Every thread name is replaced with the empty string, so you just get the total count of threads. Reproduced to be sure, and the JSON output is { "" : 18863 }.
Let me know if I'm missing something.
Comment 11•4 years ago
(In reply to valyel from comment #10)
> It doesn't seem like any of the threads have names assigned. Every thread name is replaced with empty string, so you just get the total count of threads. Reproduced to be sure, and the json output is { "" : 18863 }.
I'm pretty confused by this... I obviously tested the script before sending it to you, though I originally tested on macOS.
If you look at the output of (await ChromeUtils.requestProcInfo()).threads, do all of those threads have an empty string as their name?
Yoric, is that expected on release, or Windows 8, or something? (I just tested with mozregression --launch 81 on Windows 10, and I see thread names there...)
Versions of Windows prior to 10 do not support thread names (that's a limitation of the OS). So that's not surprising.
Comment 13•4 years ago
(In reply to David Teller [:Yoric] (please use "needinfo") from comment #12)
> Versions of Windows prior to 10 do not support thread names (that's a limitation of the OS). So that's not surprising.
So what can we do here to get any meaningful debug information if we cannot reproduce? Surely there must be some way to get this information when we're creating tens of thousands of threads? It looks like there's MOZ_LOG information for threads running tasks (which I'm sure there'll be lots of either way, so it'll be too spammy to work out what's going on, and anyway they don't seem to log any thread info), but I don't see any for thread creation. :-\
Reporter: I don't suppose you have another machine on which you're seeing this? If not, can you reproduce the same thing on a nightly build from https://nightly.mozilla.org/ ? If so, I can try and get you a custom build with instrumentation to get the relevant data... but the first question is whether this reproduces on nightly (which will be using a different profile).
Reporter
Comment 14•4 years ago
I don't have another machine where I see this. I can give this a try on Win10 and Linux machines, though I assume it will be fine.
Reproduced on a clean nightly build. Same exact behavior. An instrumented build sounds good. I haven't seen the thread count climb past 50k, if that makes your job any easier.
(In reply to :Gijs (he/him) from comment #13)
> Surely there must be some way to get this information when we're creating tens of thousands of threads?
As far as I can tell looking through the code, there doesn't even seem to be any MOZ_LOG with the thread name. If you had access to the machine, you could breakpoint on thread creation, but I realize that it's not possible here.
Comment 16•4 years ago
Apologies for the delay here.
I just pushed to our tryserver infrastructure to get you a build. The patch I pushed is straightforward ( https://hg.mozilla.org/try/rev/36a4981b32fc8b68eae983777500ce4f07116a67 ), just in terms of you trusting the resulting build :-)
In about an hour, at https://treeherder.mozilla.org/#/jobs?repo=try&revision=82d88a3506db62ee7b03ac96c8d51cfaae911b93&selectedTaskRun=BucY1GpcQYq-AYHrTXxXIQ.0 , the "B" job that's selected and (as of writing) in progress, should turn green and be finished. Once that happens, at the bottom, in the "Artifacts" tab, there will be a "target.zip" link. This is a zipfile of the build, which you should be able to extract wherever you like and run from there. I'd advise not overwriting your existing Firefox installs with it to avoid issues. Extracting it elsewhere will also mean it'll use its own profile (though the usual rules apply around running multiple builds at once - you probably want to close any remaining Firefox / Nightly instances before opening this new one).
Edit: it was finished more quickly than I thought; direct link to the zipfile: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/BucY1GpcQYq-AYHrTXxXIQ/runs/0/artifacts/public/build/target.zip
This build will dump thread creation into the browser console (note: not the same as the regular developer tools console; open with Ctrl-Shift-J). Hopefully this should get us some more info... You can clear the console before reproducing, I guess. It should also be fine to not have the console open before the problem happens - opening it afterwards should still show you the most recent few hundred logs...
Reporter
Comment 17•4 years ago
Created thread StreamTrans #56950
Created thread StreamTrans #56951
Created thread StreamTrans #56952
Created thread StreamTrans #56953
Created thread StreamTrans #56954
Created thread StreamTrans #56955
Created thread StreamTrans #56956
Created thread StreamTrans #56957
Created thread StreamTrans #56958
Created thread StreamTrans #56959
Created thread StreamTrans #56960
Created thread StreamTrans #56961
Created thread StreamTrans #56962
Created thread StreamTrans #56963
Created thread StreamTrans #56964
Created thread StreamTrans #56965
Created thread StreamTrans #56966
Created thread StreamTrans #56967
Created thread StreamTrans #56968
Created thread StreamTrans #56969
Created thread StreamTrans #56970
Created thread StreamTrans #56971
Created thread StreamTrans #56972
Created thread StreamTrans #56973
Created thread StreamTrans #56974
Created thread StreamTrans #56975
Created thread StreamTrans #56976
Created thread StreamTrans #56977
Created thread StreamTrans #56978
Created thread StreamTrans #56979
Created thread StreamTrans #56980
Created thread StreamTrans #56981
Created thread StreamTrans #56982
Created thread StreamTrans #56983
Created thread StreamTrans #56984
Created thread StreamTrans #56985
Created thread StreamTrans #56986
Created thread StreamTrans #56987
Created thread StreamTrans #56988
Created thread StreamTrans #56989
Created thread StreamTrans #56990
Created thread StreamTrans #56991
Created thread StreamTrans #56992
Created thread StreamTrans #56993
Created thread StreamTrans #56994
Created thread StreamTrans #56995
Created thread DNS Resolver #9
Created thread DNS Resolver #10
Created thread BgIOThreadPool #4
Created thread StreamTrans #56996
Created thread StreamTrans #56997
Created thread IndexedDB #32
Created thread IndexedDB #33
Created thread DNS Resolver #11
Created thread SSL Cert #21
Created thread StreamTrans #14
Created thread StreamTrans #15
Created thread StreamTrans #56998
Created thread Untrusted Modules
Created thread Jump List
Created thread IndexedDB #34
Created thread StreamTrans #56999
Created thread StreamTrans #57000
Created thread StreamTrans #57001
Created thread StreamTrans #57002
Created thread SSL Cert #22
Created thread StreamTrans #57003
Created thread StreamTrans #57004
Created thread StreamTrans #57005
Created thread StreamTrans #57006
Created thread BackgroundThreadPool #5
Created thread BackgroundThreadPool #6
Created thread Jump List
Created thread StreamTrans #57007
Created thread StreamTrans #57008
Created thread StreamTrans #57009
Created thread StreamTrans #57010
Created thread IndexedDB #35
Created thread StreamTrans #57011
Created thread SSL Cert #23
Created thread Jump List
Created thread Classifier Update
This is what I get; it's the end of the output. I removed ~10k lines before this, all spawning more StreamTrans threads. However, I am not able to catch the beginning of this, as the console buffer is around 50k lines too short. :)
Comment 18•4 years ago
Thanks very much! That at least points at something.
Valentin, any idea how we're ending up creating tens of thousands of stream transport threads, or how we'd figure out why? https://searchfox.org/mozilla-central/rev/c2e3ac11be4837718c2604e58085fbc8252b69dd/netwerk/base/nsStreamTransportService.cpp#251,254-256 would suggest the limit is supposed to be 25...
Florian, I don't suppose we have thread creation markers in the gecko profiler, which would get us stacks (unsure if that's going to be helpful here)?
Comment 19•4 years ago
(In reply to :Gijs (back Oct. 12; he/him) from comment #18)
> Florian, I don't suppose we have thread creation markers in the gecko profiler, which would get us stacks (unsure if that's going to be helpful here)?
Gerald, any idea of how difficult that would be? Would this be another marker that would need to live in the base profiler? I guess the marker would be in the track of the thread that requested the creation of another thread. I wonder if we should also collect a stack of when a thread is registered with the profiler; maybe that stack could be captured in the gecko profiler? Not sure where we would display it; maybe as a marker at the beginning of the new thread.
Quick thoughts: (Evening here, happy to help tomorrow if you have more questions.)
As a first quick thing, you could add a marker when creating a thread in xpcom, possibly in nsThread::Init.
These markers would appear in the thread-creating thread (i.e., not the new one), but at least they will capture the cause of the thread creation.
If you then run the profiler with "Bypass selections above and record all registered threads" (in about:profiling), hopefully you can find the markers you're interested in.
Maybe you could record that marker into the new thread, possibly at the end of nsThreadManager::NewNamedThread (not sure how to get that from nsIThread), by using something like PROFILER_MARKER_UNTYPED("Thread creation", OTHER, {MarkerThreadId( <get tid from nsIThread> ), MarkerStack::Capture()}).
If needed, the next step would be to profiler_capture_backtrace() when creating a thread, pass it to the new thread through ThreadInitData, and record a marker (with MarkerStack::TakeBacktrace(std::move( <captured stack> ))) in nsThread::ThreadFunc. This should appear at the start of each thread.
No need to involve the Base Profiler, unless we wanted to instrument some raw thread creations outside of xul, which I don't think is needed for "StreamTrans" in this bug.
Good luck!
Comment 21•4 years ago
(In reply to Gerald Squelart [:gerald] (he/him) from comment #20)
Thanks! I opened bug 1670786 to work on these suggestions.
Comment 22•4 years ago
Having realized from looking at bug 1670737 that blame for some of the FreeOMT calls is from bug 1584898, which was also fixed to deal with issues with huge downloads on mega.nz, I'm pretty sure this is due to that change. The thread count should not balloon like that though. I'd ask either :baku or :ehsan to take a look but neither is active in Firefox at the moment, AIUI. So I am not sure what best to do. I think batching free calls a bit more (like doing 16 or 32 or whatever is a sensible number, in each task) would help without reintroducing the same problem that bug 1584898 fixed.
Kris, is there currently a "right" way to do this work? I see Bas landed a TaskController, but that seems oriented towards the main thread and so I'm not sure if it's appropriate here.
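To make the batching idea concrete, here is a hedged, self-contained sketch in plain JavaScript - not the actual Gecko C++ code; the class, method, and parameter names are invented for illustration:

```javascript
// Sketch of batching: instead of dispatching one off-main-thread task per
// free() call, accumulate segments and dispatch one task per `batchSize`.
class BatchedFreer {
  constructor(dispatchTask, batchSize = 32) {
    this.dispatchTask = dispatchTask; // stands in for dispatching a runnable to a thread pool
    this.batchSize = batchSize;
    this.pending = [];
  }

  // Called once per segment that needs to be freed off the main thread.
  freeOMT(segment) {
    this.pending.push(segment);
    if (this.pending.length >= this.batchSize) {
      this.flush();
    }
  }

  // Dispatch a single task that frees the whole accumulated batch.
  flush() {
    if (this.pending.length === 0) {
      return;
    }
    const batch = this.pending;
    this.pending = [];
    this.dispatchTask(() => batch.forEach((seg) => seg.free()));
  }
}
```

With 10,000 segments and a batch size of 32, this dispatches roughly 313 tasks instead of 10,000, which is far less likely to push the thread pool into spawning new threads for every item.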
Assignee
Comment 23•4 years ago
(In reply to :Gijs (he/him) from comment #18)
> Valentin, any idea how we're ending up creating tens of thousands of stream transport threads, or how we'd figure out why?
It seems you've got the thread issue figured out. Great find. I wonder how often this caused a hard to diagnose perf problem.
Comment 24•4 years ago
(In reply to :Gijs (he/him) from comment #22)
> Having realized from looking at bug 1670737 that blame for some of the FreeOMT calls is from bug 1584898 which was also fixed to deal with issues with huge downloads on mega.nz, I'm pretty sure this is due to that change. The thread count should not balloon like that though. I'd ask either :baku or :ehsan to take a look but neither is active in Firefox at the moment, AIUI. So I am not sure what best to do. I think batching free calls a bit more (like doing 16 or 32 or whatever is a sensible number, in each task) would help without reintroducing the same problem that bug 1584898 fixed.
> Kris, is there currently a "right" way to do this work? I see Bas landed a TaskController, but that seems oriented towards the main thread and so I'm not sure if it's appropriate here.
I looked at these calls and tried to move them to the background thread pool - but since it's still an nsThreadPool, it still has the same strange behavior where it can make a whole bunch of these threads during large jobs. I can only think of two ways to really fix this until I can refactor nsThreadPool dispatch - we can batch together the FreeOMT calls like you say into larger tasks, so that they are less likely to create a new thread and we avoid bogging down the event queue, or we can use a background TaskQueue to force the tasks to only execute serially, which would be roughly the equivalent of running all of the calls on a single standalone thread. I'm not sure what the perf risks could be of doing the latter over the former, or how much batching the calls would improve the threading issue.
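The serializing option - and the approach the eventual fix took ("Ensure there's only one nsSegmentedBuffer::FreeOMT pending at any time") - can be sketched in plain JavaScript; the names here are invented for illustration, and the real code is C++ inside Gecko:

```javascript
// Sketch of the "at most one pending runnable" pattern: new work is queued
// locally, and a task is dispatched only when none is already outstanding.
class SingleDispatchFreer {
  constructor(dispatchTask) {
    this.dispatchTask = dispatchTask; // stands in for dispatching to the thread pool
    this.queue = [];
    this.taskPending = false;
  }

  freeOMT(segment) {
    this.queue.push(segment);
    if (!this.taskPending) {
      this.taskPending = true;
      this.dispatchTask(() => this.drain());
    }
  }

  // The dispatched task drains everything queued up to this point,
  // including segments added after dispatch but before it actually ran.
  drain() {
    while (this.queue.length > 0) {
      this.queue.shift().free();
    }
    this.taskPending = false;
  }
}
```

However many segments arrive while a task is outstanding, only one runnable sits in the pool's queue at a time, so the pool never sees a backlog that tempts it to spawn extra threads for this work. (The real C++ version has to make the pending flag and queue thread-safe; this single-threaded sketch elides that.)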
Assignee
Comment 25•4 years ago
Also converts some unused code to a gtest.
Comment 26•4 years ago
Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/0bcd9a5ae49f Use a BackgroundTaskQueue to free up memory segments from nsSegmentedBuffer off main thread r=baku,sg
Comment 27•4 years ago
bugherder
Comment 28•4 years ago
Please note that the patch got backed out by Valentin via bug 1675203:
https://hg.mozilla.org/integration/autoland/rev/e0fa99439c48
Assignee
Comment 29•4 years ago
Depends on D93995
Comment 30•3 years ago
Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/e0c3ec47f2f4 Batch calls to nsSegmentedBuffer::FreeOMT when dispatching to a background thread r=baku,sg,KrisWright
https://hg.mozilla.org/integration/autoland/rev/f9553be62ad1 Ensure there's only one nsSegmentedBuffer::FreeOMT pending at any time r=sg
Comment 31•3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/e0c3ec47f2f4
https://hg.mozilla.org/mozilla-central/rev/f9553be62ad1
Comment 32•3 years ago
Since the statuses are different for nightly and release, what's the status for beta?
For more information, please visit auto_nag documentation.
Assignee
Comment 33•3 years ago
Comment on attachment 9188422 [details]
Bug 1667581 - Ensure there's only one nsSegmentedBuffer::FreeOMT pending at any time r=sg
Beta/Release Uplift Approval Request
- User impact if declined: In certain cases that use nsSegmentedBuffers we may be spawning and destroying a large number of threads, causing the computer to lock up. This isn't a very common scenario, but when it does happen the user experience sucks.
- Is this code covered by automated tests?: Unknown
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: Yes
- If yes, steps to reproduce: see comment 0
- List of other uplifts needed: None
- Risk to taking this patch: Medium
- Why is the change risky/not risky? (and alternatives if risky): The previous attempt to fix this got backed out as it caused the memory to be released too late - sometimes causing a spike in memory use. The current fix should not have that problem - it is essentially the same as the old code, still using the stream transport thread pool, but avoids dispatching more than one runnable so we don't hit bug 1670737. We should make sure that this adequately fixes the problem without causing other perf regressions before uplifting to beta & esr.
- String changes made/needed:
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: perf regression
- User impact if declined: same reasoning as for beta.
- Fix Landed on Version: 85
- Risk to taking this patch: Medium
- Why is the change risky/not risky? (and alternatives if risky): same as for beta
- String or UUID changes made by this patch:
Comment 34•3 years ago
I think this should ride the trains given the age of the bug and regression risk.
Comment 35•3 years ago
Tried to reproduce this issue on Windows 8.1 using multiple large files on Firefox 81.0, but was unsuccessful. I did receive an error from mega.nz when reaching 22% or 42% (always different) that says: Write error: [Exception... "Out of Memory" and suggests downloading their app or using Chrome, but this also happens with the latest Nightly (2020-12-08) build.
Valyel, could you please try with a build that contains the fix and see if you can still reproduce the initial issue?
https://www.mozilla.org/en-US/firefox/85.0a1/releasenotes/
Reporter
Comment 36•3 years ago
I am now unable to reproduce this on build 85.0a1.
Comment 37•3 years ago
Thank you very much for your verification, closing this issue as Verified.
Comment 38•3 years ago
We're going to let this bake a cycle on Release before revisiting for ESR.
Comment 39•3 years ago
Comment on attachment 9182378 [details]
Bug 1667581 - Batch calls to nsSegmentedBuffer::FreeOMT when dispatching to a background thread r=baku,sg
Fixes a possible deadlock in some circumstances. Been on 85 since it was on Nightly with no known regressions. Approved for 78.8esr.
Comment 40•3 years ago
bugherder uplift
https://hg.mozilla.org/releases/mozilla-esr78/rev/e336753c2f16
https://hg.mozilla.org/releases/mozilla-esr78/rev/cc7a574fb4e0
Comment 41•3 years ago
I know it's been a long time, but could I ask you to do one more verification on the 78.8.0esr build?
You can download the .zip build from the link below, no need to install it.
https://archive.mozilla.org/pub/firefox/candidates/78.8.0esr-candidates/build2/win64/en-US/firefox-78.8.0esr.zip
Reporter
Comment 42•3 years ago
Sorry to say, but the machine that I was having this issue on has been replaced. Unfortunately, I don't have a way to recover it at this point.
As such, I won't be able to verify. Sorry about that!
Comment 43•3 years ago
(In reply to valyel from comment #42)
> Sorry to say, but the machine that I was having this issue on has been replaced. Unfortunately, I don't have a way to recover it at this point.
> As such, I won't be able to verify. Sorry about that!
No worries, you already helped us plenty with previous verification.