[Linux] File downloading is CPU-intensive
Categories
(Toolkit :: Downloads API, defect)
People
(Reporter: jonathan.poelen, Unassigned)
References
(Depends on 1 open bug)
Attachments
(1 file: 1.82 MB, video/webm)
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:139.0) Gecko/20100101 Firefox/139.0
Steps to reproduce:
Download a file.
Actual results:
When the download is fast, CPU use is heavy: about:processes and the top command show over 150%.
After setting up a local server to control the download speed, I found that about 1% of CPU is used per MB/s.
Expected results:
The CPU shouldn't have to work so hard, and my fans shouldn't go crazy.
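For anyone who wants to repeat the "CPU per MB/s" measurement, a rate-limited local server can be sketched as below. This is only a minimal sketch and not the reporter's actual setup, which isn't described in the bug; the port, payload size, target rate, and all function names are assumptions made for illustration.

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

# All three values are hypothetical; adjust them for the experiment.
TOTAL_BYTES = 1024 * 1024 * 1024       # size of the fake download (1 GiB)
RATE_BYTES_PER_S = 10 * 1024 * 1024    # target throughput (~10 MB/s)
CHUNK_SIZE = 64 * 1024                 # bytes written per send


def chunk_plan(total_bytes, chunk_size):
    """Yield the size of each chunk needed to send total_bytes."""
    sent = 0
    while sent < total_bytes:
        size = min(chunk_size, total_bytes - sent)
        yield size
        sent += size


def delay_for(bytes_sent, rate_bytes_per_s, elapsed_s):
    """Seconds to sleep so the average rate does not exceed the target."""
    return max(0.0, bytes_sent / rate_bytes_per_s - elapsed_s)


class ThrottledHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "application/octet-stream")
        self.send_header("Content-Length", str(TOTAL_BYTES))
        self.end_headers()
        payload = bytes(CHUNK_SIZE)  # zero-filled buffer reused for every chunk
        start = time.monotonic()
        sent = 0
        try:
            for size in chunk_plan(TOTAL_BYTES, CHUNK_SIZE):
                self.wfile.write(payload[:size])
                sent += size
                time.sleep(delay_for(sent, RATE_BYTES_PER_S,
                                     time.monotonic() - start))
        except (BrokenPipeError, ConnectionResetError):
            pass  # the client cancelled the download


def serve(port=8000):
    """Serve throttled downloads on localhost until interrupted."""
    HTTPServer(("127.0.0.1", port), ThrottledHandler).serve_forever()
```

Calling `serve()` and downloading `http://127.0.0.1:8000/` at a few different values of `RATE_BYTES_PER_S`, while watching about:processes, should show whether CPU use scales with throughput on a given machine.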
Comment 1•4 months ago
Hello! I have managed to reproduce the issue on my end using Firefox 141.0a1 (2025-06-19) on Ubuntu 22.04.
I would like to mention that on my end the CPU spike was about 90%.
I will mark it as NEW and set a component for it in order for our developers to get involved and provide a solution for it in further releases. If it's not the right component please feel free to change it to an appropriate one.
Have a nice day!
Comment 2•4 months ago
Is this happening on any high-speed download, or only on specific types (like video)?
Any sharable URL where you can easily reproduce this?
Could you please capture a performance profile during one of these high-speed downloads? That may help with assigning the bug to the most appropriate component.
See https://profiler.firefox.com/ for instructions.
Comment 3•4 months ago
(In reply to Marco Bonardo [:mak] from comment #2)
Is this happening on any high-speed download, or only on specific types (like video)?
Any sharable URL where you can easily reproduce this?
Could you please capture a performance profile during one of these high-speed downloads? That may help with assigning the bug to the most appropriate component.
See https://profiler.firefox.com/ for instructions.
Same question for Sergiu. It would also be useful to understand if this is a regression, and what file/url you're using to test.
Comment 4•4 months ago
Hello, Gijs! I started a simple download from "https://www.thinkbroadband.com/download" and just looked at the top CPU usage in about:processes.
I will provide a performance capture ASAP.
Comment 5•4 months ago
Here is a link to the performance profile:
Comment 6•4 months ago
(leaving needinfo for the reporter because it's always possible that they're seeing something different from Sergiu)
(In reply to Negritas Sergiu, Desktop QA from comment #5)
Here is a link to the performance profile:
Focusing in on the download time, https://share.firefox.dev/4kQiPDd , I don't really see anything in the frontend that is obviously responsible for the CPU use (vast majority of samples on the parent proc main thread are in poll() on the event loop). In fact, the profile suggests that the CPU use is not that bad (certainly not consistently close to 100%), but about:processes in the screenshots shows that the CPU use is that bad. I don't know why they're disagreeing, or how to find out where the "spare" bit of CPU is going - or if I'm missing some kind of smoking gun in the profile. Ni Florian for some pointers based on this profile...
The only real thing I see in the profile is vsync etc. for the compositor, so I assume this is the download animation, which AIUI we'd previously optimized? So I don't know if that's regressed on the layout/graphics side (AFAIK nothing significant has changed on the frontend side but I could be wrong about this). Sergiu: could you check if esr115 or older behaved any better?
Comment 7•4 months ago
It would be useful to look in about:processes for the names of the active threads. I tried to reproduce myself and the threads I saw as most active were 2 BackgroundThreadPool threads 20%, the main thread 15%, the Socket thread 8% and then the glean.dispatcher 5%, for a total CPU use of about 45%, on a very fast machine.
When profiling, it would be useful to include "Socket,Background" in the thread filter. Enabling the "Network Bandwidth" feature could also be nice.
Unfortunately the glean.dispatcher thread doesn't seem to be registered with the profiler, so I don't know what it was doing.
Here's a profile I captured myself with the profiler settings I recommended: https://share.firefox.dev/4nb7v5S
The BackgroundThreadPool threads seem to spend their CPU time computing SHA256 digests and writing, within nsAStreamCopier runnables.
The socket thread is mostly calling recv, but also adding profiler markers, and calling into glean at https://searchfox.org/mozilla-central/rev/158fb9063c06a36e6a2efb2b1823bba8c01e82a1/netwerk/base/nsSocketTransportService2.cpp#1180-1182, which adds more profiler markers, and sends stuff to the glean dispatcher (so without having a profile of that thread, we can guess what makes it busy).
For my test I downloaded http://ipv4.download.thinkbroadband.com:81/5GB.zip (following the suggestion in comment 4), but if the original reporter was downloading from an https url, I would expect to see SSL overhead in the Socket thread.
Comment 8•4 months ago
Andrew, do you know why the BackgroundThreadPool threads spend time computing SHA256 digests? https://share.firefox.dev/4lha7gZ
Chris, anything we can do to reduce the overhead of these glean metrics? https://share.firefox.dev/4k01B4T
Note: both of these things add overhead, but neither is responsible for "high CPU usage" (which I couldn't reproduce on my machine).
Comment 9•4 months ago
Here is a profile from 115.25.0ESR: https://share.firefox.dev/4l18nsg
I also managed to do a screen recording for 115.25.0ESR.
Comment 10•4 months ago
(In reply to Florian Quèze [:florian] from comment #7)
Unfortunately the glean.dispatcher thread doesn't seem to be registered with the profiler, so I don't know what it was doing.
bug 1899618 might now have enough information on it to be actionable, pending bandwidth. Though bug 1899618 comment 5 suggests a problem with Fenix that we haven't looked into in a long while.
(In reply to Florian Quèze [:florian] from comment #8)
Chris, anything we can do to reduce the overhead of these glean metrics? https://share.firefox.dev/4k01B4T
Wow, that's a lot of 0ns samples. Good, I guess, but a bit much to dispatch for every sample, I agree.
Yeah, actually, we have in Rust a buffered API for *_distribution metrics (bug 1916673). We'd need to first implement the FOG Rust API for it (as an experimental API, we've only been adding support as it's been requested (memory_distribution and custom_distribution support was added in bug 1919245)). And then we'd need to design and implement the C++ API atop it so that nsSocketTransportService2.cpp can use it. (Lucky us: all the markers in your profile are parent-process. If this can sometimes be child-process, we'll need to figure out IPC as well (bug 1920957).)
Comment 11•4 months ago
(In reply to Florian Quèze [:florian] from comment #8)
Andrew, do you know why the BackgroundThreadPool threads spend time computing SHA256 digests? https://share.firefox.dev/4lha7gZ
That's a good question :)
This is the class:
https://searchfox.org/mozilla-central/rev/338a8ecb357df66ee19e875227d706fb0dde5f30/netwerk/base/BackgroundFileSaver.h#372-392
The SHA256 behaviour comes from Bug 829832, as part of Download Protection, bug 662819
I'm going to pass the ni' to Dimi, as they may have some understanding of this.
Comment 12 (Reporter)•4 months ago
(In reply to Florian Quèze [:florian] from comment #7)
When profiling, it would be useful to include "Socket,Background" in the thread filter. Enabling the "Network Bandwidth" feature could also be nice.
I used a Network profile for the second one, but I didn't see any Socket,Background.
(In reply to Florian Quèze [:florian] from comment #7)
For my test I downloaded http://ipv4.download.thinkbroadband.com:81/5GB.zip (following the suggestion in comment 4), but if the original reporter was downloading from an https url, I would expect to see SSL overhead in the Socket thread.
I have the impression that it's even worse, but as there's no https version, I couldn't compare.
https://share.firefox.dev/3ZH7BZb (profiler by default)
https://share.firefox.dev/3ZG2N6c (with Network profile)
Comment 13•3 months ago
(In reply to jonathan.poelen from comment #12)
I have the impression that it's even worse, but as there's no https version, I couldn't compare.
https://share.firefox.dev/3ZH7BZb (profiler by default)
https://share.firefox.dev/3ZG2N6c (with Network profile)
These profiles look like they're from your own debug build of Firefox. Is that the case? Do you see the same slowdown in a normal, mozilla.org distributed optimized build?
I don't think we normally performance-optimize debug builds...
Comment 14 (Reporter)•3 months ago
I use the Firefox package from Manjaro (Arch Linux). But I have the same problem with the mozilla.org build (version 140): 50% CPU for a download speed of ~10 MB/s.
Comment 15•3 months ago
(In reply to jonathan.poelen from comment #14)
I use Firefox in Manjaro packages (archlinux). But otherwise I have the same problem with the version of mozilla.org (version 140): 50% CPU for a download speed of ~10MB/sec.
Could you share a profile from the mozilla.org build? comment 0 said 150% so 50% is an improvement in my book, assuming it's not a typo? :-)
Comment 16 (Reporter)•3 months ago
(In reply to :Gijs (he/him) from comment #15)
(In reply to jonathan.poelen from comment #14)
I use Firefox in Manjaro packages (archlinux). But otherwise I have the same problem with the version of mozilla.org (version 140): 50% CPU for a download speed of ~10MB/sec.
Could you share a profile from the mozilla.org build? comment 0 said 150% so 50% is an improvement in my book, assuming it's not a typo? :-)
It's not an improvement, it's that the download speed is slower. On the previous profile it was 80MB/s (it's in the images) and I don't control the speed at http://ipv4.download.thinkbroadband.com:81/5GB.zip. In the screenshots of the previous profile, you can see that the CPU bar is full. How do you zoom in on the screenshots to see the number? As I recall it was close to 200%, but I can't say for sure.
To tell the truth, on a new Firefox profile without tabs or extensions, I don't exceed 120% with my test server (without the profiler). That figure is per download, because the CPU use is cumulative across downloads. I suppose that when I saw 150% on my normal profile, some tabs were active at the same time, or there was some other factor. But for me it's all the same: 120% or 150%+, both are abnormal.
v140 from mozilla.org: https://share.firefox.dev/3T6rcOJ
v139.0.4 from package manager: https://share.firefox.dev/43ZpWDc
Comment 17•3 months ago
(In reply to jonathan.poelen from comment #16)
(In reply to :Gijs (he/him) from comment #15)
(In reply to jonathan.poelen from comment #14)
I use Firefox in Manjaro packages (archlinux). But otherwise I have the same problem with the version of mozilla.org (version 140): 50% CPU for a download speed of ~10MB/sec.
Could you share a profile from the mozilla.org build? comment 0 said 150% so 50% is an improvement in my book, assuming it's not a typo? :-)
It's not an improvement, it's that the download speed is slower. On the previous profile it was 80MB/s (it's in the images) and I don't control the speed at http://ipv4.download.thinkbroadband.com:81/5GB.zip. In the screenshots of the previous profile, you can see that the CPU bar is full. How do you zoom in on the screenshots to see the number? In my memory it was close to 200%, but I can't say for sure.
Ah, thanks for clarifying and sorry to have misinterpreted.
To tell the truth, on a new Firefox profile without tabs or extensions, I don't exceed 120% with my test server (without profiler). Per download (because it's cumulative). I suppose that when I saw 150% on my normal profile, some tabs were activated at the same time, or some other factor. But for me it's all the same, 120% or +150%, both scenarios are abnormal.
Definitely agreed that this seems wrong.
v140 from mozilla.org: https://share.firefox.dev/3T6rcOJ
v139.0.4 from package manager: https://share.firefox.dev/43ZpWDc
Florian, I'm sorry to ask but could you take another look? I'm also confused because ISTM the causes you identified in comment 7 should be cross-platform, but I'm not able to reproduce this issue on Windows (I see about 3% CPU use for similar download speeds). Maybe my machine is "just" too fast to notice? Or is there some other factor here? Is there any possibility that some other aspect of the system (disk type, other software, kernel settings, whatever) is having an impact here? Although Sergiu confirmed the bug in comment #1, the profile in comment #5 looks fairly different from the ones from Jonathan, at least to my eyes.
Comment 18•3 months ago
(In reply to :Gijs (he/him) from comment #17)
v140 from mozilla.org: https://share.firefox.dev/3T6rcOJ
I'm not sure I'm reading the number correctly because the screenshot is small, but the CPU use shown in about:processes seems to be 32%. What I see in the CPU use information of the profile looks similar.
I'm not able to reproduce this issue on Windows (I see about 3% CPU use for similar download speeds). Maybe my machine is "just" too fast to notice?
It's possible the CPU speed plays a role (or the relative network speed compared to the CPU speed). The profiles in comment 16 are from an i7-7700 (that's an 8-year-old CPU according to https://www.intel.fr/content/www/fr/fr/products/sku/97128/intel-core-i77700-processor-8m-cache-up-to-4-20-ghz/specifications.html).
Or is there some other factor here?
Another thing that might play a role is whether the downloads panel is open or not. I assume we use more CPU if we update the panel regularly than if we just update the toolbar icon with the progress.
Comment 19•3 months ago
(In reply to Andrew Creskey [:acreskey] from comment #11)
The SHA256 behaviour comes from Bug 829832, as part of Download Protection, bug 662819
I'm going to pass the ni' to Dimi, as they may have some understanding of this.
Really sorry for the late reply!
SHA256 is used by the download protection feature as part of the data sent to the download protection service to help determine whether a file is malicious. If SHA256 turns out to be a performance bottleneck, an alternative is using SHA1 or MD5 instead to reduce the computational overhead.
However, I'm not sure how that would affect classification accuracy or coverage, so we may need to check with Google on this.
Comment 20•3 months ago
Hello,
I have reproduced the issue, and I hope I captured the profiler data correctly: https://share.firefox.dev/44MkuUx.
My CPU is an AMD Ryzen™ 7 8700F Processor.
My download speed was approximately 60 Mb/s.
My CPU usage (in about:processes) was around 105%.
I have the browser.safebrowsing.downloads.enabled key disabled.
A note about the profiler: it was quite resource-intensive to check the checkbox and wait for the compression to complete (approximately 3 GB of RAM).
Comment 21•3 months ago
The severity field is not set for this bug.
:mak, could you have a look please?
For more information, please visit BugBot documentation.
Comment 22•2 months ago
(In reply to Jeremy from comment #20)
Hello,
I have reproduced the issue, and I hope to successfully create my profiler data: https://share.firefox.dev/44MkuUx.
It doesn't look like this got symbolicated properly (so there's a lot of information missing and just labeled fun_<number>), but also there doesn't seem to be a download happening? It looks like it was created when looking at a different profile. Perhaps worth retrying with Firefox rather than Librewolf.
I have the browser.safebrowsing.downloads.enabled key disabled.
It looks like the sha256 hashing is being done regardless of whether that key is enabled.
Dimi, this is from https://searchfox.org/mozilla-central/rev/6e2b186c296474e032d9ae2e000b7c870396775c/toolkit/components/downloads/DownloadCore.sys.mjs#2508 - should this be guarded by some preference check? What about the signature info? And if so, which prefs should be being checked?
Florian, we're noticing that everyone reproducing this is on Linux, and we cannot repro on Windows/macOS. But so far all the issues highlighted are OS-independent. Any ideas on how the OS would impact things here, or how we'd gather more useful data, as so far the profiles are not really turning up "finger pointing" level culprits (only things that could potentially be optimized in some situations, but they do not really explain the level of CPU use), as far as I can tell?
Comment 23•2 months ago
Dana, bit of a shot in the dark, but is there anything about the sha256 digest stream that is being used by backgroundfilesaver/nsAStreamCopier that is particularly inefficient? Of course we can maybe turn it off if people have safebrowsing disabled, but most people will have it enabled and I'm surprised it would cause significant CPU use when I'm used to hashing of large files being fairly fast.
Comment 24•2 months ago
(In reply to :Gijs (he/him) from comment #22)
It doesn't look like this got symbolicated properly (so there's a lot of information missing and just labeled fun_<number>), but also there doesn't seem to be a download happening? It looks like it was created when looking at a different profile. Perhaps worth retrying with Firefox rather than Librewolf.
https://share.firefox.dev/4ocScKJ Here using latest firefox nightly 143.0a1 (2025-08-01) (64-bit) - browser.safebrowsing.downloads.enabled is set to false
I hope it'll help you
Comment 25•2 months ago
(In reply to :Gijs (he/him) from comment #22)
(In reply to Jeremy from comment #20)
I have reproduced the issue, and I hope to successfully create my profiler data: https://share.firefox.dev/44MkuUx.
there doesn't seem to be a download happening?
There's a network request of 12.3GB that took 64s. The 'Bandwidth' track shows we were receiving more than 300MB/s.
Florian, we're noticing that everyone reproducing this is on Linux, and we cannot repro on Windows/macOS. But so far all the issues highlighted are OS-independent. Any ideas on how the OS would impact things here, or how we'd gather more useful data, as so far the profiles are not really turning up "finger pointing" level culprits (only things that could potentially be optimized in some situations, but they do not really explain the level of CPU use), as far as I can tell?
No good idea of what could be Linux specific here. Updating the progress UI caused visible CPU use spikes in my own profile from comment 7. If graphics hardware acceleration works well on Windows/Mac and not so well on Linux, that could make the progress UI be more expensive.
About how to get more information, I would look at about:processes with toolkit.aboutProcesses.showThreads set to true in about:config (it's the default on Nightly, but some of the profiles shared here were from release builds) to see which threads were using CPU time. Once we know the thread names we can add them in the profiler settings if they are registered with the profiler. If they are not, the next step is to use samply, but that requires doing it from a terminal.
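On Linux, the per-thread numbers can also be read straight from /proc, which may help when a thread is not registered with the profiler. The following is only a sketch of that alternative approach (it is neither about:processes nor samply); the function names are made up for illustration, and it relies on the `/proc/<pid>/task/<tid>/stat` layout described in proc(5), where utime and stime are fields 14 and 15. Thread names reported there are truncated by the kernel to 15 characters.

```python
import os


def parse_task_stat(stat_line):
    """Return (thread_name, utime_ticks, stime_ticks) from one stat line."""
    # The name sits in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    left, right = stat_line.rsplit(")", 1)
    name = left.split("(", 1)[1]
    fields = right.split()
    # fields[0] is field 3 (state), so fields 14 and 15 are at index 11 and 12.
    return name, int(fields[11]), int(fields[12])


def thread_cpu_ticks(pid):
    """Map thread id -> (name, user+system CPU ticks) for one process."""
    result = {}
    task_dir = f"/proc/{pid}/task"
    for tid in os.listdir(task_dir):
        try:
            with open(f"{task_dir}/{tid}/stat") as f:
                name, utime, stime = parse_task_stat(f.read())
        except OSError:
            continue  # the thread exited while we were reading
        result[int(tid)] = (name, utime + stime)
    return result


def busiest_threads(before, after, top=10):
    """Rank threads by CPU ticks consumed between two snapshots."""
    deltas = []
    for tid, (name, ticks) in after.items():
        previous = before.get(tid, (name, 0))[1]
        deltas.append((ticks - previous, name, tid))
    return sorted(deltas, reverse=True)[:top]
```

Taking one `thread_cpu_ticks(pid)` snapshot of the parent process before the download and another a few seconds into it, then passing both to `busiest_threads`, gives a ranking by thread name. Dividing the tick deltas by `os.sysconf("SC_CLK_TCK")` converts them to seconds of CPU time.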
Comment 26•2 months ago
(In reply to Florian Quèze [:florian] from comment #25)
About how to get more information,
One extra tip: most of the profiles shared here contain process CPU use information, but it isn't visible by default. To show it, you can open the devtools console on the profiler tab and run experimental.enableProcessCPUTracks().
Comment 27•2 months ago
(In reply to Jeremy from comment #24)
https://share.firefox.dev/4ocScKJ Here using latest firefox nightly 143.0a1 (2025-08-01) (64-bit)
Thanks!
I find PR_Poll surprisingly expensive on the Socket thread in this profile. I also wonder why BackgroundFileSaverStreamListener::OnDataAvailable has so much activity on the main thread.
Comment 28•2 months ago
(In reply to :Gijs (he/him) from comment #22)
Dimi, this is from https://searchfox.org/mozilla-central/rev/6e2b186c296474e032d9ae2e000b7c870396775c/toolkit/components/downloads/DownloadCore.sys.mjs#2508 - should this be guarded by some preference check? What about the signature info? And if so, which prefs should be being checked?
Looks like ContentAnalysis, which was added recently, also uses SHA-256, see
https://searchfox.org/mozilla-central/rev/81221d27112f12a67cb86287bf2b3cd9f19373af/toolkit/components/downloads/DownloadIntegration.sys.mjs#613.
As for signatureInfo, from a quick look, it seems like download protection is indeed the only consumer.
We can use browser.safebrowsing.downloads.enabled to guard enableSha256 and enableSignatureInfo. We might also need a pref from ContentAnalysis as well.
Comment 29•2 months ago
(In reply to :Gijs (he/him) from comment #23)
Dana, bit of a shot in the dark, but is there anything about the sha256 digest stream that is being used by backgroundfilesaver/nsAStreamCopier that is particularly inefficient? Of course we can maybe turn it off if people have safebrowsing disabled, but most people will have it enabled and I'm surprised it would cause significant CPU use when I'm used to hashing of large files being fairly fast.
The only thing I can think of is if somehow we're not using the hardware-accelerated sha256 implementation on linux for the architecture in use here. That said, I can't find any implementation of sha256 at all in those profiles (perhaps I'm just not looking in the right place?)
Comment 30•2 months ago
Looks like ContentAnalysis, which was added recently, also uses SHA-256
Isn't the preference browser.contentanalysis.enabled related to ContentAnalysis?
If it is, the value is set to false by default in all versions of firefox.
Comment 31•2 months ago
(In reply to Jeremy from comment #30)
Looks like ContentAnalysis, which was added recently, also uses SHA-256
Isn't the preference browser.contentanalysis.enabled related to ContentAnalysis?
If it is, the value is set to false by default in all versions of firefox.
Yes, I was answering the question regarding which preference we should use to guard backgroundFileSaver.enableSha256.
Even though it is currently disabled by default, we should still check the preference to decide whether enableSha256 should be applied.
Comment 32•2 months ago
(In reply to Dana Keeler (she/her) [:keeler] from comment #29)
(In reply to :Gijs (he/him) from comment #23)
Dana, bit of a shot in the dark, but is there anything about the sha256 digest stream that is being used by backgroundfilesaver/nsAStreamCopier that is particularly inefficient? Of course we can maybe turn it off if people have safebrowsing disabled, but most people will have it enabled and I'm surprised it would cause significant CPU use when I'm used to hashing of large files being fairly fast.
The only thing I can think of is if somehow we're not using the hardware-accelerated sha256 implementation on linux for the architecture in use here. That said, I can't find any implementation of sha256 at all in those profiles (perhaps I'm just not looking in the right place?)
Apologies for the slow follow-up here - https://share.firefox.dev/4fzrCHD (from Florian in comment #7) shows time spent in SHA256_Update_Native. Is that the non-hwa path or the hwa path?
I don't see the same code being hit by any of the profiles provided by the reporter or Jeremy, though I can't tell if that's because those profiles didn't include the relevant thread or if I'm not looking in the right place. Florian, can you check?
(In reply to Florian Quèze [:florian] from comment #27)
(In reply to Jeremy from comment #24)
https://share.firefox.dev/4ocScKJ Here using latest firefox nightly 143.0a1 (2025-08-01) (64-bit)
Thanks!
I find PR_Poll surprisingly expensive on the Socket thread in this profile. I also wonder why BackgroundFileSaverStreamListener::OnDataAvailable has so much activity on the main thread.
As far as I can tell for downloads that go through nsExternalAppHandler (which despite the name I think is going to be most/all of them), we create a stream listener and then we call that on the main thread when data is available, and that will do an async/OMT write, I think here. But I don't actually know this code very well at all and the people who wrote and reviewed it (paolo, biesi) aren't around. I don't know if rearchitecting this to go directly from background network threads to background file write threads (cutting out the mainthread middle man) would improve things as far as download speed or CPU usage is concerned.
Given the download speeds that we're talking about (100s of MB/s) I wonder if the behaviour would improve if we notified for new data being available less frequently (fewer notifications, more data written in one go) and/or used different buffer sizes. Andrew, apologies for the stupid question but is that possible at all? Anything else you're seeing in this story that would help explain the CPU use?
Comment 33•2 months ago
(In reply to :Gijs (he/him) from comment #32)
https://share.firefox.dev/4fzrCHD (from Florian in comment #7) shows time spent in SHA256_Update_Native. Is that the non-hwa path or the hwa path?
Yes, that should be the hwa path.
Given the download speeds that we're talking about (100s of MB/s) I wonder if the behaviour would improve if we notified for new data being available less frequently (fewer notifications, more data written in one go) and/or used different buffer sizes.
Hashing in larger chunks would probably help - there's significant overhead in each update call, unfortunately.
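The per-call overhead can be illustrated outside Firefox with a small sketch. It uses Python's hashlib rather than the NSS code path discussed in this bug, so any absolute timings say nothing about Firefox itself; the function names and chunk sizes are assumptions chosen for illustration. The point is only that the digest is identical whatever the chunk size, while the number of update calls (and their fixed cost) changes.

```python
import hashlib
import time


def sha256_in_chunks(data, chunk_size):
    """Hash data by feeding it to SHA-256 in chunk_size pieces."""
    h = hashlib.sha256()
    view = memoryview(data)  # slicing a memoryview avoids copying each chunk
    for offset in range(0, len(view), chunk_size):
        h.update(view[offset:offset + chunk_size])
    return h.hexdigest()


def time_hash(data, chunk_size):
    """Return (digest, seconds spent) for one chunked hashing pass."""
    start = time.perf_counter()
    digest = sha256_in_chunks(data, chunk_size)
    return digest, time.perf_counter() - start
```

Running `time_hash` over the same buffer with, say, 4 KiB and 1 MiB chunks gives a rough idea of how much of the cost is per call rather than per byte on a given machine.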
Comment 34•2 months ago
strace-profiling the download of a 100MB file:
- wget: https://share.firefox.dev/4mncxvB there are read syscalls with a 65536 size;
- curl: https://share.firefox.dev/4oBq4Rv there are recvfrom syscalls with a 102400 size;
- firefox: https://share.firefox.dev/3HH6Vgr there are recvfrom syscalls with a 32768 size.
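To put those buffer sizes in perspective, here is a quick back-of-the-envelope sketch of the minimum number of read calls each tool needs for the same file. It assumes "100MB" means 100 MiB and that every call fills its buffer completely, which real sockets do not guarantee, so these are lower bounds; the helper name is made up for illustration.

```python
FILE_SIZE = 100 * 1024 * 1024  # assumption: "100MB" taken as 100 MiB

# Buffer sizes as reported by the strace profiles above.
BUFFER_SIZES = {
    "wget (read)": 65536,
    "curl (recvfrom)": 102400,
    "firefox (recvfrom)": 32768,
}


def min_reads(total_bytes, buffer_size):
    """Smallest number of reads that can deliver total_bytes."""
    return (total_bytes + buffer_size - 1) // buffer_size


for tool, size in BUFFER_SIZES.items():
    print(f"{tool}: at least {min_reads(FILE_SIZE, size)} calls")
```

Under these assumptions Firefox needs at least 3200 calls, twice as many as wget (1600) and roughly three times as many as curl (1024), which fits the idea that larger buffers and fewer notifications could reduce overhead.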
Comment 35•2 months ago
(In reply to :Gijs (he/him) from comment #32)
...
As far as I can tell for downloads that go through nsExternalAppHandler (which despite the name I think is going to be most/all of them), we create a stream listener and then we call that on the main thread when data is available, and that will do an async/OMT write, I think here. But I don't actually know this code very well at all and the people who wrote and reviewed it (paolo, biesi) aren't around. I don't know if rearchitecting this to go directly from background network threads to background file write threads (cutting out the mainthread middle man) would improve things as far as download speed or CPU usage is concerned.
Given the download speeds that we're talking about (100s of MB/s) I wonder if the behaviour would improve if we notified for new data being available less frequently (fewer notifications, more data written in one go) and/or used different buffer sizes. Andrew, apologies for the stupid question but is that possible at all? Anything else you're seeing in this story that would help explain the CPU use?
Sorry for the delay.
I'm not familiar with this codepath but one idea would be to retarget nsExternalAppHandler::OnDataAvailable() off the main thread, i.e. implement nsIThreadRetargetableStreamListener and so have the data delivered directly to the target thread.
I think this could reduce some of the copying.
But again, it's not clear to me what's causing the high CPU usage.