Closed Bug 1382899 Opened 7 years ago Closed 7 years ago

Reduce Promise overhead in the DownloadLegacy.js progress events

Categories

(Firefox :: Downloads Panel, defect, P1)

56 Branch
defect

Tracking

()

RESOLVED FIXED
Firefox 57
Performance Impact low
Tracking Status
firefox57 --- fixed

People

(Reporter: st3fan, Assigned: Paolo)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

I think this is related to downloads but I am not 100% sure.

It seems every time I download a file from a specific site (hard to give access, but I can tell the download is exteemely slow like 140 KB/sec) Firefox seems to slow down up to a point where I get beachballing and my cursor stops moving smoothly.

Couple of observations:

- the download itself is pretty slow. I'm on 100 Mbit but this comes in at 140 KB/sec.
- the server is not providing a content-length or an incorrect content length .. the download manager does not show the actul size and it shows an indeterminate spinner
- The download is marked as 34 MB on the page but in reality is 77 MB
- while the download happens i can barely type in this text field in bugzilla

I'm a Mac Pro with 64 GB of which half free, running OS X 10.12.6 and Today's Nightly.
Crazy thought .. i had 7 windows open. I closed all but one and did the download again. Much smoother.

Is it possible that we broadcast status updates to 7 windows and they all update some state individually? Is this IPC overhead or so?
We broadcast the notification, but we create the user interface elements only if you've previously opened the Downloads Panel in the window. The issue might be more related to having many web pages open than to the Downloads code specifically, but the best way to be sure would be to get a performance profile.
I have a new STR, which is much simpler:

1) Go to https://www.mozilla.org/en-US/firefox/new/
2) Hit "Free Download"
3) A Save panel appears. Don't save the file, just let it sit there.

At this point Nightly uses close to 100% CPU. The cursor starts stuttering.
Stefan, do you think you can capture a profile?
Flags: needinfo?(sarentz)
Here is a profile of a download that is causing this:

https://perfht.ml/2ffcwNo

I did this:

1) Open https://developer.apple.com/download
1) Start recording
2) Tap the download link for macOS Server 5.4 beta 4 (192 MB)
3) let the download finish
4) stop recording

Let me know if you want better or more examples.
Flags: needinfo?(sarentz)
Here is a similar profile but then for a larger (1.3GB) file - maybe that gives a bit more insight since I am on such a fast connection here.

https://perfht.ml/2feRBdq
I still have extensions.allow-non-mpc-extensions enabled. Could that make a difference.
Unfortunately I don't know how to read this profile, it seems most of the time is spent on a synchronization primitive. Maybe this is normal, or maybe there is just a misuse of the primitive in the code.

Florian, have you come across anything like this while looking at performance profiles, or do you know who could help reading this?
I don't even have to start the download. It is enough to click on the file link and then let the Save dialog sit there. While it is just sitting there, doing nothing, CPU goes sky high and my mouse pointer barely moves.

Here is a profile of just doing that: https://perfht.ml/2ffQp9y

1) Start recording
2) Click the download link
3) Let the Save Or Open dialog sit there for a few seconds
4) Stop recording

Is there something else I can do?
(In reply to :Paolo Amadini from comment #9)
> Unfortunately I don't know how to read this profile, it seems most of the
> time is spent on a synchronization primitive. Maybe this is normal, or maybe
> there is just a misuse of the primitive in the code.
> 
> Florian, have you come across anything like this while looking at
> performance profiles, or do you know who could help reading this?

First analysis of the provided profiles when focusing on the JS code:

I don't see anything interesting in the JS code in the profile from comment 10.

The JS in the profile in comment 6 is dominated by us wasting at least 800ms of CPU by calling repeatedly this._deferDownload.promise.then at http://searchfox.org/mozilla-central/rev/b52285fffc13f36eca6b47de735d4e4403b3859e/toolkit/components/jsdownloads/src/DownloadLegacy.js#183

I think this .then call should happen only once and the result should be cached. Or the onProgressChange64 method should just return early if the promise isn't resolved yet.

The profile in comment 5 is similar.

I don't see anything in these profiles that indicates high CPU usage, and the main thread seems responsive, so I'm surprised you are seeing issues while moving the cursor. We seem to be GC'ing a lot, so it's possible there's some memory pressure, although comment 0 said half the ram was free. It's also very possible that another thread not captured in the profile was very busy and used plenty of CPU.

I would suggest fixing the excessive .then calls, and profiling again.


Now focusing more on the platform code, and especially on the profile from comment 10, I see we spend a lot of time in mozilla::net::BackgroundFileSaverStreamListener::OnDataAvailable, in code that needs to wait for locks. It's unclear to me why this code that just takes incoming data from a stream and saves it on another thread needs to run on the main thread. This seems to be causing plenty of events to go in the main thread's event queue, when this queue should be mostly reserved for handling of user events. mozilla::net::BackgroundFileSaverStreamListener::OnDataAvailable is about 430ms of time spent in the profile from comment 6. I think we'll need advice from someone from the necko team to fix this.

This this analysis isn't enough to answer your questions, I would suggest asking ehsan to have a look at the profiles too.
(In reply to Florian Quèze [:florian] [:flo] from comment #11)
> I would suggest fixing the excessive .then calls, and profiling again.

I'm wondering whether we're spending time creating the function that we pass as a callback, or in the "then" implementation itself. Maybe just moving the function to the object could improve things.

Anyways, setting as P3 given that we have at least one actionable item, despite the general slowness being still unexplained.

I'd be curious to know why the profiles show the time spent in mach_msg_trap, which if I understand correctly is not actually blocking the main thread.
Priority: -- → P3
(In reply to :Paolo Amadini from comment #12)
> (In reply to Florian Quèze [:florian] [:flo] from comment #11)
> > I would suggest fixing the excessive .then calls, and profiling again.
> 
> I'm wondering whether we're spending time creating the function that we pass
> as a callback, or in the "then" implementation itself.

I see at least 418ms in the then implementation for the profile from comment 6.

The .catch call also has a cost; I didn't mention it before because if we optimize away the .then call, we also get rid of the repeated .catch calls.


Tagging for qf triage as slowing down the main thread event loop whenever there's a download in progress feels bad.
Whiteboard: [qf]
(In reply to Florian Quèze [:florian] [:flo] from comment #13)
> I see at least 418ms in the then implementation for the profile from comment 6.

Uh, I've spotted this after reintroducing the C++ stack:

  js::CrossCompartmentWrapper::call

I suspect this might happen because we're using:

  this._deferDownload = PromiseUtils.defer();

So all the calls are cross-compartment because the Promise is created in "PromiseUtils.jsm".

Now that we don't need too much isolation for add-on JSMs, could we consider loading all the browser JSMs in the same compartment like we did for mobile? I couldn't file a bug for that, although bug 807205 seems related, and bug 986503 is an example where that issue was mentioned.

In the meantime we can remove the usage of PromiseUtils.jsm from this code. I'll try and see how much impact this alone has, because it can be relevant for other places in the code as well.

Again, this is probably unrelated to the problem Stefan originally reported, but definitely an important optimization to have.
Flags: needinfo?(florian)
(In reply to :Paolo Amadini from comment #14)
> (In reply to Florian Quèze [:florian] [:flo] from comment #13)
> > I see at least 418ms in the then implementation for the profile from comment 6.
> 
> Uh, I've spotted this after reintroducing the C++ stack:
> 
>   js::CrossCompartmentWrapper::call

The cross compartment wrappers seem to be a few ms of overhead here and there in this profile. I think it would go down if we stopped calling .then thousands of times.

> Now that we don't need too much isolation for add-on JSMs, could we consider
> loading all the browser JSMs in the same compartment like we did for mobile?

That's bug 1186409.
Flags: needinfo?(florian)
(In reply to Florian Quèze [:florian] [:flo] from comment #15)
> The cross compartment wrappers seem to be a few ms of overhead here and
> there in this profile. I think it would go down if we stopped calling .then
> thousands of times.

I just made a simple local test, and just removing the compartment overhead caused the cumulative time of DLT_onProgressChange64 to drop from 36ms to 15ms. As you say we can likely remove the remaining half entirely by avoiding "then", but it's good to know where some of this overhead comes from.

> > Now that we don't need too much isolation for add-on JSMs, could we consider
> > loading all the browser JSMs in the same compartment like we did for mobile?
> 
> That's bug 1186409.

Should this bug be nominated for tracking in Quantum Flow? Based on the above, it would be a notable performance win.
(In reply to :Paolo Amadini from comment #16)

> > > Now that we don't need too much isolation for add-on JSMs, could we consider
> > > loading all the browser JSMs in the same compartment like we did for mobile?
> > 
> > That's bug 1186409.
> 
> Should this bug be nominated for tracking in Quantum Flow? Based on the
> above, it would be a notable performance win.

Andrew is actively working on it and its dependencies, so I don't think tracking it would make any difference at this point. The expected perf win is known to be big :-).
After removing the "then" call I see 11ms of self time in DLT_onProgressChange64, likely because the download object is in a different compartment, however these were already existing and attributed to the lambda function.

Probably bug 1186409 won't help for the latter overhead because one is an XPCOM component and the other is a JSM module, but we already have a visible improvement.
Summary: Downloads cause beachballing and cursor stuttering → Reduce Promise overhead in the DownloadLegacy.js progress events
Stefan, I morphed the bug to reflect the issue Florian identified. When this bug is fixed, if you're still seeing the issue just file a new bug and attach an updated Nightly profile.
Flags: needinfo?(sarentz)
Assignee: nobody → paolo.mozmail
Status: NEW → ASSIGNED
Flags: qe-verify-
Priority: P3 → P1
Comment on attachment 8894873 [details]
Bug 1382899 - Reduce Promise overhead in the DownloadLegacy.js progress events.

https://reviewboard.mozilla.org/r/166034/#review171240

Thanks! :-)

::: toolkit/components/jsdownloads/src/DownloadLegacy.js:205
(Diff revision 1)
> +    this._hasDelayedProgress = true;
> +
> +    this._promiseDownload.then(download => {
> +      // Check whether an immediate progress report has been already processed
> +      // before we could send the delayed progress report.
> +      if (!this._hasDelayedProgress) {

I don't think this test is actually needed (as promise microtasks are executed immediately when a promise is resolved, before we return to the event loop), but it doesn't hurt.
Attachment #8894873 - Flags: review?(florian) → review+
Pushed by paolo.mozmail@amadzone.org:
https://hg.mozilla.org/integration/mozilla-inbound/rev/54243811f36e
Reduce Promise overhead in the DownloadLegacy.js progress events. r=florian
Whiteboard: [qf] → [qf:p3]
https://hg.mozilla.org/mozilla-central/rev/54243811f36e
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 57
Wow lots of activity here that I don't really understand :-) I will report back when I still see this issue on Nightly.
Flags: needinfo?(sarentz)
So here is an interesting observation: it still happens in Firefox Nightly and it also happens in Chrome. Maybe this Mac Pro just behaves very badly when it receives 100 Mbit on WiFi? I'll do some more testing and I'll try to find people with similar problems on this hardware.

In Chrome it is so bad that my Magic Trackpad stops working. I does not register taps anymore.
Smooth as butter when downloading the same file with Safari :-/
Flags: needinfo?(florian)
Flags: needinfo?(paolo.mozmail)
Stefan, we decided to use this bug to handle the front-end issue. Now that it's out of the way (or at least should be!) could you please open a new bug with a profile captured on a recent nightly, and cc paolo and me? Thanks!
Flags: needinfo?(florian)
Flags: needinfo?(paolo.mozmail) → needinfo?(sarentz)
Flags: needinfo?(sarentz)
Performance Impact: --- → P3
Whiteboard: [qf:p3]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: