::RegisterDragDrop is expensive during startup

RESOLVED FIXED in Firefox 57

Status

()

Core
Widget: Win32
P1
normal
RESOLVED FIXED
10 months ago
13 days ago

People

(Reporter: florian, Assigned: bytesized)

Tracking

(Blocks: 1 bug)

unspecified
mozilla57
x86_64
Windows 10
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox57 fixed)

Details

(Whiteboard: [qf:p1][tpi:+][fce-active-legacy])

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(1 attachment)

(Reporter)

Description

10 months ago
See this startup profile captured on the quantum reference hardware where 42ms are spent in RegisterDragDrop while creating the first window: http://perfht.ml/2pNOwEp

The related code is at http://searchfox.org/mozilla-central/rev/6580dd9a4eb0224773897388dba2ddf5ed7f4216/widget/windows/nsWindow.cpp#3776 and according to the comment above, it's to "Enables drag and drop of files on this widget".

Can this be done either off-main thread, or during idle time once we are done with startup?

Comment 1

10 months ago
(In reply to Florian Quèze [:florian] [:flo] from comment #0)
> Can this be done either off-main thread

That's a non-starter, I'm afraid. HWND + COM = must be on UI thread. We'd have to consider other options.

Comment 2

10 months ago
The thing that is taking all of the time here is loading a DLL.  What we can do is load that DLL earlier on a background thread so that when Windows tries to load it the DLL would already be loaded and we wouldn't need to pay this penalty on the main thread.

Unfortunately we can't easily tell the name of the DLL being loaded from the profile.  I have noticed this problem in a bunch of other profiles, so I decided to fix it.  I posted a patch in bug 1362814.  Once that lands, there will be a pseudo-stack label for these DLL loads with the name of the DLL, and then it would be super easy to reprofile and figure out what DLL needs to be loaded in the background!  :-)

Updated

10 months ago
Priority: -- → P1
Whiteboard: [qf] → [qf][tpi:+]
Whiteboard: [qf][tpi:+] → [qf:p1][tpi:+]
No longer depends on: 1362814
(Reporter)

Comment 3

8 months ago
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #2)

> Unfortunately we can't easily tell the name of the DLL being loaded from the
> profile.  I have noticed this problem in a bunch of other profiles, so I
> decided to fix it.  I posted a patch in bug 1362814.  Once that lands, there
> will be a pseudo-stack label for these DLL loads with the name of the DLL,
> and then it would be super easy to reprofile and figure out what DLL needs
> to be loaded in the background!  :-)

We are loading dataexchange.dll, as seen on this profile: https://perfht.ml/2sQ1RKl

Other ideas to consider:
- Could we completely avoid this RegisterDragDrop call when it's the hidden window that we are creating?
- Could we do this for actual browser windows when we are done with startup from an idle callback? Hopefully this should give us enough time to preload the dll file off main thread.
Assignee: nobody → jiangperry
Status: NEW → ASSIGNED

Updated

7 months ago
Whiteboard: [qf:p1][tpi:+] → [qf:p1][tpi:+][fce-active]
Assignee: jiangperry → agashlin

Comment 4

7 months ago
(In reply to Florian Quèze [:florian] [:flo] from comment #3)
> (In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from
> comment #2)
> 
> > Unfortunately we can't easily tell the name of the DLL being loaded from the
> > profile.  I have noticed this problem in a bunch of other profiles, so I
> > decided to fix it.  I posted a patch in bug 1362814.  Once that lands, there
> > will be a pseudo-stack label for these DLL loads with the name of the DLL,
> > and then it would be super easy to reprofile and figure out what DLL needs
> > to be loaded in the background!  :-)
> 
> We are loading dataexchange.dll, as seen on this profile:
> https://perfht.ml/2sQ1RKl
> 
> Other ideas to consider:
> - Could we completely avoid this RegisterDragDrop call when it's the hidden
> window that we are creating?

I think that would move this cost to where we create the first browser window.

> - Could we do this for actual browser windows when we are done with startup
> from an idle callback? Hopefully this should give us enough time to preload
> the dll file off main thread.

If we do preload the DLL from a background thread early enough, it shouldn't matter where on the main thread we load this for the first time from IINM.
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #4)
> > - Could we do this for actual browser windows when we are done with startup
> > from an idle callback? Hopefully this should give us enough time to preload
> > the dll file off main thread.
> 
> If we do preload the DLL from a background thread early enough, it shouldn't
> matter where on the main thread we load this for the first time from IINM.

So it looks like we're close to having this mechanism, but we're looking for what would be the right place to start off this early-enough preloading. Does anyone have any suggestions on where we could hook this up during the startup process? 

My not-very-knowledgeable suggestion would be to do it on an observer for profile-after-change or final-ui-startup, or directly triggered from XRE_main or XRE_mainRun. Anyone who has worked closely on this has an opinion on these suggestions, or a better one?
Flags: needinfo?(kmaglione+bmo)
Flags: needinfo?(erahm)
Flags: needinfo?(ehsan)

Comment 6

7 months ago
There is some active discussion ongoing about this in bug 1367416.  There are probably at least 10 bugs on file for this general issue, just manifesting in different DLLs, I wish we had been more proactive in having a tracker bug for them.  :/  Jim has expressed interest in driving to have a solution for handling the general problem.
Flags: needinfo?(ehsan)
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #6)
> There is some active discussion ongoing about this in bug 1367416.  There
> are probably at least 10 bugs on file for this general issue, just
> manifesting in different DLLs, I wish we had been more proactive in having a
> tracker bug for them.  :/  Jim has expressed interest in driving to have a
> solution for handling the general problem.

Jim, did you ever file a tracking bug for this work? If not can you add one with more details on the proposed implementation so we can start blocking bugs on it? I'm happy to help out.
Flags: needinfo?(erahm) → needinfo?(jmathies)

Comment 8

7 months ago
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #7)
> (In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from
> comment #6)
> > There is some active discussion ongoing about this in bug 1367416.  There
> > are probably at least 10 bugs on file for this general issue, just
> > manifesting in different DLLs, I wish we had been more proactive in having a
> > tracker bug for them.  :/  Jim has expressed interest in driving to have a
> > solution for handling the general problem.
> 
> Jim, did you ever file a tracking bug for this work? If not can you add one
> with more details on the proposed implementation so we can start blocking
> bugs on it? I'm happy to help out.

Looks like this work is taking place in bug 1367416.
Flags: needinfo?(jmathies)
Depends on: 1367416
I looked into adding DataExchange.dll to the preloading I was doing in bug 1367416, it looks like I also need to include twinapi.appcore.dll and sechost.dll to make a significant dent in the time under EnableDragDrop. I also made a change to not EnableDragDrop if the window is invisible, thus the first enabling happens later giving more time for the read ahead. With these changes the average time under EnableDragDrop goes from 20ms to 4ms on simulated cold boot on reference hw.

I'm not sure what to do with the patch on bug 1367416, my plan right now is to wait for that to land and then submit a patch for this adding the dll and the invisibility check.
To follow up, I haven't been able to demonstrate consistent improvement with the changes I discussed, during cold start RegisterDragDrop/EnableDragDrop is still showing up strongly in a profile. This isn't due to the read ahead racing the real read, as I tried making the read ahead fully synchronous and the same issue still arises. So my latest patch on bug 1367416 doesn't seem like the right solution for this current bug, though it seems to handle that and bug 1360167 well.

Comment 11

6 months ago
(In reply to Adam Gashlin [:agashlin] from comment #10)
> To follow up, I haven't been able to demonstrate consistent improvement with
> the changes I discussed, during cold start RegisterDragDrop/EnableDragDrop
> is still showing up strongly in a profile. This isn't due to the read ahead
> racing the real read, as I tried making the read ahead fully synchronous and
> the same issue still arises. So my latest patch on bug 1367416 doesn't seem
> like the right solution for this current bug, though it seems to handle that
> and bug 1360167 well.

Do you have a profile link that shows us where we spend the rest of our time after your patch by any chance?  Thanks!
Here's a simulated cold start, 14ms: https://perfht.ml/2uL9Toy
and a real cold start 8ms: https://perfht.ml/2uLfviV

I think I was getting worse numbers before, but with generally the same shape.
This is with a very aggressive patch that loads all of these synchronously before the window is created: DataExchange.dll, dcomp.dll, d3d11.dll, dxgi.dll, twinapi.appcore.dll, sechost.dll

Updated

6 months ago
Flags: needinfo?(kmaglione+bmo)
Assignee: agashlin → nobody
Status: ASSIGNED → NEW

Comment 13

6 months ago
(In reply to Adam Gashlin [:agashlin] from comment #12)
> Here's a simulated cold start, 14ms: https://perfht.ml/2uL9Toy
> and a real cold start 8ms: https://perfht.ml/2uLfviV
> 
> I think I was getting worse numbers before, but with generally the same
> shape.
> This is with a very aggressive patch that loads all of these synchronously
> before the window is created: DataExchange.dll, dcomp.dll, d3d11.dll,
> dxgi.dll, twinapi.appcore.dll, sechost.dll

Fantastic, really impressive work!  There's nothing else that come to my eyes personally which can be shaved off here...
I'm afraid I may have been unclear, those numbers are from doing the read ahead on the main thread (to avoid any chance of the LoadLibrary outracing the read), and with I think an unrealistic collection of DLLs to load here. It was intended to illustrate that something other than straight IO is taking time in those loads, so just looking at profile times isn't indicating as much as we'd like. I've seen situations where adding more files to read ahead actually seems to increase the time spent in RegisterDragDrop. 

Looking at Process Monitor logs seems to show that some delay loaded dependencies show up read first on other threads after the initial load of DataExchange.dll, is it possible that those are able to run ahead earlier (due to read ahead) and get locks that interfere with the main thread?

With so much uncertain I feel like I don't have a good patch for this bug yet, which is why I unassigned. Any suggestions for next steps?
Flags: needinfo?(ehsan)

Comment 15

6 months ago
(In reply to Adam Gashlin [:agashlin] from comment #14)
> I'm afraid I may have been unclear, those numbers are from doing the read
> ahead on the main thread (to avoid any chance of the LoadLibrary outracing
> the read), and with I think an unrealistic collection of DLLs to load here.
> It was intended to illustrate that something other than straight IO is
> taking time in those loads, so just looking at profile times isn't
> indicating as much as we'd like. I've seen situations where adding more
> files to read ahead actually seems to increase the time spent in
> RegisterDragDrop. 

Oh, I see.  Sorry for misunderstanding earlier.

> Looking at Process Monitor logs seems to show that some delay loaded
> dependencies show up read first on other threads after the initial load of
> DataExchange.dll, is it possible that those are able to run ahead earlier
> (due to read ahead) and get locks that interfere with the main thread?

No, if I understand your question correctly, that can't happen.  Our readahead code doesn't examine the list of DLL imports, and is therefore unaware of any delay loaded DLL imports of DataExchange.dll.  If we want to ensure the same optimization applies to those DLLs as well, we'd need to readahead those dependencies explicitly as well.

> With so much uncertain I feel like I don't have a good patch for this bug
> yet, which is why I unassigned. Any suggestions for next steps?

To some extent the nature of this optimization is fundamentally unpredictable, so I certainly understand your feeling here!

I think a good next step would be remeasuring this on trunk to see where we are standing now and get a sense of whether there is anything more we could be doing here.  The thing is, as far as I can tell, we can't call RegisterDragDrop() itself off the main thread, so the Win32 API ties our hands in terms of viable options.

Do you have access to the reference hardware to get a new profile from this on trunk?  If not, can you please help remeasure this Florian?

Thanks again!
Flags: needinfo?(ehsan)
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #15)
> > Looking at Process Monitor logs seems to show that some delay loaded
> > dependencies show up read first on other threads after the initial load of
> > DataExchange.dll, is it possible that those are able to run ahead earlier
> > (due to read ahead) and get locks that interfere with the main thread?
> 
> No, if I understand your question correctly, that can't happen.  Our
> readahead code doesn't examine the list of DLL imports, and is therefore
> unaware of any delay loaded DLL imports of DataExchange.dll.  If we want to
> ensure the same optimization applies to those DLLs as well, we'd need to
> readahead those dependencies explicitly as well.

When DataExchange.dll is loaded, without any contribution from readahead other DLLs get read from other threads, I've seen this in Process Monitor where dxgi.dll and d3d11.dll at least are each on their own thread. This must be something Windows is doing internally, probably this is by design to allow the thread loading DataExchange.dll to proceed? So readahead of those is probably counterproductive; I was suggesting that there might be a lock those other threads are getting to earlier than normal if we readahead, blocking the main thread, but that's speculation.

> Do you have access to the reference hardware to get a new profile from this
> on trunk?  If not, can you please help remeasure this Florian?

I do have ref hw, I ran a few cold start tests with current Nightly (32-bit, the stacks seem to be more consistent that way), both normally (with the new readahead in place) and with XRE_NO_DLL_READAHEAD set. Under RegisterDragDrop I got median 5 normally vs median 51 with XRE_NO_DLL_READAHEAD.

However there were a few as long as 54ms, here's a profile where RegisterDragDrop took 44ms: https://perfht.ml/2v5MSwN

I noticed in procmon that the DataExchage.dll read seems to often happen during a xul.dll load from another process, this unfortunate timing could be part of the problem. Also something I didn't test thoroughly was the independent effect of not doing EnableDragDrop on an invisible window (by wrapping http://searchfox.org/mozilla-central/rev/5ce320a16f6998dc2b36e30458131d029d107077/view/nsView.cpp#719 in if (mWindow->WindowType() != eWindowType_invisible)), pushing that a bit later could help.
(Assignee)

Updated

6 months ago
Assignee: nobody → ksteuber
(Assignee)

Comment 17

6 months ago
In Comment 12 :agashlin established that significant time is spent in RegisterDragDrop even if the necessary dlls are synchronously preloaded. Therefore, I think that the best course of action here is not to try to shorten the amount of time that it takes (since I see no path forward on that front), but to instead prevent that time from delaying startup by moving the work to idle time.

I want to try this in conjunction with skipping the call altogether for hidden windows.

I talked to :ehsan about this, and he seems supportive as long as RegisterDragDrop continues to be called "in time". We can easily use the the timeout argument to NS_IdleDispatchToCurrentThread, but the question seems to be what an appropriate timeout would be.

@jimm Does this seem like a good approach? Can you suggest a reasonable value for the timeout?
Flags: needinfo?(jmathies)

Comment 18

6 months ago
(In reply to Kirk Steuber [:bytesized] from comment #17)
> In Comment 12 :agashlin established that significant time is spent in
> RegisterDragDrop even if the necessary dlls are synchronously preloaded.
> Therefore, I think that the best course of action here is not to try to
> shorten the amount of time that it takes (since I see no path forward on
> that front), but to instead prevent that time from delaying startup by
> moving the work to idle time.
> 
> I want to try this in conjunction with skipping the call altogether for
> hidden windows.
> 
> I talked to :ehsan about this, and he seems supportive as long as
> RegisterDragDrop continues to be called "in time". We can easily use the the
> timeout argument to NS_IdleDispatchToCurrentThread, but the question seems
> to be what an appropriate timeout would be.
> 
> @jimm Does this seem like a good approach? Can you suggest a reasonable
> value for the timeout?

Seems a bit error prone if the timeout is long. A short timeout might not give you the perf win you're looking for. I would say you could delay calling that api for a top level window maybe 500msec or less after the show state of the window changes.
Flags: needinfo?(jmathies)
(Assignee)

Comment 19

6 months ago
@jimm I am not sure how to use a timeout that is based on window show state. However, I feel like it would be safe and beneficial to set the timeout to 1000ms. When I tried using a 1s timeout, I found that the time spent in RegisterDragDrop fell to 3ms or less (often not showing up in profiles at all). Example profile (from reference machine): https://perfht.ml/2x7KafN

I also tried a timeout of 500ms, which resulted in times spent in RegisterDragDrop of 10-35ms. 

In both cases, however, the profiler shows that the time is spent mid-startup, which seems to me like an indication that it is safe to give it a timeout of this length.

What do you think?
Flags: needinfo?(jmathies)
Comment hidden (mozreview-request)

Comment 21

6 months ago
(In reply to Kirk Steuber [:bytesized] from comment #19)
> @jimm I am not sure how to use a timeout that is based on window show state.
> However, I feel like it would be safe and beneficial to set the timeout to
> 1000ms. When I tried using a 1s timeout, I found that the time spent in
> RegisterDragDrop fell to 3ms or less (often not showing up in profiles at
> all). Example profile (from reference machine): https://perfht.ml/2x7KafN
> 
> I also tried a timeout of 500ms, which resulted in times spent in
> RegisterDragDrop of 10-35ms. 
> 
> In both cases, however, the profiler shows that the time is spent
> mid-startup, which seems to me like an indication that it is safe to give it
> a timeout of this length.
> 
> What do you think?

1 second sounds find. My numbers were just guesses, use what appears to have the best impact.
Flags: needinfo?(jmathies)
(Reporter)

Comment 22

6 months ago
(In reply to Kirk Steuber [:bytesized] from comment #19)

> In both cases, however, the profiler shows that the time is spent
> mid-startup, which seems to me like an indication that it is safe to give it
> a timeout of this length.
> 
> What do you think?

If the time is spent mid-startup, isn't that an indication that the timeout isn't long enough? A startup time > 5s isn't uncommon for cold startup. I guess it also depends what you called "mid-startup", it's really different if we are talking about 'in the middle of the busy work done at startup' or 'in the middle of the other idle callbacks that run at the end of startup'.
(Reporter)

Comment 23

6 months ago
With EnableDragDrop now being called asynchronously, should the assertion at http://searchfox.org/mozilla-central/rev/4d8e389498a08668cce9ebf6232cc96be178c3e4/widget/windows/nsWindow.cpp#3775 become a normal early return to handle the case where we closed the window before the idle callback fired? (this seems likely to happen a lot in automated tests). And by the way, thanks for working on this bug! :-)
(Assignee)

Comment 24

6 months ago
(In reply to Florian Quèze [:florian] [:flo] from comment #22)
> If the time is spent mid-startup, isn't that an indication that the timeout
> isn't long enough?

I am similarly unsure about this. Originally, the plan was to move some work that I couldn't reduce to a time when it did not matter. But when I move it out by 1sec, the work time seems to reduce, solving the problem. Since the time spent in RegisterDragDrop is spent in NtWaitForSingleObject, I can only assume that 1 second later the object is ready. This isn't very scientific, but it seems to work so I'd like to do it.

(In reply to Florian Quèze [:florian] [:flo] from comment #23)
> With EnableDragDrop now being called asynchronously, should the assertion at
> http://searchfox.org/mozilla-central/rev/
> 4d8e389498a08668cce9ebf6232cc96be178c3e4/widget/windows/nsWindow.cpp#3775
> become a normal early return to handle the case where we closed the window
> before the idle callback fired?

Yeah, I believe you are correct. I am making that change now.
Comment hidden (mozreview-request)
(Reporter)

Comment 26

6 months ago
(In reply to Kirk Steuber [:bytesized] from comment #24)
> (In reply to Florian Quèze [:florian] [:flo] from comment #22)
> > If the time is spent mid-startup, isn't that an indication that the timeout
> > isn't long enough?
> 
> I am similarly unsure about this. Originally, the plan was to move some work
> that I couldn't reduce to a time when it did not matter. But when I move it
> out by 1sec, the work time seems to reduce, solving the problem. Since the
> time spent in RegisterDragDrop is spent in NtWaitForSingleObject, I can only
> assume that 1 second later the object is ready. This isn't very scientific,
> but it seems to work so I'd like to do it.

Is this 1s on a high-end developer machine, or on the quantum reference hardware? I'm concerned that this '1s' magic value may need to be significantly larger on older hardware. I'm ok with the "not very scientific" part though :-).
(Assignee)

Comment 27

6 months ago
This is on quantum reference hardware.
Comment hidden (mozreview-request)
(Assignee)

Updated

6 months ago
Attachment #8904666 - Flags: review?(jmathies)

Comment 29

6 months ago
mozreview-review
Comment on attachment 8904666 [details]
Bug 1362382 - Move RegisterDragDrop to be called during idle time, if possible

https://reviewboard.mozilla.org/r/176470/#review182348

::: widget/nsBaseWidget.cpp:2241
(Diff revision 3)
> +  return NS_IdleDispatchToCurrentThread(
> +    NS_NewRunnableFunction("AsyncEnableDragDropFn",
> +                           [this, aEnable, kungFuDeathGrip]() {
> +                             EnableDragDrop(aEnable);
> +                           }),
> +    1000);

lets move this value to a const outside the method with a comment explaining what it is.
Attachment #8904666 - Flags: review?(jmathies) → review+
Comment hidden (mozreview-request)

Comment 31

6 months ago
Pushed by ksteuber@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6aa682399e1e
Move RegisterDragDrop to be called during idle time, if possible r=jimm
https://hg.mozilla.org/mozilla-central/rev/6aa682399e1e
Status: NEW → RESOLVED
Last Resolved: 6 months ago
status-firefox57: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla57

Updated

13 days ago
Whiteboard: [qf:p1][tpi:+][fce-active] → [qf:p1][tpi:+][fce-active-legacy]
You need to log in before you can comment on or make changes to this bug.