Closed Bug 1362382 Opened 7 years ago Closed 7 years ago

::RegisterDragDrop is expensive during startup

Categories

(Core :: Widget: Win32, enhancement, P1)

x86_64
Windows 10
enhancement

Tracking

()

RESOLVED FIXED
mozilla57
Performance Impact high
Tracking Status
firefox57 --- fixed

People

(Reporter: florian, Assigned: bytesized)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Whiteboard: [tpi:+][fce-active-legacy])

Attachments

(1 file)

See this startup profile captured on the quantum reference hardware where 42ms are spent in RegisterDragDrop while creating the first window: http://perfht.ml/2pNOwEp

The related code is at http://searchfox.org/mozilla-central/rev/6580dd9a4eb0224773897388dba2ddf5ed7f4216/widget/windows/nsWindow.cpp#3776 and according to the comment above, it's to "Enables drag and drop of files on this widget".

Can this be done either off-main thread, or during idle time once we are done with startup?
(In reply to Florian Quèze [:florian] [:flo] from comment #0)
> Can this be done either off-main thread

That's a non-starter, I'm afraid. HWND + COM = must be on UI thread. We'd have to consider other options.
The thing that is taking all of the time here is loading a DLL.  What we can do is load that DLL earlier on a background thread so that when Windows tries to load it the DLL would already be loaded and we wouldn't need to pay this penalty on the main thread.

Unfortunately we can't easily tell the name of the DLL being loaded from the profile.  I have noticed this problem in a bunch of other profiles, so I decided to fix it.  I posted a patch in bug 1362814.  Once that lands, there will be a pseudo-stack label for these DLL loads with the name of the DLL, and then it would be super easy to reprofile and figure out what DLL needs to be loaded in the background!  :-)
Priority: -- → P1
Whiteboard: [qf] → [qf][tpi:+]
Whiteboard: [qf][tpi:+] → [qf:p1][tpi:+]
No longer depends on: 1362814
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #2)

> Unfortunately we can't easily tell the name of the DLL being loaded from the
> profile.  I have noticed this problem in a bunch of other profiles, so I
> decided to fix it.  I posted a patch in bug 1362814.  Once that lands, there
> will be a pseudo-stack label for these DLL loads with the name of the DLL,
> and then it would be super easy to reprofile and figure out what DLL needs
> to be loaded in the background!  :-)

We are loading dataexchange.dll, as seen on this profile: https://perfht.ml/2sQ1RKl

Other ideas to consider:
- Could we completely avoid this RegisterDragDrop call when it's the hidden window that we are creating?
- Could we do this for actual browser windows when we are done with startup from an idle callback? Hopefully this should give us enough time to preload the dll file off main thread.
Assignee: nobody → jiangperry
Status: NEW → ASSIGNED
Whiteboard: [qf:p1][tpi:+] → [qf:p1][tpi:+][fce-active]
Assignee: jiangperry → agashlin
(In reply to Florian Quèze [:florian] [:flo] from comment #3)
> (In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from
> comment #2)
> 
> > Unfortunately we can't easily tell the name of the DLL being loaded from the
> > profile.  I have noticed this problem in a bunch of other profiles, so I
> > decided to fix it.  I posted a patch in bug 1362814.  Once that lands, there
> > will be a pseudo-stack label for these DLL loads with the name of the DLL,
> > and then it would be super easy to reprofile and figure out what DLL needs
> > to be loaded in the background!  :-)
> 
> We are loading dataexchange.dll, as seen on this profile:
> https://perfht.ml/2sQ1RKl
> 
> Other ideas to consider:
> - Could we completely avoid this RegisterDragDrop call when it's the hidden
> window that we are creating?

I think that would move this cost to where we create the first browser window.

> - Could we do this for actual browser windows when we are done with startup
> from an idle callback? Hopefully this should give us enough time to preload
> the dll file off main thread.

If we do preload the DLL from a background thread early enough, it shouldn't matter where on the main thread we load this for the first time from IINM.
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #4)
> > - Could we do this for actual browser windows when we are done with startup
> > from an idle callback? Hopefully this should give us enough time to preload
> > the dll file off main thread.
> 
> If we do preload the DLL from a background thread early enough, it shouldn't
> matter where on the main thread we load this for the first time from IINM.

So it looks like we're close to having this mechanism, but we're looking for what would be the right place to start off this early-enough preloading. Does anyone have any suggestions on where we could hook this up during the startup process? 

My not-very-knowledgeable suggestion would be to do it on an observer for profile-after-change or final-ui-startup, or directly triggered from XRE_main or XRE_mainRun. Anyone who has worked closely on this has an opinion on these suggestions, or a better one?
Flags: needinfo?(kmaglione+bmo)
Flags: needinfo?(erahm)
Flags: needinfo?(ehsan)
There is some active discussion ongoing about this in bug 1367416.  There are probably at least 10 bugs on file for this general issue, just manifesting in different DLLs, I wish we had been more proactive in having a tracker bug for them.  :/  Jim has expressed interest in driving to have a solution for handling the general problem.
Flags: needinfo?(ehsan)
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #6)
> There is some active discussion ongoing about this in bug 1367416.  There
> are probably at least 10 bugs on file for this general issue, just
> manifesting in different DLLs, I wish we had been more proactive in having a
> tracker bug for them.  :/  Jim has expressed interest in driving to have a
> solution for handling the general problem.

Jim, did you ever file a tracking bug for this work? If not can you add one with more details on the proposed implementation so we can start blocking bugs on it? I'm happy to help out.
Flags: needinfo?(erahm) → needinfo?(jmathies)
(In reply to Eric Rahm [:erahm] (please no mozreview requests) from comment #7)
> (In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from
> comment #6)
> > There is some active discussion ongoing about this in bug 1367416.  There
> > are probably at least 10 bugs on file for this general issue, just
> > manifesting in different DLLs, I wish we had been more proactive in having a
> > tracker bug for them.  :/  Jim has expressed interest in driving to have a
> > solution for handling the general problem.
> 
> Jim, did you ever file a tracking bug for this work? If not can you add one
> with more details on the proposed implementation so we can start blocking
> bugs on it? I'm happy to help out.

Looks like this work is taking place in bug 1367416.
Flags: needinfo?(jmathies)
I looked into adding DataExchange.dll to the preloading I was doing in bug 1367416, it looks like I also need to include twinapi.appcore.dll and sechost.dll to make a significant dent in the time under EnableDragDrop. I also made a change to not EnableDragDrop if the window is invisible, thus the first enabling happens later giving more time for the read ahead. With these changes the average time under EnableDragDrop goes from 20ms to 4ms on simulated cold boot on reference hw.

I'm not sure what to do with the patch on bug 1367416, my plan right now is to wait for that to land and then submit a patch for this adding the dll and the invisibility check.
To follow up, I haven't been able to demonstrate consistent improvement with the changes I discussed, during cold start RegisterDragDrop/EnableDragDrop is still showing up strongly in a profile. This isn't due to the read ahead racing the real read, as I tried making the read ahead fully synchronous and the same issue still arises. So my latest patch on bug 1367416 doesn't seem like the right solution for this current bug, though it seems to handle that and bug 1360167 well.
(In reply to Adam Gashlin [:agashlin] from comment #10)
> To follow up, I haven't been able to demonstrate consistent improvement with
> the changes I discussed, during cold start RegisterDragDrop/EnableDragDrop
> is still showing up strongly in a profile. This isn't due to the read ahead
> racing the real read, as I tried making the read ahead fully synchronous and
> the same issue still arises. So my latest patch on bug 1367416 doesn't seem
> like the right solution for this current bug, though it seems to handle that
> and bug 1360167 well.

Do you have a profile link that shows us where we spend the rest of our time after your patch by any chance?  Thanks!
Here's a simulated cold start, 14ms: https://perfht.ml/2uL9Toy
and a real cold start 8ms: https://perfht.ml/2uLfviV

I think I was getting worse numbers before, but with generally the same shape.
This is with a very aggressive patch that loads all of these synchronously before the window is created: DataExchange.dll, dcomp.dll, d3d11.dll, dxgi.dll, twinapi.appcore.dll, sechost.dll
Flags: needinfo?(kmaglione+bmo)
Assignee: agashlin → nobody
Status: ASSIGNED → NEW
(In reply to Adam Gashlin [:agashlin] from comment #12)
> Here's a simulated cold start, 14ms: https://perfht.ml/2uL9Toy
> and a real cold start 8ms: https://perfht.ml/2uLfviV
> 
> I think I was getting worse numbers before, but with generally the same
> shape.
> This is with a very aggressive patch that loads all of these synchronously
> before the window is created: DataExchange.dll, dcomp.dll, d3d11.dll,
> dxgi.dll, twinapi.appcore.dll, sechost.dll

Fantastic, really impressive work!  There's nothing else that come to my eyes personally which can be shaved off here...
I'm afraid I may have been unclear, those numbers are from doing the read ahead on the main thread (to avoid any chance of the LoadLibrary outracing the read), and with I think an unrealistic collection of DLLs to load here. It was intended to illustrate that something other than straight IO is taking time in those loads, so just looking at profile times isn't indicating as much as we'd like. I've seen situations where adding more files to read ahead actually seems to increase the time spent in RegisterDragDrop. 

Looking at Process Monitor logs seems to show that some delay loaded dependencies show up read first on other threads after the initial load of DataExchange.dll, is it possible that those are able to run ahead earlier (due to read ahead) and get locks that interfere with the main thread?

With so much uncertain I feel like I don't have a good patch for this bug yet, which is why I unassigned. Any suggestions for next steps?
Flags: needinfo?(ehsan)
(In reply to Adam Gashlin [:agashlin] from comment #14)
> I'm afraid I may have been unclear, those numbers are from doing the read
> ahead on the main thread (to avoid any chance of the LoadLibrary outracing
> the read), and with I think an unrealistic collection of DLLs to load here.
> It was intended to illustrate that something other than straight IO is
> taking time in those loads, so just looking at profile times isn't
> indicating as much as we'd like. I've seen situations where adding more
> files to read ahead actually seems to increase the time spent in
> RegisterDragDrop. 

Oh, I see.  Sorry for misunderstanding earlier.

> Looking at Process Monitor logs seems to show that some delay loaded
> dependencies show up read first on other threads after the initial load of
> DataExchange.dll, is it possible that those are able to run ahead earlier
> (due to read ahead) and get locks that interfere with the main thread?

No, if I understand your question correctly, that can't happen.  Our readahead code doesn't examine the list of DLL imports, and is therefore unaware of any delay loaded DLL imports of DataExchange.dll.  If we want to ensure the same optimization applies to those DLLs as well, we'd need to readahead those dependencies explicitly as well.

> With so much uncertain I feel like I don't have a good patch for this bug
> yet, which is why I unassigned. Any suggestions for next steps?

To some extent the nature of this optimization is fundamentally unpredictable, so I certainly understand your feeling here!

I think a good next step would be remeasuring this on trunk to see where we are standing now and get a sense of whether there is anything more we could be doing here.  The thing is, as far as I can tell, we can't call RegisterDragDrop() itself off the main thread, so the Win32 API ties our hands in terms of viable options.

Do you have access to the reference hardware to get a new profile from this on trunk?  If not, can you please help remeasure this Florian?

Thanks again!
Flags: needinfo?(ehsan)
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #15)
> > Looking at Process Monitor logs seems to show that some delay loaded
> > dependencies show up read first on other threads after the initial load of
> > DataExchange.dll, is it possible that those are able to run ahead earlier
> > (due to read ahead) and get locks that interfere with the main thread?
> 
> No, if I understand your question correctly, that can't happen.  Our
> readahead code doesn't examine the list of DLL imports, and is therefore
> unaware of any delay loaded DLL imports of DataExchange.dll.  If we want to
> ensure the same optimization applies to those DLLs as well, we'd need to
> readahead those dependencies explicitly as well.

When DataExchange.dll is loaded, without any contribution from readahead other DLLs get read from other threads, I've seen this in Process Monitor where dxgi.dll and d3d11.dll at least are each on their own thread. This must be something Windows is doing internally, probably this is by design to allow the thread loading DataExchange.dll to proceed? So readahead of those is probably counterproductive; I was suggesting that there might be a lock those other threads are getting to earlier than normal if we readahead, blocking the main thread, but that's speculation.

> Do you have access to the reference hardware to get a new profile from this
> on trunk?  If not, can you please help remeasure this Florian?

I do have ref hw, I ran a few cold start tests with current Nightly (32-bit, the stacks seem to be more consistent that way), both normally (with the new readahead in place) and with XRE_NO_DLL_READAHEAD set. Under RegisterDragDrop I got median 5 normally vs median 51 with XRE_NO_DLL_READAHEAD.

However there were a few as long as 54ms, here's a profile where RegisterDragDrop took 44ms: https://perfht.ml/2v5MSwN

I noticed in procmon that the DataExchage.dll read seems to often happen during a xul.dll load from another process, this unfortunate timing could be part of the problem. Also something I didn't test thoroughly was the independent effect of not doing EnableDragDrop on an invisible window (by wrapping http://searchfox.org/mozilla-central/rev/5ce320a16f6998dc2b36e30458131d029d107077/view/nsView.cpp#719 in if (mWindow->WindowType() != eWindowType_invisible)), pushing that a bit later could help.
Assignee: nobody → ksteuber
In Comment 12 :agashlin established that significant time is spent in RegisterDragDrop even if the necessary dlls are synchronously preloaded. Therefore, I think that the best course of action here is not to try to shorten the amount of time that it takes (since I see no path forward on that front), but to instead prevent that time from delaying startup by moving the work to idle time.

I want to try this in conjunction with skipping the call altogether for hidden windows.

I talked to :ehsan about this, and he seems supportive as long as RegisterDragDrop continues to be called "in time". We can easily use the the timeout argument to NS_IdleDispatchToCurrentThread, but the question seems to be what an appropriate timeout would be.

@jimm Does this seem like a good approach? Can you suggest a reasonable value for the timeout?
Flags: needinfo?(jmathies)
(In reply to Kirk Steuber [:bytesized] from comment #17)
> In Comment 12 :agashlin established that significant time is spent in
> RegisterDragDrop even if the necessary dlls are synchronously preloaded.
> Therefore, I think that the best course of action here is not to try to
> shorten the amount of time that it takes (since I see no path forward on
> that front), but to instead prevent that time from delaying startup by
> moving the work to idle time.
> 
> I want to try this in conjunction with skipping the call altogether for
> hidden windows.
> 
> I talked to :ehsan about this, and he seems supportive as long as
> RegisterDragDrop continues to be called "in time". We can easily use the the
> timeout argument to NS_IdleDispatchToCurrentThread, but the question seems
> to be what an appropriate timeout would be.
> 
> @jimm Does this seem like a good approach? Can you suggest a reasonable
> value for the timeout?

Seems a bit error prone if the timeout is long. A short timeout might not give you the perf win you're looking for. I would say you could delay calling that api for a top level window maybe 500msec or less after the show state of the window changes.
Flags: needinfo?(jmathies)
@jimm I am not sure how to use a timeout that is based on window show state. However, I feel like it would be safe and beneficial to set the timeout to 1000ms. When I tried using a 1s timeout, I found that the time spent in RegisterDragDrop fell to 3ms or less (often not showing up in profiles at all). Example profile (from reference machine): https://perfht.ml/2x7KafN

I also tried a timeout of 500ms, which resulted in times spent in RegisterDragDrop of 10-35ms. 

In both cases, however, the profiler shows that the time is spent mid-startup, which seems to me like an indication that it is safe to give it a timeout of this length.

What do you think?
Flags: needinfo?(jmathies)
(In reply to Kirk Steuber [:bytesized] from comment #19)
> @jimm I am not sure how to use a timeout that is based on window show state.
> However, I feel like it would be safe and beneficial to set the timeout to
> 1000ms. When I tried using a 1s timeout, I found that the time spent in
> RegisterDragDrop fell to 3ms or less (often not showing up in profiles at
> all). Example profile (from reference machine): https://perfht.ml/2x7KafN
> 
> I also tried a timeout of 500ms, which resulted in times spent in
> RegisterDragDrop of 10-35ms. 
> 
> In both cases, however, the profiler shows that the time is spent
> mid-startup, which seems to me like an indication that it is safe to give it
> a timeout of this length.
> 
> What do you think?

1 second sounds find. My numbers were just guesses, use what appears to have the best impact.
Flags: needinfo?(jmathies)
(In reply to Kirk Steuber [:bytesized] from comment #19)

> In both cases, however, the profiler shows that the time is spent
> mid-startup, which seems to me like an indication that it is safe to give it
> a timeout of this length.
> 
> What do you think?

If the time is spent mid-startup, isn't that an indication that the timeout isn't long enough? A startup time > 5s isn't uncommon for cold startup. I guess it also depends what you called "mid-startup", it's really different if we are talking about 'in the middle of the busy work done at startup' or 'in the middle of the other idle callbacks that run at the end of startup'.
With EnableDragDrop now being called asynchronously, should the assertion at http://searchfox.org/mozilla-central/rev/4d8e389498a08668cce9ebf6232cc96be178c3e4/widget/windows/nsWindow.cpp#3775 become a normal early return to handle the case where we closed the window before the idle callback fired? (this seems likely to happen a lot in automated tests). And by the way, thanks for working on this bug! :-)
(In reply to Florian Quèze [:florian] [:flo] from comment #22)
> If the time is spent mid-startup, isn't that an indication that the timeout
> isn't long enough?

I am similarly unsure about this. Originally, the plan was to move some work that I couldn't reduce to a time when it did not matter. But when I move it out by 1sec, the work time seems to reduce, solving the problem. Since the time spent in RegisterDragDrop is spent in NtWaitForSingleObject, I can only assume that 1 second later the object is ready. This isn't very scientific, but it seems to work so I'd like to do it.

(In reply to Florian Quèze [:florian] [:flo] from comment #23)
> With EnableDragDrop now being called asynchronously, should the assertion at
> http://searchfox.org/mozilla-central/rev/
> 4d8e389498a08668cce9ebf6232cc96be178c3e4/widget/windows/nsWindow.cpp#3775
> become a normal early return to handle the case where we closed the window
> before the idle callback fired?

Yeah, I believe you are correct. I am making that change now.
(In reply to Kirk Steuber [:bytesized] from comment #24)
> (In reply to Florian Quèze [:florian] [:flo] from comment #22)
> > If the time is spent mid-startup, isn't that an indication that the timeout
> > isn't long enough?
> 
> I am similarly unsure about this. Originally, the plan was to move some work
> that I couldn't reduce to a time when it did not matter. But when I move it
> out by 1sec, the work time seems to reduce, solving the problem. Since the
> time spent in RegisterDragDrop is spent in NtWaitForSingleObject, I can only
> assume that 1 second later the object is ready. This isn't very scientific,
> but it seems to work so I'd like to do it.

Is this 1s on a high-end developer machine, or on the quantum reference hardware? I'm concerned that this '1s' magic value may need to be significantly larger on older hardware. I'm ok with the "not very scientific" part though :-).
This is on quantum reference hardware.
Attachment #8904666 - Flags: review?(jmathies)
Comment on attachment 8904666 [details]
Bug 1362382 - Move RegisterDragDrop to be called during idle time, if possible

https://reviewboard.mozilla.org/r/176470/#review182348

::: widget/nsBaseWidget.cpp:2241
(Diff revision 3)
> +  return NS_IdleDispatchToCurrentThread(
> +    NS_NewRunnableFunction("AsyncEnableDragDropFn",
> +                           [this, aEnable, kungFuDeathGrip]() {
> +                             EnableDragDrop(aEnable);
> +                           }),
> +    1000);

lets move this value to a const outside the method with a comment explaining what it is.
Attachment #8904666 - Flags: review?(jmathies) → review+
Pushed by ksteuber@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6aa682399e1e
Move RegisterDragDrop to be called during idle time, if possible r=jimm
https://hg.mozilla.org/mozilla-central/rev/6aa682399e1e
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Whiteboard: [qf:p1][tpi:+][fce-active] → [qf:p1][tpi:+][fce-active-legacy]
Depends on: 1570613
Performance Impact: --- → P1
Whiteboard: [qf:p1][tpi:+][fce-active-legacy] → [tpi:+][fce-active-legacy]
You need to log in before you can comment on or make changes to this bug.