Closed Bug 970123 Opened 10 years ago Closed 10 years ago

PGO ts_paint regression on Win XP (1.5%) and Win 8 (3%) by removing the delayload option from the ICU libs

Categories

(Firefox Build System :: General, defect)

defect
Not set
normal

Tracking

(firefox29+ fixed, firefox30+ fixed)

RESOLVED FIXED
mozilla30
Tracking Status
firefox29 + fixed
firefox30 + fixed

People

(Reporter: ttaubert, Assigned: ehsan.akhgari)

References

Details

(Whiteboard: [Australis:P-])

Attachments

(1 file, 1 obsolete file)

Removing the delayload option from the ICU libs regressed ts_paint on Windows.


Here's a pgo try push with today's fx-team:

https://tbpl.mozilla.org/?tree=Try&rev=8c080cc8aa5d

Here is the same cset with bug 958108 backed out:

https://tbpl.mozilla.org/?tree=Try&rev=fe07cbdf0e5f


Comparing Talos numbers shows:

http://perf.snarkfest.net/compare-talos/index.html?oldRevs=8c080cc8aa5d&newRev=fe07cbdf0e5f&submit=true

WINNT 5.1 - 8.65   -1.57%
Windows 8 -23.79   -2.96%
Summary: Bug 970114 - PGO ts_paint regression on Win XP (1.5%) and Win 8 (3%) by removing the delayload option from the ICU libs → PGO ts_paint regression on Win XP (1.5%) and Win 8 (3%) by removing the delayload option from the ICU libs
Ehsan: evidence that unfortunately bug 958108 comment 8 doesn't seem accurate :(

Can we address the crash in bug 958108 some other way?
Or perhaps we can fix the regression without causing bug 958108 by only lazy-loading the ICU data libs?
(In reply to :Gavin Sharp (email gavin@gavinsharp.com) from comment #1)
> Ehsan: evidence that unfortunately bug 958108 comment 8 doesn't seem
> accurate :(

FWIW I did the measurements on an actual Windows laptop with a very slow hard disk.  I would expect those measurements to have caught this... Sigh.

> Can we address the crash in bug 958108 some other way?

Let me try that: <https://tbpl.mozilla.org/?tree=Try&rev=a5848b11811b>
This is a try push which makes the ICU data DLL delay loaded in both mozjs.dll and xul.dll, and is PGOed and based on the same base revision as comment 0.
Benjamin, do you have a link to the webapp in bug 958108 comment 0 handy by any chance?  Or some other way to reproduce bug 958108?
Flags: needinfo?(benjamin)
https://marketplace.firefox.com/app/runuts/ but any app would work I think.
Flags: needinfo?(benjamin)
Blocks: 967146
I and Vladan spent some more time looking into what's happening here.  It seems like we put the names of all DLLs that xul.dll depends on in dependentlibs.list, and we preload them all.  As a result, when we stopped preloading these DLLs in bug 958108, we also implicitly started to preload them all.  This was not obvious to me, and is probably the result of the regression.

FWIW, I think unconditionally preloading every single DLL that xul.dll depends on is a mistake.  It makes no sense for us to preload the ICU data DLL like this...
(In reply to :Ehsan Akhgari (needinfo? me!) (slow responsiveness, emailacopolypse) from comment #6)
> FWIW, I think unconditionally preloading every single DLL that xul.dll
> depends on is a mistake.  It makes no sense for us to preload the ICU data
> DLL like this...

Which is why it should be delay loaded, in which case it's not preloaded. The assumption is that libraries that are not needed at startup time are to be delay loaded, and that will make them not preloaded. Preloading libraries that are going to be loaded, on the other hand, makes cold start faster, which is why it's unconditional. I think ts_paint measures warm startup, so yes, preloading things can make that slower, but the improvements on cold startup far outweigh those possible regressions.
I searched all try logs comment #0 and collected startup durations from the very first ts_paint test run. The subsequent runs are all a tad faster because we actually re-use the profile:


fx-team tip:

706 717 709 724 719 724 698 712 689 671

avg 709.6 / med 714.5

fx-team tip w/ backout:

705 710 704 701 733 709 711 712 712 710

avg 710.7 / med 710


It's only ten values each so there might be some variation in those measurements but it seems like there is no noticeable speedup for cold starts from bug 958108, i.e. preloading everything.
Delay-loading the data DLL seems to make the regression go away: <http://perf.snarkfest.net/compare-talos/index.html?oldRevs=8c080cc8aa5d&newRev=a5848b11811b&submit=true>

I bet money that the reason is that we avoid pre-reading ~10MB from the disk.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #5)
> https://marketplace.firefox.com/app/runuts/ but any app would work I think.

Using the try build, I can install and run this app just fine.
Attached patch Patch (v1) (obsolete) — Splinter Review
Attachment #8374042 - Flags: review?(mh+mozilla)
Attachment #8374042 - Flags: review?(gps)
Attachment #8374042 - Flags: review?(benjamin)
Hmm. Comparing your patch to the backout it seems that we're better off than before except for Windows 8. I'm also not sure whether the tpaint regression are real or not.

http://perf.snarkfest.net/compare-talos/index.html?oldRevs=fe07cbdf0e5f&newRev=a5848b11811b&submit=true
(In reply to comment #12)
> Hmm. Comparing your patch to the backout it seems that we're better off than
> before except for Windows 8. I'm also not sure whether the tpaint regression
> are real or not.
> 
> http://perf.snarkfest.net/compare-talos/index.html?oldRevs=fe07cbdf0e5f&newRev=a5848b11811b&submit=true

Yeah, I'm not sure either.  Note that given the fact that the other DLLs are required during startup, I don't think a straight backout makes sense.
Assignee: nobody → ehsan
Component: Webapp Runtime → Build Config
Product: Firefox → Core
Comment on attachment 8374042 [details] [diff] [review]
Patch (v1)

This will certainly delay the problem so that webapprt can start. I'm still a little worried that if a webapp starts to use ICU features that we'll see a delayload crash again.
Attachment #8374042 - Flags: review?(benjamin)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #14)
> Comment on attachment 8374042 [details] [diff] [review]
> Patch (v1)
> 
> This will certainly delay the problem so that webapprt can start. I'm still
> a little worried that if a webapp starts to use ICU features that we'll see
> a delayload crash again.

Honestly I never understood the underlying reason behind that crash.  Can you please explain what's going wrong there, and why the same problem doesn't happen for our other DLLs?
Flags: needinfo?(benjamin)
It's a question of how and whether the DLLs are found on the PATH. In particular:

The path of your executable is on the PATH. So when we launch firefox.exe, the ICU libs are in the same directory and so they are in the path. But when we launch a webapp, the executable is in some other directory and so the ICU libs are not on the PATH.

The XPCOM glue solves this by preloading all the DLLs which are dependencies of xul.dll (using dependentlibs.list).

For delayload DLLs in particular, failure to find the DLL in the PATH will result in an exception/crash.
Flags: needinfo?(benjamin)
Why not SetDllDirectory(GrePath)?
That might work. It wasn't available previously because it requires WinXPSP1, but I think we require that anyways now.
(In reply to comment #18)
> That might work. It wasn't available previously because it requires WinXPSP1,
> but I think we require that anyways now.

SetDllDirectory("") is used for other purposes, we should not set it to another path.  Also that won't save you if you try to access the i18n API from a worker while the main thread is here, let's say: <http://mxr.mozilla.org/mozilla-central/source/dom/plugins/base/nsPluginsDirWin.cpp#306>
So we have this code path <http://mxr.mozilla.org/mozilla-central/source/webapprt/win/webapprt.cpp#333> which seems to try to copy webapprt.exe to the GRE dir and then relaunch the app from there.  Couldn't we use that for all app launches instead of only for those where the build id doesn't match?  (I don't claim to understand this code well...)
Flags: needinfo?(benjamin)
(In reply to :Ehsan Akhgari (needinfo? me!) (slow responsiveness, emailacopolypse) from comment #21)
> So we have this code path
> <http://mxr.mozilla.org/mozilla-central/source/webapprt/win/webapprt.
> cpp#333> which seems to try to copy webapprt.exe to the GRE dir and then
> relaunch the app from there.  Couldn't we use that for all app launches
> instead of only for those where the build id doesn't match?  (I don't claim
> to understand this code well...)

The user running the app is likely not to have write permissions on the Firefox directory to do that.

(In reply to :Ehsan Akhgari (needinfo? me!) (slow responsiveness, emailacopolypse) from comment #20)
> (In reply to comment #18)
> > That might work. It wasn't available previously because it requires WinXPSP1,
> > but I think we require that anyways now.
> 
> SetDllDirectory("") is used for other purposes, we should not set it to
> another path.  Also that won't save you if you try to access the i18n API
> from a worker while the main thread is here, let's say:
> <http://mxr.mozilla.org/mozilla-central/source/dom/plugins/base/
> nsPluginsDirWin.cpp#306>

SetDllDirectory("") just removes the current directory from the list of places where a dll is looked for. Setting it to something else won't revert the fact that the current directory is not scanned for dlls. Also, if we set it to the GRE directory, that would be before anything else, so there's no concern about workers.
(In reply to :Ehsan Akhgari (needinfo? me!) (slow responsiveness, emailacopolypse) from comment #21)
> So we have this code path
> <http://mxr.mozilla.org/mozilla-central/source/webapprt/win/webapprt.
> cpp#333> which seems to try to copy webapprt.exe to the GRE dir and then
> relaunch the app from there.  Couldn't we use that for all app launches
> instead of only for those where the build id doesn't match?  (I don't claim
> to understand this code well...)

It's the other way around: the executable in the app directory replaces itself with a copy of the executable in the GRE directory and then relaunches itself from the app directory.  And each app needs its own instance of the executable in order for Windows to treat it as a distinct app (otherwise we could have simply reused the Firefox executable to run the app).
Flags: needinfo?(benjamin)
What if the app were to prepend the appropriate directory to the PATH in its own process?

I just wrote a test program consisting of a main executable and a boilerplate DLL in a separate, off-PATH directory. The main executable calls LoadLibrary twice: first without modifying anything (it fails, of course), second after prepending the directory to the DLL to its own PATH. The dynamic loader picked up the modified path and the second LoadLibrary succeeded.

As long as the path modification is done prior to triggering the delayload, this should work just fine.
Comment on attachment 8374042 [details] [diff] [review]
Patch (v1)

Review of attachment 8374042 [details] [diff] [review]:
-----------------------------------------------------------------

I'm recusing myself from this review since glandium and bsmedberg know far more about this than me.
Attachment #8374042 - Flags: review?(gps)
(In reply to Aaron Klotz [:aklotz] from comment #24)
> As long as the path modification is done prior to triggering the delayload,
> this should work just fine.

That's essentially the same as using SetDllDirectory, except using SetDllDirectory doesn't spill to subprocesses.
Comment on attachment 8374042 [details] [diff] [review]
Patch (v1)

Review of attachment 8374042 [details] [diff] [review]:
-----------------------------------------------------------------

The patch in itself is fine, but please don't land without making sure a webapp that needs that dll properly delay loads it.
Attachment #8374042 - Flags: review?(mh+mozilla) → review+
(In reply to Mike Hommey [:glandium] from comment #22)
> (In reply to :Ehsan Akhgari (needinfo? me!) (slow responsiveness,
> emailacopolypse) from comment #20)
> > (In reply to comment #18)
> > > That might work. It wasn't available previously because it requires WinXPSP1,
> > > but I think we require that anyways now.
> > 
> > SetDllDirectory("") is used for other purposes, we should not set it to
> > another path.  Also that won't save you if you try to access the i18n API
> > from a worker while the main thread is here, let's say:
> > <http://mxr.mozilla.org/mozilla-central/source/dom/plugins/base/
> > nsPluginsDirWin.cpp#306>
> 
> SetDllDirectory("") just removes the current directory from the list of
> places where a dll is looked for. Setting it to something else won't revert
> the fact that the current directory is not scanned for dlls.

You're right, my mistake.

> Also, if we set
> it to the GRE directory, that would be before anything else, so there's no
> concern about workers.

There is.  That code calls SetDllDirectory(nullptr) to restore the default search order, loads the plugin (which may depend on the default search order), and then calls SetDllDirectory("") to remove the current working directory from the search order again.  If something races with this when trying to load the ICU DLLs (for example, calling an I18n API for the first time or something) then we're doomed.

But there are even more disturbing news.  I looked more closely, and it turns out that the data DLL is actually never linked to from xul.dll or mozjs.dll.  icuuc52.dll is the thing that links against the data DLL, which means that the delay loading patch I posted to try has 0 effect.  And the talos comparison is basically just misleading us.  So much for testing this stuff on try. :(

So we either somehow manage to delay load icuuc52.dll, or we lose.  I'm going to try that locally to see if that DLL is required during startup.
Once we determine that we can delay load icuuc52.dll and it will not be in our startup path, then we should probably use this API to write our own delay load handler: <http://msdn.microsoft.com/en-us/library/f0fkfy9y.aspx> which knows how to load DLLs from the GRE directory, etc.
(Specifically by handling dliNotePreLoadLibrary)
And I just confirmed that all three DLLs are on our startup path.  Not sure where to go from here.  Suggestions welcome.
(In reply to :Ehsan Akhgari (needinfo? me!) (slow responsiveness, emailapocalypse) from comment #31)
> And I just confirmed that all three DLLs are on our startup path.
Even for webapps?
(In reply to David Major [:dmajor] from comment #32)
> (In reply to :Ehsan Akhgari (needinfo? me!) (slow responsiveness,
> emailapocalypse) from comment #31)
> > And I just confirmed that all three DLLs are on our startup path.
> Even for webapps?

Just tested that to make sure, and yes indeed.  I suspect it is the JS API call to initialize ICU which is loading all of these DLLs.

I don't see any way to fix this without removing that call.
Assigning to Vladan to find an owner, I don't think there is any further action I can take here.
Assignee: ehsan → vdjeric
Actually I have a solution.  The stupid thing that we're doing here which is causing the regression is read-aheading the ICU data DLL.  Removing that line from the dependentlibs.list file fixes the problem, and it means that we don't need to worry about the delay loading business, and webapprt will also works fine.  I'll write a patch for that.
Assignee: vdjeric → ehsan
Why would webapprt work fine if it can't find the dlls because they're not in its path?
(In reply to Mike Hommey [:glandium] from comment #36)
> Why would webapprt work fine if it can't find the dlls because they're not
> in its path?

The DLLs in dependentlibs.lib are loaded with the LOAD_WITH_ALTERED_SEARCH_PATH flag: <http://mxr.mozilla.org/mozilla-central/source/xpcom/glue/standalone/nsXPCOMGlue.cpp#51>  When icudt52.dll is not delay loaded, the OS loader tries to load it as part of the LoadLibraryEx call to load xul.dll (which links against mozjs.dll, which links against icuuc52.dll, which links against icudt52.dll).  LOAD_WITH_ALTERED_SEARCH_PATH will cause the OS loader to look for imported DLLs in the directory of the first argument to LoadLibraryEx, which is where xul.dll is located, which means the OS loader will successfully find icudt52.dll.  And this happens at startup.

The problem we hit in bug 958108 was that with delay loading, the code in delayimp.lib simply uses a LoadLibrary call with a relative path (basically "icudt52.dll") and the OS is not able to find that file anywhere in the search path, so the LoadLibrary() call fails.

The DLL search order is documented here: <http://msdn.microsoft.com/en-us/library/windows/desktop/ms682586%28v=vs.85%29.aspx>
Comment on attachment 8375920 [details] [diff] [review]
Avoid read-aheading icudt52.dll for better startup performance; r=glandium

Tested this patch, it fixes all of the problems we wanted to fix here and doesn't regress the webapprt case.
Attachment #8375920 - Flags: review?(mh+mozilla)
Comment on attachment 8375920 [details] [diff] [review]
Avoid read-aheading icudt52.dll for better startup performance; r=glandium

Asking review from Benjamin too, in the interest of fixing this regression sooner.
Attachment #8375920 - Flags: review?(benjamin)
Whiteboard: [Australis:P-]
Comment on attachment 8375920 [details] [diff] [review]
Avoid read-aheading icudt52.dll for better startup performance; r=glandium

Review of attachment 8375920 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/library/dependentlibs.py
@@ +106,5 @@
> +		# Black list the ICU data DLL because preloading it at startup
> +		# leads to startup performance problems because of its excessive
> +		# size (around 10MB).
> +		if not dep.startswith("icudt"):
> +                    deps.append(dep)

I think it would be better to add syntax to still load the lib but not preload its content. That ensures we won't have any surprise in the future created by the fact that delay loading can't find the lib.
Attachment #8375920 - Flags: review?(mh+mozilla)
Attachment #8375920 - Flags: review?(benjamin)
Attachment #8375920 - Flags: review-
Comment on attachment 8375920 [details] [diff] [review]
Avoid read-aheading icudt52.dll for better startup performance; r=glandium

(In reply to Mike Hommey [:glandium] from comment #41)
> Comment on attachment 8375920 [details] [diff] [review]
> Avoid read-aheading icudt52.dll for better startup performance; r=glandium
> 
> Review of attachment 8375920 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: toolkit/library/dependentlibs.py
> @@ +106,5 @@
> > +		# Black list the ICU data DLL because preloading it at startup
> > +		# leads to startup performance problems because of its excessive
> > +		# size (around 10MB).
> > +		if not dep.startswith("icudt"):
> > +                    deps.append(dep)
> 
> I think it would be better to add syntax to still load the lib but not
> preload its content. That ensures we won't have any surprise in the future
> created by the fact that delay loading can't find the lib.

We do not delay load this DLL any more, as it's been used during startup and delay loading it doesn't make sense.  I explained in comment 37 why the current loading path is correct.
Attachment #8375920 - Flags: review?(mh+mozilla)
Attachment #8375920 - Flags: review?(benjamin)
Attachment #8375920 - Flags: review-
Comment on attachment 8375920 [details] [diff] [review]
Avoid read-aheading icudt52.dll for better startup performance; r=glandium

Review of attachment 8375920 [details] [diff] [review]:
-----------------------------------------------------------------

Meh
Attachment #8375920 - Flags: review?(mh+mozilla)
Attachment #8375920 - Flags: review?(benjamin)
Attachment #8375920 - Flags: review+
Comment on attachment 8375920 [details] [diff] [review]
Avoid read-aheading icudt52.dll for better startup performance; r=glandium

Review of attachment 8375920 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/library/dependentlibs.py
@@ +106,5 @@
> +		# Black list the ICU data DLL because preloading it at startup
> +		# leads to startup performance problems because of its excessive
> +		# size (around 10MB).
> +		if not dep.startswith("icudt"):
> +                    deps.append(dep)

Why the tabs?
https://hg.mozilla.org/mozilla-central/rev/0050e64d10c8
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
(In reply to :Ms2ger from comment #45)
> Comment on attachment 8375920 [details] [diff] [review]
> Avoid read-aheading icudt52.dll for better startup performance; r=glandium
> 
> Review of attachment 8375920 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: toolkit/library/dependentlibs.py
> @@ +106,5 @@
> > +		# Black list the ICU data DLL because preloading it at startup
> > +		# leads to startup performance problems because of its excessive
> > +		# size (around 10MB).
> > +		if not dep.startswith("icudt"):
> > +                    deps.append(dep)
> 
> Why the tabs?

Because Windows.  https://hg.mozilla.org/integration/mozilla-inbound/rev/88695c24eaf4
Tim, can you please verify that the regression is fixed?  I have no luck trying to use graphs.mozilla.org!
Flags: needinfo?(ttaubert)
Yup, I'm on it. Waiting for try runs to complete.
Look what I found!

Improvement: Mozilla-Inbound-Non-PGO - Tp5 No Network Row Major MozAfterPaint (Main Startup File IO Bytes) - WINNT 6.1 (ix) - 22.7% decrease
--------------------------------------------------------------------------------------------------------------------------------------------
    Previous: avg 47099883.333 stddev 2343.204 of 12 runs up to revision 4bc79e20fae0
    New     : avg 36398483.333 stddev 64549.147 of 12 runs since revision 0050e64d10c8
    Change  : -10701400.000 (22.7% / z=4566.994)
    Graph   : http://mzl.la/1nK2v2C

Changeset range: http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=4bc79e20fae0&tochange=0050e64d10c8

Changesets:
  * http://hg.mozilla.org/integration/mozilla-inbound/rev/0050e64d10c8
    : Ehsan Akhgari <ehsan@mozilla.com> - Bug 970123 - Avoid read-aheading icudt52.dll for better startup performance; r=glandium
    : http://bugzilla.mozilla.org/show_bug.cgi?id=970123

Bugs:
  * http://bugzilla.mozilla.org/show_bug.cgi?id=970123 - PGO ts_paint regression on Win XP (1.5%) and Win 8 (3%) by removing the delayload option from the ICU libs
Comparing the backout in comment #0 and the same parent changeset with the new patch on top I get the following:

http://i.imgur.com/wRoPVoj.png

Windows XP and Windows 7 show slight improvements for ts_paint while Windows 8 still has a 2.5% regression. Every revision had 20-25 talos runs.

(Sorry for the screenshot but try has been reset this weekend.)
Flags: needinfo?(ttaubert)
(In reply to Tim Taubert [:ttaubert] from comment #53)
> Comparing the backout in comment #0 and the same parent changeset with the
> new patch on top I get the following:
> 
> http://i.imgur.com/wRoPVoj.png
> 
> Windows XP and Windows 7 show slight improvements for ts_paint while Windows
> 8 still has a 2.5% regression. Every revision had 20-25 talos runs.
> 
> (Sorry for the screenshot but try has been reset this weekend.)

Any chance you could try this patch out please? <https://gist.github.com/ehsan/9190407>
Benjamin, as I understand correctly, ICU is disabled for desktop in Firefox 29, right?  If that is true, do we really need to backport this to Aurora?
Flags: needinfo?(benjamin)
I don't know about 29: ICU was disabled on *beta* for 28, but I thought we were trying to ship it in 29 or 30, and it's probably on for FF29 on Aurora.
Flags: needinfo?(benjamin)
Blocks: 915735
Comment on attachment 8375920 [details] [diff] [review]
Avoid read-aheading icudt52.dll for better startup performance; r=glandium

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 915735
User impact if declined: startup regression
Testing completed (on m-c, etc.): locally/m-c
Risk to taking this patch (and alternatives if risky): zero
String or IDL/UUID changes made by this patch: none
Attachment #8375920 - Flags: approval-mozilla-aurora?
Attachment #8374042 - Attachment is obsolete: true
Attachment #8375920 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(In reply to :Ehsan Akhgari (needinfo? me!) (slow responsiveness, emailapocalypse) from comment #54)
> (In reply to Tim Taubert [:ttaubert] from comment #53)
> > Windows XP and Windows 7 show slight improvements for ts_paint while Windows
> > 8 still has a 2.5% regression. Every revision had 20-25 talos runs.
> > 
> > (Sorry for the screenshot but try has been reset this weekend.)
> 
> Any chance you could try this patch out please?
> <https://gist.github.com/ehsan/9190407>

This looks better, I think:

http://perf.snarkfest.net/compare-talos/index.html?oldRevs=4ee7a06b550a&newRev=73188edcc28d&submit=true

The 0.26% WinXP regression is probably negligible and may be a talos glitch. I compared the cset right before bug 958108 and the cset from bug 958108 with all of the patches in this bug and your gist on top.
Depends on: 978509
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: