Closed Bug 385224 Opened 17 years ago Closed 17 years ago

Performance: Site-specific prefs regressed Txul and Ts

Categories

(Toolkit :: Preferences, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla1.9beta2

People

(Reporter: sayrer, Assigned: myk)

References

Details

(Keywords: perf, regression, Whiteboard: [has patch])

Attachments

(7 files, 2 obsolete files)

Looks like this regressed perf once all the manifests were corrected.

Ts/Txul on linux regressed, Ts on mac regressed, and Ts/Txul on windows regressed.
Flags: blocking1.9?
Status: NEW → ASSIGNED
Keywords: regression
OS: Linux → All
Hardware: PC → All
Target Milestone: --- → mozilla1.9alpha6
Version: unspecified → Trunk
The content prefs code adds three JS files to the build.

The first, nsContentPrefService.js, implements two XPCOM services (nsIContentPrefService and nsIContentURIGrouper) that have to be registered on startup (but presumably only on the first startup, so they shouldn't affect subsequent startup times).

The other two, browser-contentPrefSink.js and browser-textZoom.js, implement JS singletons (ContentPrefSink and TextZoom) that exist for each browser window context.  The files get included in browser.js via preprocessor directives and thus get parsed on startup:

#include browser-contentPrefSink.js
#include browser-textZoom.js

When the browser starts up, delayedStartup calls ContentPrefSink.init, which adds web progress and DOMContentLoaded listeners to the tab browser.  Then it calls TextZoom.init, which does several things that entrain other things:

1. adds a DOMMouseScroll event listener to the window;

2. registers itself with the content pref service as an observer;
  * this instantiates the service, initializing the database connection;

3. registers itself with the content pref sink as an observer;
  * this queries the service for the global value of the preference,
    which includes creating a database statement and executing it;

4. applies the global value of the preference to the zoom manager;
  * this ultimately sets nsIMarkupDocumentViewer::textZoom.

Then, when the initial page starts to load, ContentPrefSink receives an onLocationChange progress notification for it, which it passes to TextZoom, which retrieves the value of the text zoom pref (if any) from the service and applies it to the zoom manager.

All of this happens before the first page loads, so the time it takes to do these things will affect Ts.  Some of these things also happen each time a new browser window opens and thus will affect Txul.

Questions:

* What steps in this process might not be strictly necessary on startup?
* What could we delay until after initial startup?
* What could be made more efficient?
Target Milestone: mozilla1.9alpha6 → mozilla1.9beta1
we'll just take a look at this for a few runs
Attachment #270524 - Flags: review? → review?(gavin.sharp)
Attachment #270524 - Flags: review?(gavin.sharp) → review+
Looks like this definitely impacts windows Ts and Txul. The resolution of Windows timers makes it hard to estimate exactly, but it seems bigger than 5%.
As usual I may not have followed this correctly! Is this comment right ( http://lxr.mozilla.org/seamonkey/source/browser/base/content/browser-textZoom.js#253 )?

253    * Per DocumentViewerImpl::SetTextZoom in nsDocumentViewer.cpp, it looks
254    * like we can set the zoom to its current value without significant impact
255    * on performance, as the setting is only applied if it differs from the
256    * current setting.

The comment in DocumentViewerImpl::SetTextZoom and the code seems to imply that it is not ( http://lxr.mozilla.org/seamonkey/source/layout/base/nsDocumentViewer.cpp#2544 )

2544   // Set the text zoom on all children of mContainer (even if our zoom didn't
2545   // change, our children's zoom may be different, though it would be unusual).
2546   // Do this first, in case kids are auto-sizing and post reflow commands on
2547   // our presshell (which should be subsumed into our own style change reflow).
2548   struct TextZoomInfo textZoomInfo = { aTextZoom };
2549   CallChildren(SetChildTextZoom, &textZoomInfo);
2550 
2551   // Now change our own zoom
2552   if (mPresContext && aTextZoom != mPresContext->TextZoom()) {
2553       mPresContext->SetTextZoom(aTextZoom);

In which case, in _applyPrefToSetting, we would need a check against current value before setting this._zoomManager.textZoom
(In reply to comment #6)
> As usual I may not have followed this correctly! Is this comment right (
> http://lxr.mozilla.org/seamonkey/source/browser/base/content/browser-textZoom.js#253
> )?

Yes, the comment is correct.


> 253    * Per DocumentViewerImpl::SetTextZoom in nsDocumentViewer.cpp, it looks
> 254    * like we can set the zoom to its current value without significant
> impact
> 255    * on performance, as the setting is only applied if it differs from the
> 256    * current setting.
> 
> The comment in DocumentViewerImpl::SetTextZoom and the code seems to imply 
> that it is not (
> http://lxr.mozilla.org/seamonkey/source/layout/base/nsDocumentViewer.cpp#2544

The comment in DocumentViewerImpl::SetTextZoom points out that SetTextZoom will recursively iterate over child documents, but, per line 2552 of that file <http://lxr.mozilla.org/seamonkey/source/layout/base/nsDocumentViewer.cpp#2552> it will still only set the zoom for the document and its children (and thus prompt a reflow and a significant performance hit) if the new value differs from the old.


> In which case, in _applyPrefToSetting, we would need a check against current
> value before setting this._zoomManager.textZoom

We could do this, and it would probably improve performance , but the improvement would come at the cost of some correctness, if DocumentViewerImpl is correct in suggesting that child documents sometimes have a different zoom value than their parents, and I'm correct in thinking that users want child documents to reflect their text zoom prefs.

Nevertheless, it may be worth it, especially if out-of-sync child documents are rare.  I'll put together a patch that does this.

Note: we might get even better performance if we cache the current value of the setting between location changes so we don't have to cross the XPCOM boundary when checking the site's zoom preference against the current value.  But then we'd also have to listen for tab change events to differentiate them from other kinds of regular location changes, since a tab change invalidates the cache.

Note: I'm working on setting up an environment that will let me run the Ts and Txul performance tests on local builds so I can measure the impact of various changes.  Unfortunately, doing so is a bit of a challenge, particularly finding a way to get consistent enough results.
Does doing the database query involve hitting the disk for anything large?

It might be worth profiling startup and seeing what code from this stuff shows up instead of trying to get local Tx/Txul to give consistent numbers, especially since they're pretty noisy even on tinderbox.
Keywords: perf
(In reply to comment #8)
> Does doing the database query involve hitting the disk for anything large?

Not really, as text zoom prefs are ints, as are record IDs (the other thing we sometimes return from queries), and query conditions contain nothing larger than hostnames.  Also, we're noticing this slowdown on startup with an empty database from which no data gets returned by the queries.


> It might be worth profiling startup and seeing what code from this stuff 
> shows up instead of trying to get local Tx/Txul to give consistent numbers,
> especially since they're pretty noisy even on tinderbox.

Good idea.  Here's a profile from a VM on my laptop (not sure how noisy that makes the numbers) where I started Firefox up and then shut it down manually with the JS profiler on.  I post-processed it with the following command:

grep -C 2 ContentPrefService ~/jsprofile.log > ~/jsprofile-contentprefs.log \
&& grep HostnameGrouper ~/jsprofile.log >> ~/jsprofile-contentprefs.log \
&& grep ContentPrefSink ~/jsprofile.log >> ~/jsprofile-contentprefs.log \
&& grep TextZoom ~/jsprofile.log >> ~/jsprofile-contentprefs.log \
&& grep -A 14 viewZoomOverlay ~/jsprofile.log >> ~/jsprofile-contentprefs.log

TextZoom_init shows up as taking the most amount of time, none of which is spent inside that function, which is expected, as it entrains all the work of instantiating the service and initializing the database.  Other notable expenditures (where notable is defined as averaging 3ms or more) are:

[1,1] ContentPrefService__dbInit() {669-720} 2883 {min 3, max 3, avg 3, sum 3, self 4287496095}
[1,1] ContentPrefService() {43-44} 328 {min 3, max 3, avg 3, sum 3, self 2341572797}
[1,1] ContentPrefService__init() {82-90} 546 {min 3, max 3, avg 3, sum 3, self 2425985678}
[1,1] <top level>() {0-899} 14833 {min 3, max 3, avg 3, sum 3, self 2341572799}
[1,1] ContentPrefSink_onLocationChange() {5553-5560} 834 {min 5, max 5, avg 5, sum 5, self 4294967182}
[1,1] ContentPrefSink__handleLocationChanged() {5577-5602} 1646 {min 5, max 5, avg 5, sum 5, self 2324930190}
[1,1] TextZoom_init() {5794-5806} 988 {min 18, max 18, avg 18, sum 18, self 0}

(Not sure what <top level> is, perhaps the time it takes to import XPCOMUtils?)

Unfortunately the "self" numbers are horked for most of these, making the data more opaque, but onLocationChange just calls handleLocationChange, so both those 5s represent the same expenditure, and the service's constructor just calls its init method, which mostly just calls dbInit, so those numbers are likely to be a single expenditure, leaving us with:

[1,1] <top level>() {0-899} 14833 {min 3, max 3, avg 3, sum 3, self 2341572799}
[1,1] ContentPrefService__dbInit() {669-720} 2883 {min 3, max 3, avg 3, sum 3, self 4287496095}
[1,1] ContentPrefSink__handleLocationChanged() {5577-5602} 1646 {min 5, max 5, avg 5, sum 5, self 2324930190}

But TextZoom_init doesn't trigger handleLocationChanged, which means only 6ms of the 18ms spent in TextZoom_init is accounted for by service and database instantiation/initialization.  Guess I should also look at the 2s and 1s.

This 2 in particular jumps out at me:

[1,2] ContentPrefService_getPrefs() {210-216} 690 {min 0, max 4, avg 2, sum 4, self 2425985679}

getPrefs gets called once on startup when the location changes for the first tab and once again on DOMContentLoaded for that tab.  Probably the first time it cost 4ms to run the query and return the results, and the second time the results were pulled from cache.  Perhaps this query can be optimized so it runs more quickly the first time as well.

The other 2s are TextZoom_destroy and ContentPrefSink_addObserver, but the former is a shutdown cost.  And then there are a bunch of 1s it might be worth looking at.  Unfortunately there's no smoking gun, so it's going to be a slog.

Note: I looked at a few of the "anonymous" functions in nsContentPrefService.js, and they're mostly getters, although one is an anonymous function (_dbCreateStatement).

None are problematic, however.  Also, my grepping is undoubtedly missing some anonymous getters in ContentPrefSink and TextZoom, but those should be similarly unproblematic.
Today I spent some time trying to get consistent results for Ts on my local machine, but it was again futile.  I tried enabling the VMI extensions in VMWare, which enabled me to disable time synchronization in VMWare tools.  I also tried putting the profile directory on a RAM disk.  And I tried using a lighter weight window manager (xcfe).  But the numbers always varied too widely to be useful.

I wonder if putting the bin directory on a RAM disk would help.  Or perhaps running the Perl-based tinderbox Ts tests, which use Time::HiRes, instead of the Python-based Talos Ts tests, which are perhaps not making high-resolution measurements.

In any case, I then ran some more profiles, this time with an optimized build.  The profiles show significant variance as well, so I can't provide exact numbers, but here are some sample profiles.

Note: To get these results, I added names to all relevant anonymous functions, including getters, then I post-processed the output to include only functions that had a cost.  I also allowed script to close windows (dom.allow_scripts_to_close_windows), set new pages to open in new windows (browser.link.open_external = 2 and browser.link.open_newwindow = 2), and permitted file: pages to open popup windows.


Command that generates a Ts profile:

env MOZILLA_JS_PROFILER_OUTPUT=~/jsprofile.log dist/bin/firefox -profile /tmp/jsprofile http://lxr.mozilla.org/mozilla/source/testing/performance/talos/startup_test/startup_test.html && grep -E '(ContentPrefService|HostnameGrouper|ContentPrefSink|TextZoom|ZoomManager).*sum [1-9]' ~/jsprofile.log > ~/jsprofile-ts.log
 
Sample Ts profile:

    [1,1] ZoomManager_get_markupDocumentViewer() {149-150} 371 {min 1, max 1, avg 1, sum 1, self 0}
    [1,1] ZoomManager_set_textZoom() {46-50} 716 {min 1, max 1, avg 1, sum 1, self 4288610195}
    [1,1] ContentPrefService() {43-44} 304 {min 1, max 1, avg 1, sum 1, self 2513541789}
    [1,1] ContentPrefService__init() {82-90} 498 {min 1, max 1, avg 1, sum 1, self 4287692686}
    [1,1] TextZoom__applyPrefToSetting() {6142-6153} 911 {min 1, max 1, avg 1, sum 1, self 4288348060}
    [1,1] TextZoom_destroy() {6006-6016} 958 {min 1, max 1, avg 1, sum 1, self 4289789853}
    [1,2] TextZoom__cps() {5956-5960} 546 {min 0, max 4, avg 2, sum 4, self 4287692714}
    [1,1] TextZoom_init() {5991-6003} 900 {min 5, max 5, avg 5, sum 5, self 0}

This profile shows two Ts costs.  The first, which accounts for 4ms of the 5ms spent in TextZoom_init, is to instantiate the content pref service (TextZoom_cps, ContentPrefService, ContentPrefService__init), of which 1ms is spent on service initialization and 3ms is not accounted for (perhaps XPCOM overhead?).

The second, which accounts for the last 1ms spent in TextZoom_init, is to apply the global pref to the setting (TextZoom__applyPrefToSetting, ZoomManager_get_markupDocumentViewer, ZoomManager_set_textZoom).

In other runs these costs were sometimes greater, sometimes lesser (TextZoom_init usually landed between 3ms and 7ms), and sometimes I saw 1ms each in ContentPrefSink_handleEvent and ContentPrefSink_onLocationChange).

To reduce these costs, we can stop applying the global pref to the setting on startup, since that isn't necessary (it'll just get reapplied the moment we start to load a URL).  And we can try only applying the preference if it differs from the current value of the setting, at an unknown but probably slight cost in correctness, per comment 7.


Command that generates a Txul profile:

env MOZILLA_JS_PROFILER_OUTPUT=~/jsprofile.log dist/bin/firefox -profile /tmp/jsprofile file:///home/myk/Mozilla/source/mozilla/xpfe/test/winopen.xul && grep -E '(ContentPrefService|HostnameGrouper|TextZoom|ZoomManager).*sum [1-9]' ~/jsprofile.log > ~/jsprofile-txul.log

Sample Txul profile:

    [1,21] ZoomManager_get_markupDocumentViewer() {149-150} 371 {min 0, max 1, avg 0, sum 2, self 4288610206}
    [1,21] ZoomManager_set_textZoom() {46-50} 716 {min 0, max 1, avg 0, sum 3, self 4288348052}
    [1,11] ZoomManager_getInstance() {16-20} 410 {min 0, max 1, avg 0, sum 1, self 1}
    [1,1] ContentPrefService_observe() {111-117} 523 {min 1, max 1, avg 1, sum 1, self 4294967196}
    [1,33] ContentPrefService__stmtSelectGlobalPref() {315-325} 775 {min 0, max 1, avg 0, sum 1, self 2240852689}
    [1,1] ContentPrefService__dbInit() {669-720} 2627 {min 1, max 1, avg 1, sum 1, self 2660471704}
    [1,11] ContentPrefService__selectGlobalPref() {328-341} 818 {min 0, max 1, avg 0, sum 1, self 4294963020}
    [1,1] ContentPrefService() {43-44} 304 {min 1, max 1, avg 1, sum 1, self 2513541789}
    [1,21] ContentPrefService_grouper() {271-275} 648 {min 0, max 1, avg 0, sum 1, self 4287430543}
    [1,21] ContentPrefService__selectPrefs() {579-594} 1220 {min 0, max 1, avg 0, sum 4, self 2633735427}
    [1,2] ContentPrefService__dbCreateStatement() {647-661} 1535 {min 0, max 1, avg 0, sum 1, self 2526793869}
    [1,1] ContentPrefService__init() {82-90} 498 {min 1, max 1, avg 1, sum 1, self 4287692685}
    [1,1] ContentPrefService__destroy() {93-103} 816 {min 1, max 1, avg 1, sum 1, self 4290183067}
    [1,21] HostnameGrouper_group() {855-890} 1266 {min 0, max 1, avg 0, sum 1, self 2063597439}
    [1,11] ContentPrefService_getPref() {125-131} 643 {min 0, max 1, avg 0, sum 1, self 2526793868}
    [1,21] ContentPrefService_getPrefs() {210-216} 642 {min 0, max 1, avg 0, sum 7, self 2513541791}
    [1,32] ContentPrefSink__cps() {5698-5702} 560 {min 0, max 1, avg 0, sum 1, self 1}
    [1,165] TextZoom_some() {5982-5982} 396 {min 0, max 1, avg 0, sum 1, self 4287807463}
    [1,21] TextZoom__applyPrefToSetting() {6142-6153} 911 {min 0, max 1, avg 0, sum 4, self 0}
    [1,11] ContentPrefSink__handleDOMContentLoaded() {5804-5835} 1789 {min 0, max 1, avg 0, sum 7, self 2}
    [1,21] TextZoom__zoomManager() {5948-5951} 439 {min 0, max 1, avg 0, sum 1, self 0}
    [1,11] ContentPrefSink_addObserver() {5867-5880} 780 {min 0, max 1, avg 0, sum 2, self 1}
    [1,10] ContentPrefSink_onLocationChange() {5750-5757} 794 {min 0, max 2, avg 0, sum 7, self 0}
    [1,11] TextZoom_destroy() {6006-6016} 958 {min 0, max 1, avg 0, sum 3, self 3305110971}
    [1,11] ContentPrefSink_destroy() {5729-5740} 847 {min 0, max 1, avg 0, sum 4, self 4}
    [1,44] ContentPrefSink_some() {5714-5714} 408 {min 0, max 1, avg 0, sum 2, self 2}
    [1,11] ContentPrefSink_init() {5723-5725} 629 {min 0, max 1, avg 0, sum 1, self 1}
    [1,55] TextZoom_QueryInterface() {5981-5984} 704 {min 0, max 1, avg 0, sum 3, self 2143289255}
    [1,22] ContentPrefSink_QueryInterface() {5713-5716} 718 {min 0, max 1, avg 0, sum 2, self 0}
    [1,22] TextZoom__cps() {5956-5960} 546 {min 0, max 3, avg 0, sum 5, self 5}
    [1,11] TextZoom_init() {5991-6003} 900 {min 1, max 4, avg 1, sum 14, self 3556755110}
    [1,11] ContentPrefSink_handleEvent() {5769-5771} 424 {min 0, max 1, avg 0, sum 7, self 0}
    [1,10] ContentPrefSink__handleLocationChanged() {5774-5799} 1550 {min 0, max 2, avg 0, sum 7, self 6}
    [1,10] TextZoom_onLocationChanged() {6098-6099} 413 {min 0, max 1, avg 0, sum 1, self 1}

This profile is more complicated, and there are additional costs besides the ones already identified by the Ts test:

In particular, ContentPrefSink_handleEvent and ContentPrefSink_onLocationChange rack up 7ms each, while ContentPrefSink_destroy and TextZoom_destroy spend 4ms and 3ms, respectively.  ContentPrefSink_QueryInterface and TextZoom_QueryInterface also make appearances at 2ms and 3ms, respectively.

To reduce these costs, we could destroy less in the destroy functions; currently we destroy properties wantonly, and now that we've resolved that vexing XPCOM leak (bug 180380), we don't have to do that anymore.  And we could simplify the logic in the frequently-called QueryInterface functions, which use expensive Array::some calls.

If those optimizations don't recover sufficient performance, three more drastic measures would be to make the ContentPrefSink stop handling DOMContentLoaded (which isn't used by text zoom, although it's useful for extensions), remove the content pref sink entirely (making the text zoom controller directly responsible for handling location changes), and move the controller code into native code, putting it somewhere in the load process between identification of the URL to load and initial reflow of the content loaded from that URL.

Here's a patch that implements the less drastic optimizations listed above.  In addition to those changes, I've removed the logging code from the content pref service, which showed up as a small cost occasionally.


With the patch, Ts and Txul profiles show some improvement...

Sample Ts profile:

    [1,1] ContentPrefService__dbInit() {669-713} 2058 {min 1, max 1, avg 1, sum 1, self 2526793869}
    [1,1] ContentPrefService() {43-44} 304 {min 1, max 1, avg 1, sum 1, self 2513541789}
    [1,1] ContentPrefService__init() {82-90} 498 {min 1, max 1, avg 1, sum 1, self 2526793868}
    [1,1] TextZoom_init() {5985-5993} 580 {min 4, max 4, avg 4, sum 4, self 2425200081}
    [1,2] TextZoom_get__cps() {5952-5956} 554 {min 0, max 3, avg 1, sum 3, self 4287692713}
    [1,1] ContentPrefSink_addObserver() {5863-5876} 780 {min 1, max 1, avg 1, sum 1, self 4287496092}

Sample Txul profile:

    [1,10] ZoomManager_get_textZoom() {30-43} 968 {min 0, max 1, avg 0, sum 3, self 4288610194}
    [1,10] ZoomManager_get_markupDocumentViewer() {149-150} 371 {min 0, max 1, avg 0, sum 3, self 4287692678}
    [1,21] ContentPrefService__selectPrefs() {579-594} 1220 {min 0, max 1, avg 0, sum 2, self 4294967294}
    [1,21] ContentPrefService_getPrefs() {210-216} 642 {min 0, max 1, avg 0, sum 2, self 2341572797}
    [1,32] ContentPrefSink_get__cps() {5698-5702} 568 {min 0, max 1, avg 0, sum 1, self 4288348078}
    [1,11] ContentPrefSink_destroy() {5727-5737} 736 {min 0, max 1, avg 0, sum 2, self 4286971794}
    [1,11] TextZoom_init() {5985-5993} 580 {min 0, max 4, avg 1, sum 11, self 2425200086}
    [1,22] ContentPrefSink_QueryInterface() {5709-5714} 684 {min 0, max 1, avg 0, sum 1, self 1}
    [1,11] ContentPrefSink_handleEvent() {5765-5767} 424 {min 0, max 1, avg 0, sum 6, self 4288741269}
    [1,10] ContentPrefSink__handleLocationChanged() {5770-5795} 1550 {min 0, max 2, avg 0, sum 8, self 2}
    [1,11] ContentPrefSink_init() {5721-5723} 629 {min 0, max 1, avg 0, sum 1, self 4290510776}
    [1,10] ContentPrefSink_onLocationChange() {5746-5753} 794 {min 0, max 2, avg 0, sum 9, self 4290576275}
    [1,55] TextZoom_QueryInterface() {5972-5978} 765 {min 0, max 1, avg 0, sum 1, self 4287627218}
    [1,22] TextZoom_get__cps() {5952-5956} 554 {min 0, max 4, avg 0, sum 5, self 4287692715}
    [1,11] ContentPrefSink__handleDOMContentLoaded() {5800-5831} 1789 {min 0, max 1, avg 0, sum 6, self 4287496094}
    [1,11] ContentPrefSink_addObserver() {5863-5876} 780 {min 0, max 1, avg 0, sum 1, self 4287496091}
    [1,10] TextZoom__applyPrefToSetting() {6130-6155} 1028 {min 0, max 1, avg 0, sum 3, self 4294967285}
    [1,11] TextZoom_destroy() {5996-6005} 761 {min 0, max 1, avg 0, sum 2, self 2}
    [1,10] TextZoom_onLocationChanged() {6086-6087} 413 {min 0, max 1, avg 0, sum 3, self 4287365004}
Attachment #277278 - Flags: review?(sayrer)
(In reply to comment #10)

> This profile shows two Ts costs.  The first, which accounts for 4ms of the 5ms
> spent in TextZoom_init, is to instantiate the content pref service
> (TextZoom_cps, ContentPrefService, ContentPrefService__init), of which 1ms is
> spent on service initialization and 3ms is not accounted for (perhaps XPCOM
> overhead?).

XPCOM overhead seems unlikely, but unless your Linux has some sub-ms-period high res clock option, we may be trying to reason with too few bits. Do you have a machine-level profile as well as a JS profile?

/be
(In reply to comment #11)
> XPCOM overhead seems unlikely, but unless your Linux has some sub-ms-period
> high res clock option, we may be trying to reason with too few bits. Do you
> have a machine-level profile as well as a JS profile?

No, I haven't done a machine-level profile yet.  Of the tools listed on http://wiki.mozilla.org/Performance:Tools#Profiling , which is recommended for that?
Comment on attachment 277278 [details] [diff] [review]
patch v1: improves efficiency in several ways

Canceling review. Brendan is right--need better profiling here. DTrace is what's needed, imho.

I'll attach a DTrace JS profile showing results of hitting reload on 

http://dev.jquery.com/~john/speed/114.html

Looks like the ContentPrefsService and SessionStore are the principle offenders in chrome JS for Tp. We can do a similar check on Ts/Txul.
Attachment #277278 - Flags: review?(sayrer)
er, that's principal offender!
Robert's previous output showed the culprit were functions called "anonymous" - a result which isn't as helpful as it could be (my fault for not thinking of anonymous functions earlier). After getting the same result myself on some applications, I added the following probe to version 2 of the provider (which is the current version on Bugzilla),

javascript*:::function-info (filename, classname, funcname, lineno, runfilename, runlineno)

which is an optional substitute for function-entry (although might be better suited at function-return, when such arguments are saved for printing out).

The function-info probe identifies the filename and line number where the anonymous function was declared, as well as the filename and line number for where it was executed. I just quickly hacked it into js_functime and posted the result - does that help identify what those anonymous functions are?
(In reply to comment #16)
> Robert's previous output showed the culprit were functions called "anonymous"

This method of metering JS is useful, but not as a way of identifying culprits for this particular bug. For this bug, we expect the jQuery stuff to be expensive--it's benchmark code. The fact that SSS and ContentPrefs code is showing up on that scale is the problem. I happened to notice it while looking at jQuery.
Ahh ok, I do see functions such as sss_saveState (nsSessionStore.js) taking over 50ms on your run, and over 80 ms on mine (and sometimes higher).

Why sss_saveState() is slow may already be obvious to many on this list; nevertheless, I did a bit more digging to see how DTrace can explore this.

I started by tracing the JavaScript function sss_saveState(), and all the system calls that it causes, along with delta times. There were many read()s and write()s, and a several filesystem syscalls. The time seemed to be mostly spent in syscalls such as xstat(), open64(), close() and rename(), rather than in JavaScript.

Digging further, I traced sss_saveState() along with those four syscall types, and used DTrace to print out their arguments. I've attached the full output, which shows these syscalls in relation to the JavaScript code, and delta times. Below is a summary from a different run, with just the syscalls,

# ./js_truss -p `pgrep -n firefox-bin`
C   DELTA FILE                 TYPE      -- FUNC
0       3 nsSessionStore.js    JS         -> sss_saveState
0   17374 "                    syscall      -> xstat (sessionstore.js)
0     711 "                    syscall      <- xstat (0 errno=0)
0      63 "                    syscall      -> open64 (sessionstore.js 0x701)
0     998 "                    syscall      <- open64 (-1 errno=17)
0      40 "                    syscall      -> open64 (sessionstore-1.js 0x701)
0   53059 "                    syscall      <- open64 (41 errno=0)
0      30 "                    syscall      -> close (41)
0     406 "                    syscall      <- close
0      20 "                    syscall      -> open64 (sessionstore-1.js 0x301)
0   22644 "                    syscall      <- open64 (41 errno=0)
0     431 "                    syscall      -> close (41)
0   70284 "                    syscall      <- close
0      54 "                    syscall      -> rename (sessionstore-1.js, sessionstore.js)
0    9505 "                    syscall      <- rename
0      70 nsSessionStore.js    JS         <- sss_saveState
^C

To start with, xstat() checks for "sessionstore.js" and finds it. Then open64() attempts to open it with the flags O_WRONLY|O_TRUNC|O_CREAT|O_EXCL (0x701) which errors with code 17 "File exists". Seems an odd to use O_EXCL next, since xstat() had already found this file to exist.

Then, "sessionstore-1.js" is opened once with O_EXCL and then again without. Why open it twice? Finally, "sessionstore-1.js" is renamed to "sessionstore.js".

Sounds like a few slow syscalls might be dropped with a little different logic.

The syscall times did vary from run to run, I'd guess because my home directory is on NFS and these syscalls may be causing network I/O - more DTrace can confirm that. The syscall delta times are also a little higher than they should be, as by this point I'm using DTrace to pull in syscall arguments and print them out.

Anyhow, that was just for sss_saveState. It might be interesting to attack other functions in a similar way.
Flags: blocking1.9? → blocking1.9+
Target Milestone: mozilla1.9 M8 → mozilla1.9 M10
Priority: -- → P1
Ping - we still working on this? We have a bunch to makeup r.e. Txul/Ts?
(In reply to comment #19)
> Ping - we still working on this? We have a bunch to makeup r.e. Txul/Ts?

Yes, I've now stood up dtrace on Leopard and am using it to find the regression.
I've done some profiling with DTrace on Mac OS 10.5, and here's an updated patch that seems to improve Tp, Ts, and Txul performance on my machine, although it's not clear by how much.

The patch includes the following changes:

1. gave names to all relevant functions to enable profiling (should these be stripped before checkin?);
2. removed ContentPrefSink, which was a premature optimization for both core code and extensions that isn't strictly necessary and currently costs more than it gains;
3. fixed a bug that caused FullZoom.globalValue to always be set (not really a performance issue, but was useful when testing other performance changes);
4. when setting the zoom to the default value (the common case, especially in tests with fresh profiles), made FullZoom._applyPrefToSetting set ZoomManager.fullZoom directly instead of indirectly through ZoomManager.reset;
5. cached the frequently-accessed docshell in the browser widget, which should work as long as the docshell doesn't change for the life of the widget, which seems to be the case, as far as I can tell from various docshell docs (this could improve performance for other code as well);
6. updated the prescontext to do as little as possible if we aren't changing the zoom (this doesn't seem to have had much effect, but it seems like the right thing to do anyway, and maybe it makes a difference in some cases not caught by the tests);
7. stopped setting the zoom on startup, since it'll just be set again when we start to load the first page;
8. made FullZoom lazily load the global value;
9. removed logging from the content pref service (only one of the logging statements was responsible for a performance regression on tests, but the others don't seem useful enough to warrant keeping the logging infrastructure);
10. removed overzealous destruction, which is unnecessary now that we understand and have fixed the persistent XPCOM destruction leaks.

Besides these issues, I noted that XPCOM and JS parsing overhead seem to be significant factors in the cost of the code, since FullZoom was significantly cheaper when it didn't instantiate the content pref service and well as when it instantiated a version of the service that had all of its code (except the XPCOM glue) stripped out of it.

The initialization of the content pref service, which consists of opening a connection to a mozstorage database and checking the schema version, was pretty cheap relative to the XPCOM and JS parsing costs incurred in the process of instantiating it.  I tried inlining the XPCOM glue in case the XPCOMUtils.jsm module was responsible for some overhead, but that didn't seem to make much of a difference.

There might be more things we could do to improve apparent performance on Ts and Txul tests, like having FullZoom register as a pref listener in a timeout, but I don't think it would significantly affect actual performance, given that the major cost in registering as a listener is instantiating the service, and that'll happen on the first location change anyway, which I expect happens before Ts/Txul report that the application/window has loaded.
Attachment #277278 - Attachment is obsolete: true
Attachment #288089 - Attachment is obsolete: true
Attachment #288188 - Flags: review?(sayrer)
Myk good stuff. Are there lesses for other components here or obvious areas of JS/XPCOM overhead that we could mitigate in those modules?
(In reply to comment #23)
> Myk good stuff. Are there lesses for other components here or obvious areas of
> JS/XPCOM overhead that we could mitigate in those modules?

There are probably other opportunities to cache references to frequently-accessed XPCOM services.  Also, if the cost of parsing a JS module is incurred only once, then factoring out common code (XPCOM glue, logging, etc.) from JS components into JS modules would help, although it's not clear how significant the win would be.
(In reply to comment #22)
> Created an attachment (id=288188) [details]
> patch v2: various perf fixes
> 
> ...
> 5. cached the frequently-accessed docshell in the browser widget, which should
> work as long as the docshell doesn't change for the life of the widget, which
> seems to be the case, as far as I can tell from various docshell docs (this
> could improve performance for other code as well);

>-                onget="return this.boxObject.QueryInterface(Components.interfaces.nsIContainerBoxObject).docShell;"
>+                onget="return this._docShell ? this._docShell : (this._docShell = this.boxObject.QueryInterface(Components.interfaces.nsIContainerBoxObject).docShell);"
 
I am not a javascript guru but I think be recommends:

onget="return this._docShell || (this._docShell = this.boxObject.QueryInterface(Components.interfaces.nsIContainerBoxObject).docShell);"
Blocks: 391817
Whiteboard: [has patch][need review sayrer]
Comment on attachment 288188 [details] [diff] [review]
patch v2: various perf fixes

r=me on the toolkit bits, John's nit makes sense

layout bits look incredibly sane, but I'll refrain from jumping in there...

roc, r+sr (no SR flag exists in the Toolkit component at present, yay)
Attachment #288188 - Flags: review?(sayrer)
Attachment #288188 - Flags: review?(roc)
Attachment #288188 - Flags: review+
Keywords: checkin-needed
Whiteboard: [has patch][need review sayrer] → [has patch]
If myk needs somebody else to commit this for him, he's free to add the checkin-needed keyword, but checkin-needed is only for people without commit access or for people with commit access who need somebody else to commit for them. I don't want to take cvsblame from somebody with commit access, as he/she may want to have blame for the patch. If I need to clarify the keyword description for this, I'll be glad to.
Keywords: checkin-needed
Indeed, I wanted to check this one in myself at a quiet time when I could isolate its effects on the performance tests from those of other checkins.

I've checked it in now:

Checking in browser/base/content/browser.js;
/cvsroot/mozilla/browser/base/content/browser.js,v  <--  browser.js
new revision: 1.902; previous revision: 1.901
done
Checking in browser/base/content/browser-textZoom.js;
/cvsroot/mozilla/browser/base/content/browser-textZoom.js,v  <--  browser-textZoom.js
new revision: 1.7; previous revision: 1.6
done
Removing browser/base/content/browser-contentPrefSink.js;
/cvsroot/mozilla/browser/base/content/browser-contentPrefSink.js,v  <--  browser-contentPrefSink.js
new revision: delete; previous revision: 1.4
done
Checking in toolkit/content/viewZoomOverlay.js;
/cvsroot/mozilla/toolkit/content/viewZoomOverlay.js,v  <--  viewZoomOverlay.js
new revision: 1.9; previous revision: 1.8
done
Checking in toolkit/content/widgets/browser.xml;
/cvsroot/mozilla/toolkit/content/widgets/browser.xml,v  <--  browser.xml
new revision: 1.110; previous revision: 1.109
done
Checking in toolkit/components/contentprefs/src/nsContentPrefService.js;
/cvsroot/mozilla/toolkit/components/contentprefs/src/nsContentPrefService.js,v  <--  nsContentPrefService.js
new revision: 1.11; previous revision: 1.10
done
Checking in layout/base/nsPresContext.cpp;
/cvsroot/mozilla/layout/base/nsPresContext.cpp,v  <--  nsPresContext.cpp
new revision: 3.336; previous revision: 3.335
done

Here are the Windows, Mac, and Linux Ts graphs marked at the build before the patch was checked in:

http://build-graphs.mozilla.org/graph/query.cgi?testname=startup&units=ms&tbox=bl-bldxp01_HEAD&autoscale=1&days=7&avg=1&showpoint=2007:11:26:01:04:49,1824
http://build-graphs.mozilla.org/graph/query.cgi?testname=startup&units=ms&tbox=xserve08.build.mozilla.org_Fx-Trunk&autoscale=1&days=7&avg=1&showpoint=2007:11:26:00:35:49,883
http://build-graphs.mozilla.org/graph/query.cgi?testname=startup&units=ms&tbox=bl-bldlnx03_fx-linux-tbox-HEAD&autoscale=1&days=7&avg=1&showpoint=2007:11:26:01:09:55,909

And here are the Txul graphs:

http://build-graphs.mozilla.org/graph/query.cgi?testname=xulwinopen&units=ms&tbox=bl-bldxp01_HEAD&autoscale=1&days=7&avg=1&showpoint=2007:11:26:01:01:05,450
http://build-graphs.mozilla.org/graph/query.cgi?testname=xulwinopen&units=ms&tbox=xserve08.build.mozilla.org_Fx-Trunk&autoscale=1&days=7&avg=1&showpoint=2007:11:26:00:32:08,141
http://build-graphs.mozilla.org/graph/query.cgi?testname=xulwinopen&units=ms&tbox=bl-bldlnx03_fx-linux-tbox-HEAD&autoscale=1&days=7&avg=1&showpoint=2007:11:26:01:06:14,224
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Depends on: 405483
Wow - that's a really noticeable dip in both Tp and Ts.  Well friggen done Myk!

This is the version of the patch I checked in.  It's identical to patch v2 but with John's nit from comment 25 fixed.
Depends on: 410559
No longer depends on: 410559
You need to log in before you can comment on or make changes to this bug.