Closed Bug 504858 Opened 15 years ago Closed 3 years ago

Investigate delaying initialization of bookmarks toolbar

Categories

(Firefox :: Bookmarks & History, defect, P3)

defect

Tracking

()

RESOLVED DUPLICATE of bug 1667237

People

(Reporter: adw, Unassigned)

References

(Blocks 1 open bug, )

Details

(Keywords: perf, Whiteboard: [ts][fxperf:p3])

Attachments

(1 file, 1 obsolete file)

Some preliminary profiling with DTrace suggests that adding items to the bookmarks toolbar during its initialization contributes to > 3% of Firefox startup time in JSland.  I measured the durations of all JS funcs from a warm startup until BrowserStartup() returns.  When I delayed adding items to the toolbar for 5 seconds, i.e., running the initialization in a a setTimeout(), toolbar.xml dropped from 14ms (3.3%) to 3ms (0.8%).  This was with a copy of my own default work profile: 13 items on the toolbar -- 
the Most Visited smart folder, a regular folder, a bookmark with a live title,
and the rest were vanilla bookmarks.
Whiteboard: [ts]
A couple of corrections: I tested running the body of insertNewItem() in a timeout, and this timeout was 1 second, not 5.  Delaying where exactly -- insertNewItem() vs. _rebuild() vs. _viewer.itemInserted() vs. wherever else -- is something that should be investigated.
Were you making sure that we were not initializing the bookmarks service during this time?  That's going to be a bit of our startup time too.  I think this is the first time we'd hit the bookmarks service during startup.
Could be.  The toolbar executes a simple folder query on init, and that will initialize the bookmarks service if it hasn't been already.  However, in my simple experimenting I was only delaying insertNewItem(), and that's called after the toolbar's init.
About my methodology.  As I mentioned in comment 0, I've been measuring JSland, which is a hand-wavey way to say all the JS calls that the browser makes and DTrace reports before BrowserStartup() returns.  What percentage of total browser startup time is that?  I'm missing regions of call stacks that don't enter JS and entire stacks that don't enter JS at all.

As a quick test I tried measuring the duration from dtrace:::BEGIN to dtrace:::END, taking that as the total startup time, and starting up the browser immediately after starting DTrace.  2 seconds.  In the 3% number in comment 0, I took the sum of all JS call durations to be the JSland startup time, ~413ms.  That's only ~20% of 2 seconds, and 3.3% of 20% is < 1%, which is jack diddly-squat.
bug 504872 comment 1 may be relevant here as well.
(In reply to comment #5)
> bug 504872 comment 1 may be relevant here as well.

Well, the more items on the toolbar, the longer it takes.

I'll go ahead and note what I was writing before I wrote comment 3:

There's a large spot on my treemap within the vast unknown region covering calls from native code labeled "width".  Turns out it's the call in [1], which surprisingly is also from toolbar.xml, inside updateChevron().

The first time that line is run (when the toolbar's container is opened, invalidated, and the toolbar is rebuilt), it takes 14ms, 2.6% of total browser startup time in JS.

I don't know why that is, but I presume it's similar to the reason Gavin gave in bug 504872 comment 1:  Getting the toolbar's boxObject's width the first time first requires the binding to be attached or something.  The stack underneath width() contains many repeating calls to QueryInterface(), getAttribute(), and get_accessKey(), but DTrace doesn't tell me where they are coming from.  Looking at the box object and PresShell code it appears that might indeed be the case, and it makes sense that in order to get its width the toolbar has to be bound to its binding in its parent in the document, but my knowledge of the platform just isn't sufficient.

[1] http://mxr.mozilla.org/mozilla-central/source/browser/components/places/content/toolbar.xml#299
I think the slowness of getting the width from the boxObject is probably more likely explained by the need to flush layout in those cases, rather than XBL attachment. Not sure what the best way to verify that would be, though.
dbaron: do you have any insight into why so much time is spent getting the box's width? and how we can avoid this cost?
Priority: -- → P2
it's important to annotate here that bug 382466 changed the way the bookmarks toolbar updates content, this could actually change the measures taken here, even if still comment 8 is worth investigating even now.
Do you have a profile of what's happening during that 3% of the time?
(In reply to comment #9)
> it's important to annotate here that bug 382466 changed the way the bookmarks
> toolbar updates content, this could actually change the measures taken here,
> even if still comment 8 is worth investigating even now.

The call to width is gone in the latest nightly.  No surprise as you say, since bug 382466 removed the call I pointed to.

Repeating what I did above, though, toolbar.xml is still 3.2%.  Of that, insertItem (not in toolbar.xml, not sure where this is coming from), set_place, and _rebuild are 10% each, and insertNewItem is 70%.  When I try with fewer items on the toolbar, insertNewItem drops and consequently toolbar.xml does too.
insertNewItem: http://mxr.mozilla.org/mozilla-central/source/browser/components/places/content/toolbar.xml#194

ew. could be dom, binding, or db queries. can you tell us what's taking time inside insertNewItem?
appendChild:
http://mxr.mozilla.org/mozilla-central/source/browser/components/places/content/toolbar.xml#246

insertNewItem is called for each item on the toolbar of course.  All but the first call take 1ms or less.  The first takes ~4ms, with > 3ms of that in that appendChild call at the end of the method.  For the later calls, it looks like the bulk of the 1ms is split between that appendChild and the contiguous region starting here with createElement:
http://mxr.mozilla.org/mozilla-central/source/browser/components/places/content/toolbar.xml#203
Probably should point out that the appendChild call is appending the button to the toolbar.
ran some tests by commenting out the bookmarks toolbar altogether, having it empty, and default:

default toolbar was 5.5% of warm startup
empty toolbar was 2.5% of warm startup

this mostly lines up w/ adw's numbers above wrt to insertNewItem.
Attached patch push it back (obsolete) — Splinter Review
push back the population of the toolbar contents.

local testing w/ my debug build showed a non-trivial improvement, but too much noise to tell for sure. will push to places tree to get numbers.
saw clear wins on some boxes, and flat on others, and a few regression boxes. the tinderbox reporting stuff sent an email of 1.26% win on WinXP, but didn't have my check-in in the reported range, so i'm trusting that reporting system less than ever.

from the graphs:

xp/vista: clear win
linux: slight win
mac: flat

great example of how even at the very front-end, the performance effect of a change can differ greatly across platforms.
Attachment #402440 - Flags: review?(gavin.sharp)
Whiteboard: [ts] → [ts][has patch][needs review gavin]
backed out temp checkin on places branch.
are those measurements taken after bug 498130?

I'd not expect to see this hitting Talos for a simple reason, the default bookmarks are just three, while this perf test should be done with a full toolbar (with also a pretty full chevron i'd say).
Personally, I would not consider the browser to be up and started until after the Bookmarks toolbar is populated.  Moving it's initialization out of the startup time test sounds like cheating to me.
(In reply to comment #20)
> are those measurements taken after bug 498130?
> 
> I'd not expect to see this hitting Talos for a simple reason, the default
> bookmarks are just three, while this perf test should be done with a full
> toolbar (with also a pretty full chevron i'd say).

no, i'll try w/ that patch and a full toolbar.

(In reply to comment #21)
> Personally, I would not consider the browser to be up and started until after
> the Bookmarks toolbar is populated.  Moving it's initialization out of the
> startup time test sounds like cheating to me.

in my tests the difference was barely visible to the human eye. however, i'll confirm that it behaves the same with a full toolbar.
I think Bill has a point here - it seems unlikely that this change would actually improve performance, rather than just change what we're measuring. This boils down to a change of the order in which we load the toolbar vs. other stuff, right? Do we have evidence that changing the sequence alone is beneficial? I could imagine a case where doing this later would avoid the need to duplicate work (e.g. layout reflows), but I'm not sure that we've determined that to be the case here.
mostl ikely this change goes into the direction of "show browser window earlier" rather than "init faster", so in that view this could work. But most likely on modern CPUs this is not visible, could still make sense on mobile devices or netbooks.
(In reply to comment #23)
> I think Bill has a point here - it seems unlikely that this change would
> actually improve performance, rather than just change what we're measuring.
> This boils down to a change of the order in which we load the toolbar vs. other
> stuff, right? Do we have evidence that changing the sequence alone is
> beneficial? I could imagine a case where doing this later would avoid the need
> to duplicate work (e.g. layout reflows), but I'm not sure that we've determined
> that to be the case here.

Actually. I was concerned that initializing the bookmarks toolabar while it is being displayed would actually be likely to be slower than doing it while it was not displayed.
Ts measures how long it takes for a web page to finish loading after the browser starts up, since that's when whomever designed the test determined the browser is "ready for use". At that point, for example, the user can start reading the content in the web page.

Changes like this allow that usable state to happen sooner, by waiting until after that point to do work that is not required to get to that point.

It also reduces the amount of synchronous work required to bring up the browser chrome, making the application window appear sooner. This does not affect just application startup, but subsequent window openings as well.

(In reply to comment #23)
> I think Bill has a point here - it seems unlikely that this change would
> actually improve performance, rather than just change what we're measuring.
> This boils down to a change of the order in which we load the toolbar vs. other
> stuff, right? Do we have evidence that changing the sequence alone is
> beneficial? I could imagine a case where doing this later would avoid the need
> to duplicate work (e.g. layout reflows), but I'm not sure that we've determined
> that to be the case here.

See above, it's not entirely about raw speed. The change does not change the wall-clock time that it takes for the browser to start up, load a web page and populate the bookmarks toolbar.

WRT to reflow, I don't know enough about what would trigger them to answer.

Finally, I know 1.5% doesn't seem like a large win. However, it will make a difference as part of this larger campaign to make Firefox start up faster and feel snappier.
I don't think 1.5% is anything to sneeze at - if this shortens the time it takes for a window to appear by that much, then I'm all for it.

It wasn't clear to me that we'd determined that to be the case, though, and without that understanding it was hard to weigh against the cost to code complexity (proliferation of setTimeout(..., 0)s makes the code harder to debug/follow).
Attachment #402440 - Flags: review?(gavin.sharp) → review+
Dietrich, is this really a >1% win in overall warm startup wall clock, or are you measuring against a smaller number like time spent in JS?  e.g., comment 4
(In reply to comment #28)
> Dietrich, is this really a >1% win in overall warm startup wall clock, or are
> you measuring against a smaller number like time spent in JS?  e.g., comment 4

The 1.5% figure is from Ts reporting when I checked this into the Places branch (which, per previous comments, basically means that a content page was able to load faster due to the change).
Why does this add another delay after delayedStartup?
The patch is also wrong because bookmarksBarContent may be absent from the document.
(In reply to comment #30)
> Why does this add another delay after delayedStartup?

Because we thought it'd be good to yield again given all the work done that was done synchronously earlier in delayedStartup.
 
> The patch is also wrong because bookmarksBarContent may be absent from the
> document.

Indeed! I should have caught that...
(In reply to comment #31)
> (In reply to comment #30)
> > Why does this add another delay after delayedStartup?
> 
> Because we thought it'd be good to yield again given all the work done that was
> done synchronously earlier in delayedStartup.

It wouldn't block anything directly at the end of delayedStartup, though, would it?

At this point, we're probably just fooling the Ts test. Setting up the visible UI seems as important as getting the content to load, but the former is not measured directly.

Would talos still tell us about perf regressions in the toolbar initialization code with the nested delays?
Whiteboard: [ts][has patch][needs review gavin] → [ts]
Attachment #402440 - Attachment is obsolete: true
(In reply to comment #32)
> It wouldn't block anything directly at the end of delayedStartup, though, would
> it?

It's likely blocking page load at that point. It's hard to say whether yielding there is beneficial without comparing the cost of bookmark initialization to the rest of the code run during delayedStartup() (and even if we could do that reliably, there's no one sequence that is guaranteed to be more efficient for every user's startup). I think it's unlikely to hurt to do it in a second step, but without a stronger argument *for* it, perhaps we should just take the simplest approach and avoid the extra timeout.

> At this point, we're probably just fooling the Ts test. Setting up the visible
> UI seems as important as getting the content to load, but the former is not
> measured directly.

I don't think I agree that initializing the bookmarks toolbar UI is as important as loading content. It's certainly not important enough to block having the window appear, which is what currently happens. I don't think making that trade-off is "fooling the test".

I would like more comprehensive performance tests that measure every aspect of start-up in greater detail, to allow us to make better decisions like these, though. I don't think it would be too hard to write a perf test for bookmarks toolbar initialization specifically, if you're worried that we'll lose that coverage by making this change.
(In reply to comment #33)
> > At this point, we're probably just fooling the Ts test. Setting up the visible
> > UI seems as important as getting the content to load, but the former is not
> > measured directly.
> 
> I don't think I agree that initializing the bookmarks toolbar UI is as
> important as loading content. It's certainly not important enough to block
> having the window appear, which is what currently happens. I don't think making
> that trade-off is "fooling the test".

Doing it in delayedStartup should solve the window blocking issue. It's the extra delay that's probably fooling the start-up test. I suspect that it actually shows the UI later, which erroneously won't show up as a start-up perf regression.
(In reply to comment #34)
> I suspect that it actually shows the UI later,

In fact, doing it in delayedStartup() already results in a visible delay when opening a new window in an existing session...
Attached patch patch v2Splinter Review
So I propose we do it in BrowserStartup. I don't expect this to make a difference for the Ts test, but it should help when the parent toolbar has been collapsed.
Assignee: nobody → dao
Attachment #405754 - Flags: review?(gavin.sharp)
The collapsed/hidden/whatever case should be handled in the style sheet that applies the binding.
...which will also avoid applying the toolbar binding when it isn't necessary.

Having said that, I don't think this issue belongs to this bug. If delaying initialization doesn't help, let's move on.
Attachment #405754 - Flags: review?(gavin.sharp) → review-
Assignee: dao → nobody
Time to WONTFIX, in my opinion.

I can/file take a bug for Ts wins for windows in which the bookmarks toolbar is hidden. Don't expect much though.
One more thing: We can probably optimize the perf-sensitive code for the default-setup case, in which the toolbar has the width of the window (so we don't need to query its boxObject).
Bug 451915 - move Firefox/Places bugs to Firefox/Bookmarks and History. Remove all bugspam from this move by filtering for the string "places-to-b-and-h".

In Thunderbird 3.0b, you do that as follows:
Tools | Message Filters
Make sure the correct account is selected. Click "New"
Conditions: Body   contains   places-to-b-and-h
Change the action to "Delete Message".
Select "Manually Run" from the dropdown at the top.
Click OK.

Select the filter in the list, make sure "Inbox" is selected at the bottom, and click "Run Now". This should delete all the bugspam. You can then delete the filter.

Gerv
Component: Places → Bookmarks & History
QA Contact: places → bookmarks
Blocks: 447581
Priority: P2 → --
Priority: -- → P3
Keywords: perf
Whiteboard: [ts] → [ts][fxperf]
As far as I can tell, initting the bookmarks toolbar doesn't block first paint, but occurs within delayedStartup: https://searchfox.org/mozilla-central/rev/7e663b9fa578d425684ce2560e5fa2464f504b34/browser/base/content/browser.js#1495

It still seems pretty expensive for some people though, at least according to the FX_BOOKMARKS_TOOLBAR_INIT_MS probe: https://mzl.la/2E4yCsH

So diving into ways we can optimize this more is probably righteous, at least from a responsiveness-after-new-window-paints sort of perspective.
Whiteboard: [ts][fxperf] → [ts][fxperf:p3]
A better solution than a delay, may be bug 1047817, if we just want to avoid main-thread I/O, that I suspect is most of the cost we pay here. This could be a wontfix VS that other bug.

We've started doing this in 85, this happened in bug 1667237. comment #43 is still on point in that ideally, we should switch to an async view and async IO instead, but for the moment, this takes the impact out of the critical path.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.