Last Comment Bug 718864 - huffingtonpost loads up to 50% slower in Firefox than in opera
: huffingtonpost loads up to 50% slower in Firefox than in opera
Status: NEW
[snappy:p1]
:
Product: Core
Classification: Components
Component: CSS Parsing and Computation (show other bugs)
: unspecified
: x86 Windows 7
: -- normal with 3 votes (vote)
: ---
Assigned To: Nobody; OK to take it and work on it
:
:
Mentors:
Depends on: 705877
Blocks:
  Show dependency treegraph
 
Reported: 2012-01-17 15:48 PST by (dormant account)
Modified: 2013-04-26 10:32 PDT (History)
15 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments

Description (dormant account) 2012-01-17 15:48:23 PST
on average Firefox is about 20-30% slower, 50-60% on slowest pages. See cache1/2 in http://groups.google.com/group/mozilla.dev.tech.network/browse_thread/thread/6dfa3e6ebe80c336
Comment 1 Boris Zbarsky [:bz] (still a bit busy) 2012-01-17 16:43:53 PST
Do we have a good indication that this is a DOM issue?  Profiles?

Failing that, steps to reproduce?  What exact urls are you using?  The bigger the gap, the better.
Comment 2 (dormant account) 2012-01-17 16:53:21 PST
I don't have any conclusive evidence that it's the dom.

From huffingtonpost.com clicking one of politics, living, tech results in that page being 40-50% slower to load than in opera. I suggest disabling network cache as it makes things even slower to load(and is a known problem).

Note this is on a slow cpu(intel atom). I'm hoping that if you set one of these pages on autoreload, something obvious will show up in the profiler.

I suspect flash might be to blame here too, I'll ask QA to rerun without flash.
Comment 3 Boris Zbarsky [:bz] (still a bit busy) 2012-01-17 16:55:18 PST
OK, so I'll try with http://www.huffingtonpost.com/politics/, cache off, flash off for a start.  Thanks.
Comment 4 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-01-17 16:57:46 PST
Just so we are clear, here is the test I am going to run:

new profile > set cache to 0 MB > disable Flash in about:plugins > run Huffington Post test

I will report back here the results
Comment 5 Boris Zbarsky [:bz] (still a bit busy) 2012-01-17 21:58:25 PST
Hmm, so on my particular setup (Mac build, no flash, cache off), I see us loading just as fast as Opera if not faster.

I tried running in a loop, and we do seem to more or less peg a core.  The breakdown of main-thread CPU time is:

 21% layout/style flushes from WillPaint (including rule matching for restyles, lazy 
     frame construction, some RebuildAllStyleData and a bit of reflow), 
  5% painting
  5% compiling scripts loaded via src attributes
  3% executing those scripts
  3% compiling inline scripts
  6% executing inline scripts; at least 2/3 of this is frame construction due to flushes
  9% restyles caused by flushes from XHR readystate change handlers at end of XHR load
  2% parsing stylesheets
  1% firing onload (seems to involve some scripts mostly getting innerHTML)
  1% gzip decompression
 17% refresh driver notifications (splitting 60-20-20 as restyling, lazy frame
     construction, reflow)

etc.  In general, 25k of the 55k samples are under ResolveStyleFor.
Comment 6 Boris Zbarsky [:bz] (still a bit busy) 2012-01-17 21:59:57 PST
Oh, and for the style stuff, something like 77% of it is under SelectorMatchesTree.  I'll bet money on descendant selectors.

I'm not quite sure why we end up with style data being dirty and needing recomputing so much.  That might be worth looking into as well.
Comment 7 Boris Zbarsky [:bz] (still a bit busy) 2012-01-17 22:02:27 PST
But for that first part, bug 705877 might well help.
Comment 8 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-01-18 13:13:28 PST
(In reply to Boris Zbarsky (:bz) from comment #3)
> OK, so I'll try with http://www.huffingtonpost.com/politics/, cache off,
> flash off for a start.  Thanks.

I confirmed this morning the original testing was performed without Flash installed. I've requested that the tests be re-run with Flash installed. I will report back here by tomorrow.
Comment 9 Lawrence Mandel [:lmandel] (use needinfo) 2012-01-26 07:49:14 PST
Anthony, do you have useful to report back with the new test results?
Comment 10 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-01-26 08:40:18 PST
(In reply to Lawrence Mandel [:lmandel] from comment #9)
> Anthony, do you have useful to report back with the new test results?

I provided Taras with the raw data. I don't think I'm qualified to draw conclusions from it. He thinks Flash is not a variable in this issue.

The raw data can be found here:
https://docs.google.com/spreadsheet/ccc?key=0AmkRt0ylPb8zdGxPRDhUbGdoM3lwd3JQNXY3bW5nNHc&hl=en_US#gid=3
Comment 11 (dormant account) 2012-02-22 12:04:21 PST
huffington post is killing us in http://www.tomshardware.com/reviews/chrome-17-firefox-10-ubuntu,3129-6.html
Comment 12 Boris Zbarsky [:bz] (still a bit busy) 2012-02-22 20:15:48 PST
So some possible data...

I just did a cached load of the uri in comment 5.  25% of the wall-clock time the CPU was idle.  Of the CPU time, close to 1/3 is under SelectorMatchesTree, so as I said descendant selectors.  Bug 705877 would help there.  Nothing new so far.

Then I looked into why we have so many restyles to start with.  The page has a several of <link rel="stylesheet"> various places inside the page.  So we start rendering, then those stylesheet loads start, then those sheets load and we recompute style for the whole page.  There are also several <style> elements in the middle of the page; same story there.

Apart from the restyling, there are not obvious hotspots other than script parsing and execution (about 1/6 of the total pageload time there).

Finally, just as the page finishes loading, it changes the class on the <body> from "loadimages frontpage politics" to "loadimages frontpage politics firefox13".  This causes another full-page restyle, unfortunately...  we could try to optimize that away, with enough effort, I guess.
Comment 13 (dormant account) 2012-02-22 20:27:13 PST
Should we be recording heuristics in places to reduce restyling for known offenders?
Comment 14 Boris Zbarsky [:bz] (still a bit busy) 2012-02-22 20:43:24 PST
Not sure what you're asking....
Comment 15 (dormant account) 2012-02-27 13:54:12 PST
Boris,
If I remember our irc conversation correctly, we have a restyle timer firing every fixed interval. Since huffpo is doing too many restyles is it worth playing with that heuristic?
In particular I'm guessing that having the timer interval be max(time-to-restyle*constant, whatever interval we have now) might help.

Where is this code?

Is there a similar timer for reflows?

I think we should add telemetry to see how much time we waste on restyling while loading pages to see if our heuristics tuning is making progress in correct direction.

I asked Nathan to look into this, but he refuses to do anything until you OK it.
Comment 16 Boris Zbarsky [:bz] (still a bit busy) 2012-02-27 14:05:33 PST
> Since huffpo is doing too many restyles is it worth playing with that heuristic?

Possibly....

> Where is this code?

nsRefreshDriver.cpp

> Is there a similar timer for reflows?

It's the same timer.  When it fires, we do restyles, update SMIL animations, update CSS transitions and animations, run any pending animation callbacks, do reflows, process invalidates.  Basically "do one frame".

I have no problem with measuring things here; the hard part is defining "waste".
Comment 17 Nathan Froyd [:froydnj] 2012-02-27 14:08:50 PST
Is it even worthwhile tweaking the timer?  From bz's investigation, it sounds like we're eating restyles because of the ordering of elements in the page, not because of timers firing.
Comment 18 Boris Zbarsky [:bz] (still a bit busy) 2012-02-27 14:15:26 PST
Well, it's possible for a page with markup like this:

  <body>
   <style>/* some rules */</style>
   Some content
   <style>/* some rules */<style>
  </body>

to end up restyling either once or twice depending on whether "some content" includes any scripts that query styles and on whether the timer fires between parsing the closing tag of the first <style> and the opening tag of the second one...

For purposes of this bug, btw, I'd like it if people would measure in builds with the patches from bug 705877 and its dependencies...
Comment 19 (dormant account) 2012-02-27 14:26:25 PST
Since everything is driven by the refresh drivers sounds like what you are suggesting is suppressing restyles. Is there machinery for that?
Comment 20 Boris Zbarsky [:bz] (still a bit busy) 2012-02-27 14:37:24 PST
Not really, though we could tweak the refresh driver's timer delay computation or something.

But note that there is talk of using the same refresh driver for the chrome and the currently selected tab.
Comment 21 (dormant account) 2012-02-27 15:10:49 PST
bz clarified this on irc. Foreground tabs are trying to paint at 60hz during pageload (which we agree is too aggressive in our current architecture.
background tab updates are throttled to "1000ms * 2^(number of firings since the tab got hidden)".

So the thing to investigate is how much time is spent on layout while loading in:
a) foreground tab
b) background tab

and whether that time is spent on
a) js
b) layouty stuff


For this specific bug, the next step is to compare huffpo load times in foreground vs background.
Comment 22 Boris Zbarsky [:bz] (still a bit busy) 2012-02-27 15:20:27 PST
I just tried that, fwiw.  With the patches in bug 705877, I get load times of about 1700ms on my machine in foreground and maybe 1600ms in background.  With a lot of noise....
Comment 23 (dormant account) 2012-02-27 17:44:36 PST
(In reply to Boris Zbarsky (:bz) from comment #22)
> I just tried that, fwiw.  With the patches in bug 705877, I get load times
> of about 1700ms on my machine in foreground and maybe 1600ms in background. 
> With a lot of noise....

Can you post try builds so we can try this on a slower machine?
Comment 24 Boris Zbarsky [:bz] (still a bit busy) 2012-02-27 18:50:13 PST
There should be Linux and Windows PGO builds at https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bzbarsky@mozilla.com-56d56a5d8b2a/ (so far only the linux32 opt build is up).

There are non-PGO builds at https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bzbarsky@mozilla.com-4508fa7cf0db/ which should be there for all platforms.
Comment 25 (dormant account) 2012-02-28 16:34:07 PST
There is no difference between loading huffpo in fore/background tabs with your builds. I did notice that your builds load it in 8seconds vs 13/14 for my profiling nightly.
Comment 26 Lawrence Mandel [:lmandel] (use needinfo) 2012-03-14 08:29:03 PDT
(In reply to Taras Glek (:taras) from comment #25)
> There is no difference between loading huffpo in fore/background tabs with
> your builds. I did notice that your builds load it in 8seconds vs 13/14 for
> my profiling nightly.

If I'm reading this correctly, the fix for bug 705877 has cut ~40% of the page load time in this case. What's the next step for this bug?
Comment 27 Boris Zbarsky [:bz] (still a bit busy) 2012-03-14 19:28:49 PDT
Probably another profile (I'll try to do that tomorrow), but we should seriously think about other ways to approach the measurements because the profile will just show CPU time, and at this point I expect to have a large fraction of CPU idle time waiting on ... something.
Comment 28 Randell Jesup [:jesup] 2012-03-15 04:52:28 PDT
Well, a real-time profile will help, except it probably won't identify what's being waited on because it's not synchronous waits where the call stack would identify what the cause was.

Some type of waterfall display of the page load would help, especially if you could mix in events (restyles, reflows, paints, etc) and mark start/stop points of processing to help visualize the entire flow (data transfer and processing), and then let you click on any of them for details (in processing elements, the profile of just that part).

Love to work on it; no time though.
Comment 29 Boris Zbarsky [:bz] (still a bit busy) 2012-03-15 12:19:54 PDT
Shark's ability to do real-time profiles got broken by Apple, sadly...

In any case, a profile of a _cached_ load shows almost no idle time, as far as I can tell.  Basic breakdown of user-space main-thread time (which is about 94% of the total main-thread time), with all percentages being percentages of the userspace main-thread stuff:

  3% Load starts and moving data around
  3% CSS parsing
  7% Compiling script to bytecode for <script> tags with src attributes
 10% executing script for those <script> tags (layout flushes, cookie setting, lots of
     actual JS execution, etc, etc).
  8% Compiling inline scripts to bytecode
 12% executing inline scripts (lots of mjit compilation and whatnot here, as well as all
     the things for external scripts)
  7% executing JS from setTimeout calls (more of the same as above)
 10% refresh ticks (restyles and layout, mostly, as well as CreateNeededFrames).
 13% painting
  2% layout flush before firing onload
  4% script running from the DOMContentLoaded event handler
 
and then a long tail of minor stuff...
Comment 30 (dormant account) 2012-06-21 14:36:12 PDT
Anthony, we need to rerun this test
Comment 31 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-06-21 14:48:11 PDT
(In reply to Taras Glek (:taras) from comment #30)
> Anthony, we need to rerun this test

What build?
Comment 32 (dormant account) 2012-06-21 15:21:05 PDT
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #31)
> (In reply to Taras Glek (:taras) from comment #30)
> > Anthony, we need to rerun this test
> 
> What build?

Latest nightly
Comment 33 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-06-21 16:38:44 PDT
As instructed, I reran the tests with Firefox 15.0a1 2012-06-01. Results can be seen here:
https://docs.google.com/spreadsheet/ccc?key=0AmkRt0ylPb8zdGxPRDhUbGdoM3lwd3JQNXY3bW5nNHc&hl=en_US#gid=4

Keep in mind that there is probably a 0.5-1.0s due to manually start/stopping the timer between clicks. Assuming the same amount of error across all tests there are a couple of interesting musings.

* 6 months ago: 7-15s without Flash, 7-20s with Flash, average deviation of 5%
* 2012-06-01: 4-9s without Flash, 4-7s with Flash, average deviation of 0.5%

Again, not very scientific but interesting nonetheless.

Note, each testrun was done on a clean VM snapshot.
1. Start from baseline snapshot
2. Install Firefox Nightly
3. Run the tests
4. Revert snapshot
5. Install Firefox Nightly and Flash 11.3
6. Run the tests
Comment 34 Boris Zbarsky [:bz] (still a bit busy) 2012-12-25 12:08:40 PST
Taras, how do things look here on your end now?

Note You need to log in before you can comment on or make changes to this bug.