Open Bug 718864 Opened 12 years ago Updated 2 years ago

huffingtonpost loads up to 50% slower in Firefox than in opera

Categories

(Core :: CSS Parsing and Computation, defect)

x86
Windows 7
defect

Tracking

()

People

(Reporter: taras.mozilla, Unassigned)

References

Details

(Whiteboard: [snappy:p1])

on average Firefox is about 20-30% slower, 50-60% on slowest pages. See cache1/2 in http://groups.google.com/group/mozilla.dev.tech.network/browse_thread/thread/6dfa3e6ebe80c336
Do we have a good indication that this is a DOM issue?  Profiles?

Failing that, steps to reproduce?  What exact urls are you using?  The bigger the gap, the better.
I don't have any conclusive evidence that it's the dom.

From huffingtonpost.com clicking one of politics, living, tech results in that page being 40-50% slower to load than in opera. I suggest disabling network cache as it makes things even slower to load(and is a known problem).

Note this is on a slow cpu(intel atom). I'm hoping that if you set one of these pages on autoreload, something obvious will show up in the profiler.

I suspect flash might be to blame here too, I'll ask QA to rerun without flash.
OK, so I'll try with http://www.huffingtonpost.com/politics/, cache off, flash off for a start.  Thanks.
Just so we are clear, here is the test I am going to run:

new profile > set cache to 0 MB > disable Flash in about:plugins > run Huffington Post test

I will report back here the results
Hmm, so on my particular setup (Mac build, no flash, cache off), I see us loading just as fast as Opera if not faster.

I tried running in a loop, and we do seem to more or less peg a core.  The breakdown of main-thread CPU time is:

 21% layout/style flushes from WillPaint (including rule matching for restyles, lazy 
     frame construction, some RebuildAllStyleData and a bit of reflow), 
  5% painting
  5% compiling scripts loaded via src attributes
  3% executing those scripts
  3% compiling inline scripts
  6% executing inline scripts; at least 2/3 of this is frame construction due to flushes
  9% restyles caused by flushes from XHR readystate change handlers at end of XHR load
  2% parsing stylesheets
  1% firing onload (seems to involve some scripts mostly getting innerHTML)
  1% gzip decompression
 17% refresh driver notifications (splitting 60-20-20 as restyling, lazy frame
     construction, reflow)

etc.  In general, 25k of the 55k samples are under ResolveStyleFor.
Component: DOM: Core & HTML → Style System (CSS)
QA Contact: general → style-system
Oh, and for the style stuff, something like 77% of it is under SelectorMatchesTree.  I'll bet money on descendant selectors.

I'm not quite sure why we end up with style data being dirty and needing recomputing so much.  That might be worth looking into as well.
But for that first part, bug 705877 might well help.
Depends on: 705877
(In reply to Boris Zbarsky (:bz) from comment #3)
> OK, so I'll try with http://www.huffingtonpost.com/politics/, cache off,
> flash off for a start.  Thanks.

I confirmed this morning the original testing was performed without Flash installed. I've requested that the tests be re-run with Flash installed. I will report back here by tomorrow.
Whiteboard: [snappy:p2]
Anthony, do you have useful to report back with the new test results?
(In reply to Lawrence Mandel [:lmandel] from comment #9)
> Anthony, do you have useful to report back with the new test results?

I provided Taras with the raw data. I don't think I'm qualified to draw conclusions from it. He thinks Flash is not a variable in this issue.

The raw data can be found here:
https://docs.google.com/spreadsheet/ccc?key=0AmkRt0ylPb8zdGxPRDhUbGdoM3lwd3JQNXY3bW5nNHc&hl=en_US#gid=3
So some possible data...

I just did a cached load of the uri in comment 5.  25% of the wall-clock time the CPU was idle.  Of the CPU time, close to 1/3 is under SelectorMatchesTree, so as I said descendant selectors.  Bug 705877 would help there.  Nothing new so far.

Then I looked into why we have so many restyles to start with.  The page has a several of <link rel="stylesheet"> various places inside the page.  So we start rendering, then those stylesheet loads start, then those sheets load and we recompute style for the whole page.  There are also several <style> elements in the middle of the page; same story there.

Apart from the restyling, there are not obvious hotspots other than script parsing and execution (about 1/6 of the total pageload time there).

Finally, just as the page finishes loading, it changes the class on the <body> from "loadimages frontpage politics" to "loadimages frontpage politics firefox13".  This causes another full-page restyle, unfortunately...  we could try to optimize that away, with enough effort, I guess.
Should we be recording heuristics in places to reduce restyling for known offenders?
Not sure what you're asking....
Boris,
If I remember our irc conversation correctly, we have a restyle timer firing every fixed interval. Since huffpo is doing too many restyles is it worth playing with that heuristic?
In particular I'm guessing that having the timer interval be max(time-to-restyle*constant, whatever interval we have now) might help.

Where is this code?

Is there a similar timer for reflows?

I think we should add telemetry to see how much time we waste on restyling while loading pages to see if our heuristics tuning is making progress in correct direction.

I asked Nathan to look into this, but he refuses to do anything until you OK it.
> Since huffpo is doing too many restyles is it worth playing with that heuristic?

Possibly....

> Where is this code?

nsRefreshDriver.cpp

> Is there a similar timer for reflows?

It's the same timer.  When it fires, we do restyles, update SMIL animations, update CSS transitions and animations, run any pending animation callbacks, do reflows, process invalidates.  Basically "do one frame".

I have no problem with measuring things here; the hard part is defining "waste".
Is it even worthwhile tweaking the timer?  From bz's investigation, it sounds like we're eating restyles because of the ordering of elements in the page, not because of timers firing.
Well, it's possible for a page with markup like this:

  <body>
   <style>/* some rules */</style>
   Some content
   <style>/* some rules */<style>
  </body>

to end up restyling either once or twice depending on whether "some content" includes any scripts that query styles and on whether the timer fires between parsing the closing tag of the first <style> and the opening tag of the second one...

For purposes of this bug, btw, I'd like it if people would measure in builds with the patches from bug 705877 and its dependencies...
Since everything is driven by the refresh drivers sounds like what you are suggesting is suppressing restyles. Is there machinery for that?
Not really, though we could tweak the refresh driver's timer delay computation or something.

But note that there is talk of using the same refresh driver for the chrome and the currently selected tab.
bz clarified this on irc. Foreground tabs are trying to paint at 60hz during pageload (which we agree is too aggressive in our current architecture.
background tab updates are throttled to "1000ms * 2^(number of firings since the tab got hidden)".

So the thing to investigate is how much time is spent on layout while loading in:
a) foreground tab
b) background tab

and whether that time is spent on
a) js
b) layouty stuff


For this specific bug, the next step is to compare huffpo load times in foreground vs background.
Whiteboard: [snappy:p2] → [snappy:p1]
I just tried that, fwiw.  With the patches in bug 705877, I get load times of about 1700ms on my machine in foreground and maybe 1600ms in background.  With a lot of noise....
(In reply to Boris Zbarsky (:bz) from comment #22)
> I just tried that, fwiw.  With the patches in bug 705877, I get load times
> of about 1700ms on my machine in foreground and maybe 1600ms in background. 
> With a lot of noise....

Can you post try builds so we can try this on a slower machine?
There should be Linux and Windows PGO builds at https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bzbarsky@mozilla.com-56d56a5d8b2a/ (so far only the linux32 opt build is up).

There are non-PGO builds at https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bzbarsky@mozilla.com-4508fa7cf0db/ which should be there for all platforms.
There is no difference between loading huffpo in fore/background tabs with your builds. I did notice that your builds load it in 8seconds vs 13/14 for my profiling nightly.
(In reply to Taras Glek (:taras) from comment #25)
> There is no difference between loading huffpo in fore/background tabs with
> your builds. I did notice that your builds load it in 8seconds vs 13/14 for
> my profiling nightly.

If I'm reading this correctly, the fix for bug 705877 has cut ~40% of the page load time in this case. What's the next step for this bug?
Probably another profile (I'll try to do that tomorrow), but we should seriously think about other ways to approach the measurements because the profile will just show CPU time, and at this point I expect to have a large fraction of CPU idle time waiting on ... something.
Well, a real-time profile will help, except it probably won't identify what's being waited on because it's not synchronous waits where the call stack would identify what the cause was.

Some type of waterfall display of the page load would help, especially if you could mix in events (restyles, reflows, paints, etc) and mark start/stop points of processing to help visualize the entire flow (data transfer and processing), and then let you click on any of them for details (in processing elements, the profile of just that part).

Love to work on it; no time though.
Shark's ability to do real-time profiles got broken by Apple, sadly...

In any case, a profile of a _cached_ load shows almost no idle time, as far as I can tell.  Basic breakdown of user-space main-thread time (which is about 94% of the total main-thread time), with all percentages being percentages of the userspace main-thread stuff:

  3% Load starts and moving data around
  3% CSS parsing
  7% Compiling script to bytecode for <script> tags with src attributes
 10% executing script for those <script> tags (layout flushes, cookie setting, lots of
     actual JS execution, etc, etc).
  8% Compiling inline scripts to bytecode
 12% executing inline scripts (lots of mjit compilation and whatnot here, as well as all
     the things for external scripts)
  7% executing JS from setTimeout calls (more of the same as above)
 10% refresh ticks (restyles and layout, mostly, as well as CreateNeededFrames).
 13% painting
  2% layout flush before firing onload
  4% script running from the DOMContentLoaded event handler
 
and then a long tail of minor stuff...
Anthony, we need to rerun this test
Keywords: qawanted
(In reply to Taras Glek (:taras) from comment #30)
> Anthony, we need to rerun this test

What build?
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #31)
> (In reply to Taras Glek (:taras) from comment #30)
> > Anthony, we need to rerun this test
> 
> What build?

Latest nightly
As instructed, I reran the tests with Firefox 15.0a1 2012-06-01. Results can be seen here:
https://docs.google.com/spreadsheet/ccc?key=0AmkRt0ylPb8zdGxPRDhUbGdoM3lwd3JQNXY3bW5nNHc&hl=en_US#gid=4

Keep in mind that there is probably a 0.5-1.0s due to manually start/stopping the timer between clicks. Assuming the same amount of error across all tests there are a couple of interesting musings.

* 6 months ago: 7-15s without Flash, 7-20s with Flash, average deviation of 5%
* 2012-06-01: 4-9s without Flash, 4-7s with Flash, average deviation of 0.5%

Again, not very scientific but interesting nonetheless.

Note, each testrun was done on a clean VM snapshot.
1. Start from baseline snapshot
2. Install Firefox Nightly
3. Run the tests
4. Revert snapshot
5. Install Firefox Nightly and Flash 11.3
6. Run the tests
Keywords: qawanted
Taras, how do things look here on your end now?
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.