718864 - huffingtonpost loads up to 50% slower in Firefox than in opera

Reporter

Description

•

12 years ago

on average Firefox is about 20-30% slower, 50-60% on slowest pages. See cache1/2 in http://groups.google.com/group/mozilla.dev.tech.network/browse_thread/thread/6dfa3e6ebe80c336

Boris Zbarsky [:bzbarsky]

Comment 1

•

12 years ago

Do we have a good indication that this is a DOM issue?  Profiles?

Failing that, steps to reproduce?  What exact urls are you using?  The bigger the gap, the better.

(dormant account)

Reporter

Comment 2

•

12 years ago

I don't have any conclusive evidence that it's the dom.

From huffingtonpost.com clicking one of politics, living, tech results in that page being 40-50% slower to load than in opera. I suggest disabling network cache as it makes things even slower to load(and is a known problem).

Note this is on a slow cpu(intel atom). I'm hoping that if you set one of these pages on autoreload, something obvious will show up in the profiler.

I suspect flash might be to blame here too, I'll ask QA to rerun without flash.

Boris Zbarsky [:bzbarsky]

Comment 3

•

12 years ago

OK, so I'll try with http://www.huffingtonpost.com/politics/, cache off, flash off for a start.  Thanks.

u279076

Comment 4

•

12 years ago

Just so we are clear, here is the test I am going to run:

new profile > set cache to 0 MB > disable Flash in about:plugins > run Huffington Post test

I will report back here the results

Boris Zbarsky [:bzbarsky]

Comment 5

•

12 years ago

Hmm, so on my particular setup (Mac build, no flash, cache off), I see us loading just as fast as Opera if not faster.

I tried running in a loop, and we do seem to more or less peg a core.  The breakdown of main-thread CPU time is:

 21% layout/style flushes from WillPaint (including rule matching for restyles, lazy 
     frame construction, some RebuildAllStyleData and a bit of reflow), 
  5% painting
  5% compiling scripts loaded via src attributes
  3% executing those scripts
  3% compiling inline scripts
  6% executing inline scripts; at least 2/3 of this is frame construction due to flushes
  9% restyles caused by flushes from XHR readystate change handlers at end of XHR load
  2% parsing stylesheets
  1% firing onload (seems to involve some scripts mostly getting innerHTML)
  1% gzip decompression
 17% refresh driver notifications (splitting 60-20-20 as restyling, lazy frame
     construction, reflow)

etc.  In general, 25k of the 55k samples are under ResolveStyleFor.

Component: DOM: Core & HTML → Style System (CSS)

QA Contact: general → style-system

Boris Zbarsky [:bzbarsky]

Comment 6

•

12 years ago

Oh, and for the style stuff, something like 77% of it is under SelectorMatchesTree.  I'll bet money on descendant selectors.

I'm not quite sure why we end up with style data being dirty and needing recomputing so much.  That might be worth looking into as well.

Boris Zbarsky [:bzbarsky]

Comment 7

•

12 years ago

But for that first part, bug 705877 might well help.

Depends on: 705877

u279076

Comment 8

•

12 years ago

(In reply to Boris Zbarsky (:bz) from comment #3)
> OK, so I'll try with http://www.huffingtonpost.com/politics/, cache off,
> flash off for a start.  Thanks.

I confirmed this morning the original testing was performed without Flash installed. I've requested that the tests be re-run with Flash installed. I will report back here by tomorrow.

(dormant account)

Reporter

Updated

•

12 years ago

Whiteboard: [snappy:p2]

Lawrence Mandel [:lmandel] (use needinfo)

Comment 9

•

12 years ago

Anthony, do you have useful to report back with the new test results?

u279076

Comment 10

•

12 years ago

(In reply to Lawrence Mandel [:lmandel] from comment #9)
> Anthony, do you have useful to report back with the new test results?

I provided Taras with the raw data. I don't think I'm qualified to draw conclusions from it. He thinks Flash is not a variable in this issue.

The raw data can be found here:
https://docs.google.com/spreadsheet/ccc?key=0AmkRt0ylPb8zdGxPRDhUbGdoM3lwd3JQNXY3bW5nNHc&hl=en_US#gid=3

(dormant account)

Reporter

Comment 11

•

12 years ago

huffington post is killing us in http://www.tomshardware.com/reviews/chrome-17-firefox-10-ubuntu,3129-6.html

Boris Zbarsky [:bzbarsky]

Comment 12

•

12 years ago

So some possible data...

I just did a cached load of the uri in comment 5.  25% of the wall-clock time the CPU was idle.  Of the CPU time, close to 1/3 is under SelectorMatchesTree, so as I said descendant selectors.  Bug 705877 would help there.  Nothing new so far.

Then I looked into why we have so many restyles to start with.  The page has a several of <link rel="stylesheet"> various places inside the page.  So we start rendering, then those stylesheet loads start, then those sheets load and we recompute style for the whole page.  There are also several <style> elements in the middle of the page; same story there.

Apart from the restyling, there are not obvious hotspots other than script parsing and execution (about 1/6 of the total pageload time there).

Finally, just as the page finishes loading, it changes the class on the <body> from "loadimages frontpage politics" to "loadimages frontpage politics firefox13".  This causes another full-page restyle, unfortunately...  we could try to optimize that away, with enough effort, I guess.

(dormant account)

Reporter

Comment 13

•

12 years ago

Should we be recording heuristics in places to reduce restyling for known offenders?

Boris Zbarsky [:bzbarsky]

Comment 14

•

12 years ago

Not sure what you're asking....

(dormant account)

Reporter

Comment 15

•

12 years ago

Boris,
If I remember our irc conversation correctly, we have a restyle timer firing every fixed interval. Since huffpo is doing too many restyles is it worth playing with that heuristic?
In particular I'm guessing that having the timer interval be max(time-to-restyle*constant, whatever interval we have now) might help.

Where is this code?

Is there a similar timer for reflows?

I think we should add telemetry to see how much time we waste on restyling while loading pages to see if our heuristics tuning is making progress in correct direction.

I asked Nathan to look into this, but he refuses to do anything until you OK it.

Boris Zbarsky [:bzbarsky]

Comment 16

•

12 years ago

> Since huffpo is doing too many restyles is it worth playing with that heuristic?

Possibly....

> Where is this code?

nsRefreshDriver.cpp

> Is there a similar timer for reflows?

It's the same timer.  When it fires, we do restyles, update SMIL animations, update CSS transitions and animations, run any pending animation callbacks, do reflows, process invalidates.  Basically "do one frame".

I have no problem with measuring things here; the hard part is defining "waste".

Nathan Froyd [:froydnj]

Comment 17

•

12 years ago

Is it even worthwhile tweaking the timer?  From bz's investigation, it sounds like we're eating restyles because of the ordering of elements in the page, not because of timers firing.

Boris Zbarsky [:bzbarsky]

Comment 18

•

12 years ago

Well, it's possible for a page with markup like this:

  <body>
   <style>/* some rules */</style>
   Some content
   <style>/* some rules */<style>
  </body>

to end up restyling either once or twice depending on whether "some content" includes any scripts that query styles and on whether the timer fires between parsing the closing tag of the first <style> and the opening tag of the second one...

For purposes of this bug, btw, I'd like it if people would measure in builds with the patches from bug 705877 and its dependencies...

(dormant account)

Reporter

Comment 19

•

12 years ago

Since everything is driven by the refresh drivers sounds like what you are suggesting is suppressing restyles. Is there machinery for that?

Boris Zbarsky [:bzbarsky]

Comment 20

•

12 years ago

Not really, though we could tweak the refresh driver's timer delay computation or something.

But note that there is talk of using the same refresh driver for the chrome and the currently selected tab.

(dormant account)

Reporter

Comment 21

•

12 years ago

bz clarified this on irc. Foreground tabs are trying to paint at 60hz during pageload (which we agree is too aggressive in our current architecture.
background tab updates are throttled to "1000ms * 2^(number of firings since the tab got hidden)".

So the thing to investigate is how much time is spent on layout while loading in:
a) foreground tab
b) background tab

and whether that time is spent on
a) js
b) layouty stuff


For this specific bug, the next step is to compare huffpo load times in foreground vs background.

(dormant account)

Reporter

Updated

•

12 years ago

Whiteboard: [snappy:p2] → [snappy:p1]

Boris Zbarsky [:bzbarsky]

Comment 22

•

12 years ago

I just tried that, fwiw.  With the patches in bug 705877, I get load times of about 1700ms on my machine in foreground and maybe 1600ms in background.  With a lot of noise....

(dormant account)

Reporter

Comment 23

•

12 years ago

(In reply to Boris Zbarsky (:bz) from comment #22)
> I just tried that, fwiw.  With the patches in bug 705877, I get load times
> of about 1700ms on my machine in foreground and maybe 1600ms in background. 
> With a lot of noise....

Can you post try builds so we can try this on a slower machine?

Boris Zbarsky [:bzbarsky]

Comment 24

•

12 years ago

There should be Linux and Windows PGO builds at https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bzbarsky@mozilla.com-56d56a5d8b2a/ (so far only the linux32 opt build is up).

There are non-PGO builds at https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/bzbarsky@mozilla.com-4508fa7cf0db/ which should be there for all platforms.

(dormant account)

Reporter

Comment 25

•

12 years ago

There is no difference between loading huffpo in fore/background tabs with your builds. I did notice that your builds load it in 8seconds vs 13/14 for my profiling nightly.

Lawrence Mandel [:lmandel] (use needinfo)

Comment 26

•

12 years ago

(In reply to Taras Glek (:taras) from comment #25)
> There is no difference between loading huffpo in fore/background tabs with
> your builds. I did notice that your builds load it in 8seconds vs 13/14 for
> my profiling nightly.

If I'm reading this correctly, the fix for bug 705877 has cut ~40% of the page load time in this case. What's the next step for this bug?

Boris Zbarsky [:bzbarsky]

Comment 27

•

12 years ago

Probably another profile (I'll try to do that tomorrow), but we should seriously think about other ways to approach the measurements because the profile will just show CPU time, and at this point I expect to have a large fraction of CPU idle time waiting on ... something.

Randell Jesup [:jesup] (needinfo me)

Comment 28

•

12 years ago

Well, a real-time profile will help, except it probably won't identify what's being waited on because it's not synchronous waits where the call stack would identify what the cause was.

Some type of waterfall display of the page load would help, especially if you could mix in events (restyles, reflows, paints, etc) and mark start/stop points of processing to help visualize the entire flow (data transfer and processing), and then let you click on any of them for details (in processing elements, the profile of just that part).

Love to work on it; no time though.

Boris Zbarsky [:bzbarsky]

Comment 29

•

12 years ago

Shark's ability to do real-time profiles got broken by Apple, sadly...

In any case, a profile of a _cached_ load shows almost no idle time, as far as I can tell.  Basic breakdown of user-space main-thread time (which is about 94% of the total main-thread time), with all percentages being percentages of the userspace main-thread stuff:

  3% Load starts and moving data around
  3% CSS parsing
  7% Compiling script to bytecode for <script> tags with src attributes
 10% executing script for those <script> tags (layout flushes, cookie setting, lots of
     actual JS execution, etc, etc).
  8% Compiling inline scripts to bytecode
 12% executing inline scripts (lots of mjit compilation and whatnot here, as well as all
     the things for external scripts)
  7% executing JS from setTimeout calls (more of the same as above)
 10% refresh ticks (restyles and layout, mostly, as well as CreateNeededFrames).
 13% painting
  2% layout flush before firing onload
  4% script running from the DOMContentLoaded event handler
 
and then a long tail of minor stuff...

(dormant account)

Reporter

Comment 30

•

12 years ago

Anthony, we need to rerun this test

Keywords: qawanted

u279076

Comment 31

•

12 years ago

(In reply to Taras Glek (:taras) from comment #30)
> Anthony, we need to rerun this test

What build?

(dormant account)

Reporter

Comment 32

•

12 years ago

(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #31)
> (In reply to Taras Glek (:taras) from comment #30)
> > Anthony, we need to rerun this test
> 
> What build?

Latest nightly

u279076

Comment 33

•

12 years ago

As instructed, I reran the tests with Firefox 15.0a1 2012-06-01. Results can be seen here:
https://docs.google.com/spreadsheet/ccc?key=0AmkRt0ylPb8zdGxPRDhUbGdoM3lwd3JQNXY3bW5nNHc&hl=en_US#gid=4

Keep in mind that there is probably a 0.5-1.0s due to manually start/stopping the timer between clicks. Assuming the same amount of error across all tests there are a couple of interesting musings.

* 6 months ago: 7-15s without Flash, 7-20s with Flash, average deviation of 5%
* 2012-06-01: 4-9s without Flash, 4-7s with Flash, average deviation of 0.5%

Again, not very scientific but interesting nonetheless.

Note, each testrun was done on a clean VM snapshot.
1. Start from baseline snapshot
2. Install Firefox Nightly
3. Run the tests
4. Revert snapshot
5. Install Firefox Nightly and Flash 11.3
6. Run the tests

u279076

Updated

•

12 years ago

Keywords: qawanted

Boris Zbarsky [:bzbarsky]

Comment 34

•

12 years ago

Taras, how do things look here on your end now?

BMO Automation

Updated

•

2 years ago

Severity: normal → S3