Open Bug 827937 Opened 8 years ago Updated 14 hours ago

Opening huge mercurial long commit page freezing Firefox

Categories

(Core :: Layout, defect)

x86_64
Windows 8
defect
Not set
normal

People

(Reporter: romaxa, Unassigned, NeedInfo)

References

(Blocks 1 open bug)

Details

(Keywords: perf, Whiteboard: [qf:p3])

Opening the URL causes the Firefox UI to freeze (app not responding) for a while, and there is no way to quickly get it back to a normal state except by restarting Firefox.

Works fine in Google Chrome.
Keywords: perf
We also use a gig of memory for that page.
Memory is a different problem, unless it is causing the UI freeze by doing heavy allocations in some loop...
I have 8 GB of free RAM on my desktop...
(In reply to comment #2)
> Memory is a different problem, unless it is causing the UI freeze by doing
> heavy allocations in some loop...
> I have 8 GB of free RAM on my desktop...

On low memory machines that can cause the OS to crash, which doesn't help for page loading speed.  :-)
We have existing bugs on memory use for pages like this.  Is the memory use noticeably worse than for other UAs?  Note that in this case the raw HTML is about 70MB and parsing it produces a tad over 2 million nodes, so a gig of memory usage once you include layout data and such sounds about right...  Just the DOM here will use 400-500MB.
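To put those figures in per-node terms, here is a rough back-of-the-envelope calculation. It uses only the numbers quoted in this comment (70MB of HTML, about 2 million nodes, 400-500MB for the DOM, about a gig overall); the helper name and the choice of binary megabytes are assumptions made for illustration.

```python
# Back-of-the-envelope arithmetic using only the figures quoted above.
MB = 1024 * 1024  # assumption: binary megabytes


def bytes_per_node(total_mb, node_count):
    """Average number of bytes attributed to each node."""
    return total_mb * MB / node_count


NODES = 2_000_000  # "a tad over 2 million nodes", rounded down

raw_html = bytes_per_node(70, NODES)      # source bytes per node
dom_low = bytes_per_node(400, NODES)      # DOM only, lower figure
dom_high = bytes_per_node(500, NODES)     # DOM only, upper figure
everything = bytes_per_node(1024, NODES)  # "a gig", DOM plus layout data

print(f"raw HTML:     ~{raw_html:.0f} bytes/node")
print(f"DOM only:     ~{dom_low:.0f} to ~{dom_high:.0f} bytes/node")
print(f"DOM + layout: ~{everything:.0f} bytes/node")
```

So the DOM alone works out to a couple of hundred bytes per node, and roughly half a kilobyte per node once layout data is included.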

In Chrome, I understand that the UI stays responsive, but what about the page itself?
The Chrome UI and page content stay responsive all the time; link hovering and clicking work during and after page load. The page load itself takes ~15-30 seconds and uses around 900 MB.
Duplicate of this bug: 1305726
I've managed to reproduce this issue with the latest Firefox release(49.0.2) and the latest Nightly on Windows 10 x64.
Here is also the cleopatra profile for Nightly(52.0a1-20161025030205):
https://cleopatra.io/#report=b3833342dd249b822d899cb1ee317a57057d683e

On the latest Firefox release and the latest Nightly with e10s disabled, I couldn't get a cleopatra profile because the browser crashed.

Firefox 52.0a1 Crash Report:
https://crash-stats.mozilla.com/report/index/a51a3907-d7d8-4c34-90bf-da1ee2161026
Firefox 49.0.2 Crash Report:
https://crash-stats.mozilla.com/report/index/1fce2efd-5795-4ad7-b85e-579c62161026
Duplicate of this bug: 1312151
I have the same problem when opening, e.g., https://pypi.python.org/simple/
FF 50.0.2 Linux
Another page that kills my Firefox:
https://hg.mozilla.org/mozilla-central/rev/2301f25d1595

Can we at least update the severity of this bug? We claim that e10s should be able to deal with this kind of situation, but the reality is that all tabs become unresponsive, and I have to manually kill the content process in Task Manager to get my browser back.
Here's a profile of the page posted at bug 1333918: https://clptr.io/2k1Y1go

It seems like most of the time on that page is spent doing layout. Looking at the profile, the float placement algorithm seems to behave pretty badly. We also spend a fair amount of time doing text shaping, but it's not even close to the layout time.

I remember hearing that WebKit uses some fancy algorithm to position floats (I think Gecko's is quadratic). That may be a factor here.
Duplicate of this bug: 1333918
With e10s-multi, it doesn't freeze Firefox anymore for me. Closing the "frozen" tab is pretty slow though.
This happened to me today with searchfox, opening debugger.js (warning: http://searchfox.org/mozilla-central/source/devtools/client/debugger/new/debugger.js#71884).

I have a profile, and 90% of the time is spent under nsFloatManager::GetFlowArea. The reason is that in searchfox all lines of code are siblings floating to the left (for some reason), and that GetFlowArea is... far from linear (we go through all the previous floats to compute the flow position of each one).
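To illustrate why that pattern goes quadratic, here is a toy model. It is not Gecko's code and does not reflect what nsFloatManager::GetFlowArea actually computes; the function names and the "full-width floats that simply stack" assumption are hypothetical, chosen only to show the cost of rescanning every earlier float for each new one.

```python
def place_floats_naive(heights):
    """Toy model: stack full-width floats vertically. For each new float,
    scan every previously placed float to find the lowest bottom edge."""
    placed = []       # (top, bottom) of each float already placed
    comparisons = 0
    for height in heights:
        top = 0
        for _, bottom in placed:   # rescans all earlier floats
            comparisons += 1
            top = max(top, bottom)
        placed.append((top, top + height))
    return placed, comparisons


def place_floats_cached(heights):
    """Same toy model, but remember the lowest bottom edge seen so far."""
    placed = []
    lowest_bottom = 0
    steps = 0
    for height in heights:
        steps += 1
        placed.append((lowest_bottom, lowest_bottom + height))
        lowest_bottom += height
    return placed, steps


naive_result, naive_cost = place_floats_naive([1] * 1000)
cached_result, cached_cost = place_floats_cached([1] * 1000)
print(naive_cost, cached_cost)
```

For n floats the naive version does n(n-1)/2 comparisons, while the cached version does n steps. The cached shortcut only works because of the simplifying assumption in this model; real float placement has to consider widths, both sides, and clearance, so this says nothing about how easy the real fix would be.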

I don't have an immediate solution to this though, I might look at it when I have the time.
cc TYLin, since he's working on float-related stuff recently, so he may have some thought about how that may be optimized.
It seems to me the searchfox case is probably different from the mercurial page case.

The mercurial page case spends most of its time reflowing text, which is probably an issue related to unnecessary invalidation. And according to comment 14, the searchfox case seems to actually be related to the float algorithm...
comment 14 belongs on bug 1349676, not on this bug.
Flags: needinfo?(jfkthame)
I did a rough comparison of layout performance between Nightly and Chrome Canary
on a locally saved copy of http://hg.mozilla.org/mozilla-central/rev/a16372ce30b5
and it seems to me that we are actually *a lot faster* than Chrome.

Firefox Nightly v55 on Linux64:
Time until I see something painted: 0 - 5 sec, but in some cases much longer
Time until it reacts to a scroll request (scroll wheel): it moves the scroll position pretty fast but only shows blank space; painted content comes much later
Time until fully reflowed and I can scroll to the end: 45s - 1m20s

Chrome Canary 59.0.3047.0 dev (64-bit):
Time until I see something painted: 0 sec
Time until it reacts to a scroll request (scroll wheel): 0 sec; it responds very fast to scroll wheel requests in the beginning, but seems to get progressively worse as the page loads, though in general it is much better than Firefox
Time until fully reflowed and I can scroll to the end: 5m50s - 6m20s !!!

My thoughts: our layout of this page is much, much faster than Chrome's!  However, they somehow give a better impression anyway, since they respond to scrolling, so I can read the first few pages without any problem while the remainder of the page is processed.  I think we need to prioritize the initial paint better, and subsequent paints too in response to scroll commands.  We need to interrupt the reflow/parser/whatever and instead paint what we have.
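As a rough sketch of the "interrupt and paint what we have" idea: the code below is a toy scheduler, not how Gecko schedules reflow or painting. The function names and the fixed per-slice budget are hypothetical, and a list of "painted up to line N" values stands in for real painting.

```python
def interruptible_reflow(total_lines, budget_per_slice):
    """Lay out at most budget_per_slice lines, then yield so the caller
    can paint what has been laid out so far."""
    done = 0
    while done < total_lines:
        done = min(done + budget_per_slice, total_lines)
        yield done


def load_page(total_lines, budget_per_slice):
    """Alternate between a slice of reflow and a paint of the result."""
    paints = []
    for laid_out in interruptible_reflow(total_lines, budget_per_slice):
        paints.append(laid_out)  # stand-in for painting lines [0, laid_out)
    return paints


print(load_page(10, 4))
```

The total amount of layout work is unchanged; the only difference is that something paintable exists after the first slice instead of after the last one, which is the perceived-responsiveness gap described above.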

It would still be good to profile the layout of this page just to see if we have some functions we can optimize to make our reflow even faster (I'll leave this to :jfkthame), but slow reflow is definitely NOT the problem on this page AFAICT.  It seems to be more of a "paint scheduling" problem that makes us *look* slower than Chrome when in fact we're not.

I only tested 64-bit Linux builds, the results might be different on other platforms.
> I think we need to prioritize the initial paint better

I expect the problem is largely how long the HTML parser runs before actually creating any DOM nodes and inserting them into the DOM.
Bug 1350770 was partially contributing to this time. Things may be slightly better now, but there's still a lot of reflow happening.
Depends on: 1350770
It may be worth seeing whether bug 1308876 helps.
Whiteboard: [qf]
(In reply to Boris Zbarsky [:bz] (still a bit busy) (if a patch has no decent message, automatic r-) from comment #19)
> > I think we need to prioritize the initial paint better
> 
> I expect the problem is largely how long the HTML parser runs before
> actually creating any DOM nodes and inserting them into the DOM.

That could very well be.  We could spin that part off into a different bug once we have confirmed that theory.
Whiteboard: [qf] → [qf:p1]
(In reply to David Baron :dbaron: ⌚️UTC+8 from comment #21)
> It may be worth seeing whether bug 1308876 helps.

Marking the dependency. How does it look with your local patches?
Depends on: 1308876
Flags: needinfo?(dbaron)
(In reply to Jet Villegas (:jet) from comment #23)
> (In reply to David Baron :dbaron: ⌚️UTC+8 from comment #21)
> > It may be worth seeing whether bug 1308876 helps.
> 
> Marking the dependency. How does it look with your local patches?

FWIW, I just tested a local opt build with the "experimental patch" posted on that bug, and it didn't seem to help.  (My opt build takes ~100 seconds to load the implicated hg.m.o page here, regardless of whether the patch is applied.)

I'll leave needinfo=dbaron open in case his local patches are more likely to help, though.
No longer depends on: 1308876
Flags: needinfo?(dbaron)
(In reply to Mats Palmgren (:mats) from comment #18)
> I did a rough comparison of layout performance between Nightly and Chrome
> Canary
> on a locally saved copy of
> http://hg.mozilla.org/mozilla-central/rev/a16372ce30b5
> and it seems to me that we are actually *a lot faster* than Chrome.

It might also be useful to get a comparison of synchronous reflow of the entire page, since a bunch of the performance difference is related to the way we do incremental reflow during addition of content.

But we should also figure out why we're getting O(N^2) performance as we're appending more content.
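To make the O(N^2) concern concrete, here is a toy cost model. It assumes, purely for illustration, that every appended chunk triggers a reflow of everything seen so far; this thread has not established that this is what actually happens, and the function names and chunk size are hypothetical.

```python
def incremental_reflow_cost(total_nodes, chunk_size):
    """Toy model: each time a chunk is appended, reflow all nodes so far."""
    cost = 0
    seen = 0
    while seen < total_nodes:
        seen = min(seen + chunk_size, total_nodes)
        cost += seen  # assumption: reflow touches every node seen so far
    return cost


def synchronous_reflow_cost(total_nodes):
    """Toy model: one reflow after all content has arrived."""
    return total_nodes


incremental = incremental_reflow_cost(2_000_000, 1_000)
synchronous = synchronous_reflow_cost(2_000_000)
print(incremental, synchronous, incremental / synchronous)
```

Under this assumption the total work grows with N^2 / chunk_size, which is why comparing against a single synchronous reflow of the whole page would help separate the cost of layout itself from the cost of redoing it on each append.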
I'll take a look this week and get an up-to-date profile of what's going on here.
Flags: needinfo?(dholbert)
(In reply to Daniel Holbert [:dholbert] (reduced availability - travel & post-PTO backlog) from comment #26)
> I'll take a look this week and get an up-to-date profile of what's going on
> here.

I had high hopes, but I didn't end up getting around to this during this perf work week after all.  (I did spend some time profiling a few seconds of this pageload here & there, but the load is long enough that it was hard to glean anything useful besides "we're reflowing a lot".)

I'm going to suggest that other perf bugs are higher priority than this one, so I'm going to focus on other bugs for the time being and cancel my needinfo here, and I'm going to call this [qf:p3]. In particular, I think this is not-top-priority because:

(1) This has gotten a lot better: in the time since this bug was filed, we've shipped e10s, which means our UI remains responsive (and you can close the tab to regain control of your session; and you can interact with other tabs while this page is loading in the background, even with dom.ipc.processCount.web = 1)

(2) mstange tells me some async scrolling stuff will improve things here soon (for responsiveness to certain user scroll actions during janky pageload), so this will get even better soon, at least for scrollability.

(3) We're not significantly worse than the competition -- Chrome is also pretty-bad here (though somewhat differently bad), per comment 18.

(4) This is not an especially representative testcase - it's gotta be pretty uncommon that users encounter pageloads as gigantic as this one.

I'll leave jfkthame's needinfo open in case he's got cycles & interest to take a look here, though. :)
Flags: needinfo?(dholbert)
Whiteboard: [qf:p1] → [qf:p3]