Closed Bug 1225037 Opened 9 years ago Closed 6 years ago

Firefox hang while loading web pages with deeply-nested inline-blocks

Categories

(Core :: Layout: Text and Fonts, defect)

x86_64
All
defect
Not set
critical

Tracking

()

RESOLVED FIXED

People

(Reporter: jacobly.alt, Assigned: dbaron)

References

()

Details

(Keywords: hang, platform-parity)

Attachments

(2 files, 1 obsolete file)

Opening this web page causes Firefox Nightly (also version 42) to hang and manually crashing produces this crash report:
https://crash-stats.mozilla.com/report/index/64b990dc-dba9-4d37-aa79-257602151116

I left firefox running for a minute on an idle system with a fresh profile.  The firefox process used less than 20% cpu and the Web Content process used more than 97% cpu the whole time.

Some of the logs (from http://chat.eeems.ca/) load fine but the others (possibly only the larger ones) always cause a hang that has never recovered.
I can reproduce the hang on Windows7

Stack bp-5e313ecb-0c6a-46b5-9cea-98f422151117
Severity: normal → critical
Status: UNCONFIRMED → NEW
Component: Untriaged → Layout
Ever confirmed: true
Keywords: hang
OS: Linux → All
Product: Firefox → Core
Attached file reduced html
Hmm.  The reduced testcase and the original page both work for me on Mac.  Something to do with the exact font metrics involved, or something specific to text layout?

A sample during the hang would be pretty useful....
Component: Layout → Layout: Text
Keywords: pp
(In reply to Boris Zbarsky [:bz] from comment #4)
> Hmm.  The reduced testcase and the original page both work for me on Mac. 
> Something to do with the exact font metrics involved, or something specific
> to text layout?
> 
> A sample during the hang would be pretty useful....


If you are using wide monitor and wide width browser, It will not happen.
Could you test with narrow width browser window(~800px)?
Flags: needinfo?(bzbarsky)
Ah, with a narrow width (my normal one is about 930px, fwiw) I can reproduce.  Looks like all the time is in reflow, so presumably we're ending up with exponential growth in number of leaf reflows here or something, given the structure of the testcase.

Daniel, do you have time to take a look?
Flags: needinfo?(bzbarsky) → needinfo?(dholbert)
It definitely looks like something quadratic is going on here, but I can't tell you much more than that yet. (Haven't looked in much detail.)  Leaving needinfo=me.

I attached gdb during the hang, and was in deeply-nested block layout. Typing "fin" repeatedly to make my way up the stack, I saw that each time I made it to a "ReflowDirtyLines" call, it took progressively longer to finish out of that function.

Here's a somewhat-simpler testcase which also reproduces this hang.
Flags: needinfo?(dholbert)
Summary: Firefox hang while loading certain web pages. → Firefox hang while loading web pages with deeply-nested inline-blocks
So, what happens here is something like this:

* We start out in ReflowDirtyLines, for particular block. This block has just one line, which has two frames:
  (1) a text frame (e.g. "111" in my reduced testcase)
  (2) an inline-block, containing the subtree of all our descendants.

* We reflow each frame on our line -- first the text frame, and then the inline-block (which means reflowing all its descendants).
* Then, we call CanPlaceFrame to see if the inline-block fits on our line.  We find that it does NOT (because we have limited available width, due to e.g. skinny browser-window.)
* So, we push the inline-block to a second line, and we reflow it (and all its descendants) again.

So at each level, we reflow our child inline-block twice (first as part of the same line as the text, and then as its own second line). And each inline-block does this twice for *its* child, each time it's reflowed, I think. So, this blows up to be an n^2 algorithm.
We wish it were n^2.  It's 2^n.
Sorry, I left one part out.

So, after we've split a particular inline-block into 2 lines, one might understandably think it'd be ~cheap to reflow from that point on, because now we've sorted things out and won't try putting the huge subtree in a spot where it doesn't fit anymore.

BUT, one would be wrong. Specifically -- the next time a frame is reflowed after we've split it into two lines, we optimistically try to pull the first part of the second line up to the first line, to see if it fits now.  So, effectively, we re-merge the two lines into one line, and reflow to see if things fit.  We then discover that it still doesn't fit, and then split into two lines again, and we reflow the subtree again.

So, at each level of reflow, we're reflowing the subtree below us twice.  And yeah, that's 2^n, not n^2 as bz points out. Ouch.

Relevant links:
 - Here's where DoReflowInlineFrames() optimistically (naively?) tries to pull the second line's contents (the subtree of descendant blocks) up onto to the first line, for a hopeful & doomed-to-fail reflow (which then gets followed by a re-splitting and a second reflow):
http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsBlockFrame.cpp?rev=1ffc4b1f166c#3864

 - Here's where CanPlaceFrame() reports that the subtree didn't fit on that first line, and we need to (re-)split:
http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsLineLayout.cpp?rev=fbd2e8ad5795#1426
Note: I can hack around this by setting the variable "allowPullUp" to false instead of true, in nsBlockFrame::ReflowInlineFrames(), here:
http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsBlockFrame.cpp?rev=1ffc4b1f166c#3684

That prevents us from making the optimistic/doomed "PullFrame" call, and the hang goes away.

I don't think this would be a *correct* fix, since presumably we *do* need to allow pulling frames up in some cases (e.g. when something's been removed from the first line and more stuff will fit now).  But the fix probably would need to be along those lines...

(I don't know the inline-layout code well enough to have a clear idea for a fix, so I probably shouldn't dig too much further in at this point, unless this is a high-priority/frequently-hit issue.)
(Here's a more reliably-hanging version of reduced testcase 2, btw -- I've just added an explicit small width on the <body>, so that it'll reproduce regardless of your browser window size.)
Attachment #8688751 - Attachment is obsolete: true
I doubt that people will hit this all that often. I've given jacobly an alternative site that doesn't cause the problems hes encountered.
I can reproduce the hang with reduced testcase and testcase2 on Firefox55.
However, I can no longer reproduce the hang on Firefox56 x64 and later on Win10.
Fixed range:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=e6e712904806da25a9c8f48ea4533abe7c6ea8f4&tochange=d6bf703c5deaf1e328babd03d5e68ff2a4ffe10e

Fixed by: 
	1e3130e96f03	L. David Baron — Bug 1308876 - Mark child frames as dirty before starting reflow of the parent, so that if we reflow a child twice, it's only dirty the first time. r=dholbert
Depends on: 1308876
Yeah, that probably changes from an O(2^N) algorithm to an O(N) algorithm...
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Assignee: nobody → dbaron
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: