Nested inline-blocks with matching width locks up browser due to O(2^depth) reflow performance

RESOLVED FIXED in Firefox 56

Status

()

--
major
RESOLVED FIXED
3 years ago
2 months ago

People

(Reporter: jeantetg, Assigned: dbaron)

Tracking

(Depends on: 20 bugs, Blocks: 3 bugs, 5 keywords)

49 Branch
mozilla56
csectype-dos, hang, perf, site-compat, testcase
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox56 fixed)

Details

(Whiteboard: [sg:dos][qf:p1])

Attachments

(22 attachments, 5 obsolete attachments)

3.09 KB, text/html
Details
760 bytes, text/html
Details
910 bytes, text/html
Details
480 bytes, text/html
Details
506 bytes, text/html
Details
683 bytes, text/html
Details
784 bytes, application/xhtml+xml
Details
550 bytes, text/html
Details
584 bytes, application/xhtml+xml
Details
989 bytes, text/html
Details
5.83 KB, patch
dholbert
: review+
Details | Diff | Splinter Review
3.20 KB, patch
mats
: review+
Details | Diff | Splinter Review
5.50 KB, patch
mats
: review+
Details | Diff | Splinter Review
2.57 KB, patch
mats
: review+
Details | Diff | Splinter Review
3.45 KB, patch
dholbert
: review+
Details | Diff | Splinter Review
3.10 KB, patch
dholbert
: review+
Details | Diff | Splinter Review
7.78 KB, patch
dholbert
: review+
Details | Diff | Splinter Review
10.32 KB, patch
dholbert
: review+
Details | Diff | Splinter Review
1.96 KB, patch
dholbert
: review+
Details | Diff | Splinter Review
2.22 KB, patch
dholbert
: review+
Details | Diff | Splinter Review
814 bytes, text/html
Details
668 bytes, text/html
Details
(Reporter)

Description

3 years ago
Posted file crash.html
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:49.0) Gecko/20100101 Firefox/49.0
Build ID: 20160922113459

Steps to reproduce:

Hello.

First of all i'm sorry for my english, hope you'll understand the issue.

I encoutered a specific user case that make firefox cpu intensive and the rendering of all tabs (even new one) freeze. I invite you to check the attached file. This happen because i forgot to close a tag (</div>) in a loop, whitout -moz-column-width css directive browser work fine, with it just freeze.

That shouldn't happen because this code is incorrect however ie and chrome work juste fine so i'm afraid there is some kind of endless loop or memory leak or worse something that can affect security in firefox. I'm not qualified in these areas so i prefer to be carefull and hide this case from the public.


Actual results:

In my computer firefox seems entering and endless loop.


Expected results:

Just render the page.
Reproduced on Mac with Firefox 49 and Beta 50 (both e10s) and ESR-45.4

It's spinning the CPU but doesn't seem to be growing in memory use, at least not quickly. An Activity Monitor sample doesn't show it doing much (95% time unaccounted for despite high CPU use?) so maybe it's calling some slow OS operation in a loop.

Does not appear to be exploitable other than as a Denial of Service so I'll unhide this so more people can help.
Group: core-security
Severity: normal → major
Status: UNCONFIRMED → NEW
Component: Untriaged → Layout
Ever confirmed: true
Keywords: csectype-dos, hang, testcase
Whiteboard: [sg:dos]
Note that although e10s keeps the browser UI responsive, it doesn't appear possible to recover the hung/busy child process. Closing the tab leaves other tabs still broken. about:newtab seems to work (does it run in the parent process?) but you can't open anything from it. You can't even quit the browser: the UI goes away but the process stays running until you force kill it.
status-firefox49: --- → affected
status-firefox50: --- → affected
status-firefox-esr45: --- → affected
Summary: cpu intensive with a specific user case (endless loop?) → Nested divs w/matching column-width and width locks up browser
This issue seems to exist at least back to Firefox 5.
Reducing the number of nested div significantly reduces the time it needs. After removing 7 blocks, there would be a long hang, but would render eventually. It feels to me that it may trigger some exponential time issue...
Posted file simplified testcase
This is a simplified testcase which hangs ~10s on my machine.
dbaron, any suggestion on this?
Flags: needinfo?(dbaron)
Oh, and column-width doesn't actually involve this issue as shown in my simplified testcase. It is simply because of nested blocks triggering nested line break.
Summary: Nested divs w/matching column-width and width locks up browser → Nested divs w/matching width locks up browser
(Jet mentioned this bug to me -- I'll try to take a look in the near term, if dbaron doesn't get to it first.)
Flags: needinfo?(dholbert)
Keywords: perf
So this patch both:
 * explains what the problem is, and
 * fixes it.

But it's pretty risky, and it obviously breaks other things.  In fact, it breaks the display of the iframe with the browser in it inside of the Firefox UI, so you actually need the layout debugger to tell that the performance problem is fixed.

But it fixes something that I think is both historically a mistake, and actually may be likely to fix some other reflow performance problems.
Flags: needinfo?(dbaron)
Attachment #8840228 - Attachment description: child-frames-dirty-less → experimental patch that fixes problem, but needs a lot more work to be shippable
Blocks: 1341750
With a relatively straightforward XUL-related issue fixed (and the browser UI functioning again), most of the reftest failures in a complete reftest run look related to fragmentation.  It wouldn't surprise me if there's only one or two remaining things to be fixed for our test suite, actually.
Current patches are:
https://hg.mozilla.org/users/dbaron_mozilla.com/patches/raw-file/6fff16084c7b/child-frames-dirty-less
https://hg.mozilla.org/users/dbaron_mozilla.com/patches/raw-file/6fff16084c7b/crashtest-hang

I suspect most of the reftest failures I'm seeing are related to frames that are being pulled from a previous continuation during reflow.  Prior to the patch, they were being marked dirty; with the patch they are not.

However, conceptually, it seems to me that they shouldn't need to be marked dirty, and the failures might be signs of other problems handling dynamic changes during paginated (e.g., multi-column) reflow that are already present in cases where the parent isn't marked dirty.  So I'm inclined to try to figure out what specific things need to be marked dirty in order to fix these, since I think a general fix isn't the right thing.

(I was debugging layout/reftests/w3c-css/submitted/css21/pagination/moz-css21-float-page-break-inside-avoid-4.html, since it looked like one of the simpler failures.  I believe what fails is that the float gets reflowed partially on the first page, returns some sort of interesting reflow status, and then fails to be fully reflowed again on the second page.)
The way that sort of thing normally seems to work in multicol is that frames report a reflow status of NS_FRAME_REFLOW_NEXTINFLOW, and then https://searchfox.org/mozilla-central/rev/c9ccb8ec1f5acda78e6344ffb87aa3a409031e3d/layout/generic/nsColumnSetFrame.cpp#609 uses that to mark the next child as dirty.
Whiteboard: [sg:dos] → [sg:dos][qf:p1]
The problem here is that we're mispositioning the reference red box that goes behind the test, probably during the reflow we do when we decide there are no scrollbars needed.
(In reply to David Baron :dbaron: ⌚️UTC-7 from comment #16)
> Created attachment 8848199 [details]
> testcase simplified from layout/reftests/css-ruby/box-properties-2-ref.html

OK, the problem here is that ruby *text* frames are reflowed by the ruby *base* container.  So when (as a result of the main patch in this bug) we mark them dirty at the start of the ruby text container's reflow, they end up staying dirty, and then things marked dirty inside of them don't trigger new reflows since there are already dirty bits set.  I need to adjust the primary patch in this bug to account for this.

If the assertions in PresShell::VerifyHasDirtyRootAncestor had been enabled I probably would have had to spend a good bit less time figuring this out.
On nightly, both the test and the reference have the inner multicol having a height of 2 lines of text, with the:
  4 5
  6
in the first column.

The patch changes the behavior so the inner multicol instead has a height of 1 line of text, with the 6 pushed to the second column:
  4 5 6

It's not obvious to me which is right, but the patch does change behavior here, and make the test not match the reference.

(Neither nightly nor with-patch is changed by zooming.)
Summary: Nested divs w/matching width locks up browser → Nested inline-blocks with matching width locks up browser due to O(2^depth) reflow performance
Blocks: 827937
Flags: needinfo?(dholbert)
Comment on attachment 8878736 [details]
testcase simplified from layout/generic/crashtests/1015844-3.html

A slightly modified version of this is now bug 1374479.
Attachment #8878736 - Attachment is obsolete: true
This fixes (confirmed by testing locally) a regression in
layout/reftests/w3c-css/received/css-multicol-1/multicol-nested-margin-004.xht
resulting from the primary patch in this bug, which tends to make frames
dirty less often.  The problem with that test is that (at least in a
simplified form), in the final reflow of the inner ColumnSet in the
first column of the outer ColumnSet, the inner ColumnSet chooses not to
reflow its first column, thus leaving that first column having a height
that is too large for the inner ColumnSet to fit in the first column of
the outer ColumnSet, causing the entire inner ColumnSet (rather than
just part of it) to be pushed to the next column.

I believe this existing incremental reflow code just doesn't make sense.
The code I'm modifying dates back primarily to:
https://github.com/mozilla/gecko-dev/commit/c237520c89a51349aad773bd971dd93cd09f91b9 (October 2004, initial columns implementation)
https://github.com/mozilla/gecko-dev/commit/ee070ec95f3a057d4e0d3e8e00ac9f5808a38fed (March 2005)
https://github.com/mozilla/gecko-dev/commit/31e3540d1e31e96f3a89b8ea314f6f3ffcb81746 (November 2006)

The first thing that doesn't make sense is the condition modified at the
end of this patch:
  (!reflowNext && (skipIncremental || skipResizeBSizeShrink))
There's simply no reason that that || isn't required to be an &&, as far
as I can tell.  Even if we don't need to reflow due to any of the
standard incremental reflow conditions, we can need to reflow because
the block size is shrinking and the column no longer fits.

Note that things were already OK when we required reflow due to
NS_SUBTREE_DIRTY(this), because of the way shrinkingBSizeOnly was
initialized using !NS_SUBTREE_DIRTY(this), thus excluding such cases
from the optimization.

The rest of the patch falls out of turning the || into an && in an
efficient way (i.e., without the extra !NS_SUBTREE_DIRTY(this) test, and
avoiding doing extra tests that we know we're not going to need by
coalescing all the incremental reflow tests into a single variable).

I tested that this patch passes try on its own (on 64-bit Linux debug):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a279023fb7e8f3349d5ecbfb95807d6b097cdbcb

MozReview-Commit-ID: BD3ofmWN5Wl
Attachment #8880661 - Flags: review?(dholbert)
Both of the changes are needed to fix
layout/reftests/w3c-css/submitted/css21/pagination/moz-css21-float-page-break-inside-avoid-4.html
with the primary patch in bug 1308876.

That patch changes the transfer of NS_FRAME_IS_DIRTY from parent to
child so that it happens at the start of the parent's reflow rather than
later at the start of the child's reflow, which means that frames that
are pulled into a dirty frame during reflow are not marked dirty (and
thus forced to reflow all of their lines).  This means that the primary
patch in bug 1308876 introduces the possibility of non-dirty reflows
during printing, which means we exercise non-dirty relayout code in a
number of tests where we did not do so previously.

Note: This patch passes try on its own, on Linux64 debug.

Writing a simple test for this that fails without the primary patch in
bug 1308876 seems difficult.  ColumnSetFrame responds to
nsReflowStatus::NextInFlowNeedsReflow by marking the next-in-flow
*dirty* (which page frames don't), which makes it hard to test in
columns, at least without nesting.  (Colums probably shouldn't do that,
though, but that's a performance fix for another time.)

MozReview-Commit-ID: JZ3qWTSO2lX
Attachment #8880662 - Flags: review?(mats)
The primary patch in this bug causes fewer dirty reflows, which leads to lines
being out-of-date for the reason described in the comment.  This causes
incorrect layout of some references sections on wikipedia, for which a
simplified testcase is included.

This bug was not caught by anything in our test suite, but I noticed it
while browsing wikipedia (since I use a build that has my patches in it
for my regular browsing).

MozReview-Commit-ID: 4hTQpGS2pZH
Attachment #8880663 - Flags: review?(mats)
This fires, for example, in layout/base/crashtests/265973-1.html (and a
number of other tests, I believe) with the primary patch in this bug.
This is because the primary patch causes frames to be dirty less often,
which sends us into the PrepareResizeReflow codepath, which only happens
when frames are not dirty.

I don't think NS_CoordSaturatingAdd is needed here, since newAvailISize
is used only when deciding whether or not lines need reflow; wraparound
should only cause us to do a little extra work.

Note: This patch passes try on its own, on Linux64 debug.

MozReview-Commit-ID: K6Z5MvG7awp
Attachment #8880664 - Flags: review?(mats)
This fixes the regression in
layout/reftests/columns/column-balancing-nested-001.html from the
primary patch in this bug, for which a simplified testcase is
https://bugzilla.mozilla.org/attachment.cgi?id=8848293 .  I believe it's
simply a pre-existing bug that wasn't previously exposed.  I suspect
it may be possible to write a test that shows the bug prior to the
patch.  However, it's difficult, since it requires triggering height
changes in a multicol with an 'auto' height (since a non-'auto' height
would cause the multicol to have NS_FRAME_CONTAINS_RELATIVE_BSIZE set by
nsColumnSetFrame::Reflow, which would make skipIncremental false because
ShouldReflowAllKids returns true).  I suspect any working testcase would
likely be rather brittle, so I haven't pursued it further (particularly
given the complexity of the minimal testcase).

MozReview-Commit-ID: Gve3XKEPSxL
Attachment #8880665 - Flags: review?(dholbert)
I noticed this while debugging the assertion count failure of
layout/base/crashtests/470851-1.xhtml .  It doesn't help that failure,
but it still seems like the safe thing to do.

MozReview-Commit-ID: 6xHxUJCjUHh
Attachment #8880666 - Flags: review?(dholbert)
Previously, in paginated mode, all reflows were dirty reflows, since
tables do not split outside of printing, and prior to the primary patch
in bug 1308876, all reflows during printing are dirty reflows.  (The
isPaginated test here is actually for real pages, not fragmentation in
general.  However, the use here is appropriate for the meaning of
whether it's possible for the table to fragment.)

The fact that all reflows were dirty reflows meant that the
NS_FRAME_CONTAINS_RELATIVE_BSIZE flag was always cleared immediately
before reflow in ReflowInput::InitResizeFlags (which might also have set
the flag on *ancestors*).  This meant that, prior to the primary patch
in bug 1308876, the initial value of needToInitiateSpecialReflow that
was initialized from the presence of the
NS_FRAME_CONTAINS_RELATIVE_BSIZE flag was always false.  This patch
preserves that initialization in the presence of the change in the
primary patch in bug 1308876.

This caused a failure in a single test in our test suite, and in a
rather complicated way.  The test was
layout/base/crashtests/470851-1.xhtml, in which there was both a
difference in assertion count (due to the bogus assertion "data loss -
incomplete row needed more height than available, on top of page" in
nsTableRowGroupFrame::SplitRowGroup, whose companion assertion "data
loss - complete row needed more height than available, on top of page"
is already just an NS_WARNING) that caused a test failure, and a
difference in layout (the test split across 3 pages rather than 2) that
did not cause a test failure.

This patch fixes the difference in layout.  The immediate cause of the
layout difference was that a cell (the second outermost) on the second
page had a height, computed in CalcUnpaginatedBSize, that was large
enough to cause it to need to continue onto the third page.  This height
came (via nsTableRowFrame::GetUnpaginatedBSize) from the
UnpaginatedHeightProperty stored on the first-in-flow of its row, on the
first page, stored by CacheRowBSizesForPrinting called in
nsTableRowGroupFrame::ReflowChildren during the reflow of its row group
on the first page, in a special height reflow initiated during the
second-pass constrained-height reflow of the table (still,
second-outermost) on the first page, due to the change being fixed in
this patch.

MozReview-Commit-ID: 3E84VwdXuPs
Attachment #8880667 - Flags: review?(dholbert)
This is the primary patch in this bug, and makes the performance
improvement that fixes this bug.

The assertion count increase for layout/generic/crashtests/1015844.html
is accompanied by a layout change in the testcase as well.  However, I'm
not planning to fix it in this sequence; fundamentally columnsets with
specified heights inside a paginated context (like another columnset) do
not work in any reasonable way, and changing the number of times we
reflow them can change the layout.  At least, assuming I didn't lose
something in the process of simplifying the testcase.

TODO: File followup on making this faster by somehow not worsening
memory locality.

ISSUES:
 - may make block inside XUL worse in performance by marking dirty more (see subdoc in Firefox UI, or text control innards?)

MozReview-Commit-ID: GdOvPynqcFP
Attachment #8880668 - Flags: review?(dholbert)
In the existing code, the parent having NS_FRAME_IS_DIRTY is not
propagated to column groups because nsTableFrame::ReflowColGroups checks
the child dirty bit before constructing the reflow state for the child.
This preserves that behavior in the presence of the primary patch in bug
1308876.

I noticed this while debugging the assertion count failure of
layout/base/crashtests/470851-1.xhtml .  It doesn't help that failure,
but it still seems like the safe thing to do.

MozReview-Commit-ID: EhfIQQkeaJx
Attachment #8880669 - Flags: review?(dholbert)
Comment on attachment 8880662 [details] [diff] [review]
Mark lines dirty when we abort their reflow due to page-break-inside:avoid

Makes sense.  r=mats
Attachment #8880662 - Flags: review?(mats) → review+
Attachment #8880663 - Flags: review?(mats) → review+
Attachment #8880664 - Flags: review?(mats) → review+
Comment on attachment 8880661 [details] [diff] [review]
Fix ColumnSet to reflow a non-dirty column when the block-size has shrunk and the column might need to push some children to the next column

Review of attachment 8880661 [details] [diff] [review]:
-----------------------------------------------------------------

r=me

::: layout/generic/nsColumnSetFrame.cpp
@@ -673,5 @@
>      // (It may also have overflowing content that doesn't care about the available height
>      // boundary, but if so, too bad, this optimization is defeated.)
>      // We want scrollable overflow here since this is a calculation that
>      // affects layout.
> -    bool skipResizeBSizeShrink = false;

There's one remaining usage of this variable, inside of an #ifdef DEBUG_roc statement (as part of a debugging printf).  Might as well remove that usage from the printf, to completely kill off this variable.
Attachment #8880661 - Flags: review?(dholbert) → review+
Comment on attachment 8880665 [details] [diff] [review]
Reflow all kids when column-fill is auto and height has changed

Review of attachment 8880665 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/generic/nsColumnSetFrame.cpp
@@ +666,5 @@
>        && child->GetNextSibling()
>        && !(aUnboundedLastColumn && columnCount == aConfig.mBalanceColCount - 1)
>        && !NS_SUBTREE_DIRTY(child->GetNextSibling());
> +    // If column-fill is auto (not the default), then we might need to
> +    // move content between columns for any change in block-size.

maybe, for extra clarity:
  s/in block size/in column block-size/
Attachment #8880665 - Flags: review?(dholbert) → review+
Attachment #8880666 - Flags: review?(dholbert) → review+
Comment on attachment 8880667 [details] [diff] [review]
Avoid initiating special-height reflow as a result of new paginated non-dirty reflows

Review of attachment 8880667 [details] [diff] [review]:
-----------------------------------------------------------------

r=me
Attachment #8880667 - Flags: review?(dholbert) → review+
Comment on attachment 8880668 [details] [diff] [review]
Mark child frames as dirty before starting reflow of the parent, so that if we reflow a child twice, it's only dirty the first time

Review of attachment 8880668 [details] [diff] [review]:
-----------------------------------------------------------------

The commit message has:
> TODO: File followup on making this faster by somehow not worsening
> memory locality.

Probably worth filing that followup (and mentioning the bug number in the related new code-comment in ReflowInput.cpp), before landing this?  (I'm guessing that was your intent, from having this TODO in the commit message.)

::: layout/generic/ReflowInput.cpp
@@ +371,5 @@
> +    for (nsIFrame::ChildListIterator childLists(mFrame);
> +         !childLists.IsDone(); childLists.Next()) {
> +      for (nsFrameList::Enumerator childFrames(childLists.CurrentList());
> +           !childFrames.AtEnd(); childFrames.Next()) {
> +        childFrames.get()->AddStateBits(NS_FRAME_IS_DIRTY);

You can use a range-based "for" loop to condense the inner loop like so:

  for (nsIFrame* childFrame : childLists.CurrentList()) {
    childFrame->AddStateBits(NS_FRAME_IS_DIRTY);
  }

::: layout/generic/nsColumnSetFrame.cpp
@@ +468,5 @@
> +MarkPrincipalChildrenDirty(nsIFrame* aFrame)
> +{
> +  for (nsFrameList::Enumerator childFrames(aFrame->PrincipalChildList());
> +       !childFrames.AtEnd(); childFrames.Next()) {
> +    childFrames.get()->AddStateBits(NS_FRAME_IS_DIRTY);

This can be shortened to:
  for (nsIFrame* childFrame : aFrame->PrincipalChildList()) {
    childFrame->AddStateBits(NS_FRAME_IS_DIRTY);
  }

::: layout/generic/nsRubyFrame.cpp
@@ +129,5 @@
> +  // ruby text container doesn't mark the ruby text frames dirty *after*
> +  // they're reflowed and leave dirty bits in a clean tree (suppressing
> +  // future reflows, due to lack of a queued reflow to clean them).
> +  for (nsIFrame* child : PrincipalChildList()) {
> +    if (child->GetStateBits() & NS_FRAME_IS_DIRTY &&

If you like, this is equivalent & might be a bit more readable:
  if (child->HasAnyStateBits(NS_FRAME_IS_DIRTY) &&

(Personally, I get slightly tripped up by the ampersands in the current "if (foo & bar &&" version, and I find HasAnyStateBits more foolproof than manual bitwise arithmetic with state bits.)
Attachment #8880668 - Flags: review?(dholbert) → review+
Comment on attachment 8880669 [details] [diff] [review]
Preserve behavior of ignoring parent dirty bit for column groups

Review of attachment 8880669 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/generic/ReflowInput.cpp
@@ +374,5 @@
>             !childFrames.AtEnd(); childFrames.Next()) {
> +        nsIFrame* child = childFrames.get();
> +        if (!child->IsTableColGroupFrame()) {
> +          child->AddStateBits(NS_FRAME_IS_DIRTY);
> +        }

(This may need rebasing if you end up condensing the inner for loop hree, per my feedback on this chunk of the previous patch.)
Attachment #8880669 - Flags: review?(dholbert) → review+
Comment on attachment 8880671 [details] [diff] [review]
Add crashtest that used to hang Firefox

Review of attachment 8880671 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/generic/crashtests/1308876-1.html
@@ +1,2 @@
> +<!DOCTYPE html>
> +<html> 

Nit: EOL whitespace here - might as well get rid of it.
Attachment #8880671 - Flags: review?(dholbert) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/cfc48662b2cc30a09bd18e7d19a8559f4ba182ba
Bug 1308876 - Mark lines dirty when we abort their reflow due to page-break-inside:avoid.  r=mats

https://hg.mozilla.org/integration/mozilla-inbound/rev/0b20a0fd86ff8718fe9461016ac7a5279af20027
Bug 1308876 - Don't continue reflow after deciding we need to try again due to page-break-inside:avoid.  r=mats

https://hg.mozilla.org/integration/mozilla-inbound/rev/07def0eabf91096eb0edb7a1e76e5b213fd158d3
Bug 1308876 - Remove assertion that starts firing more when we mark frames dirty less and thus call PrepareResizeReflow more.  r=mats

https://hg.mozilla.org/integration/mozilla-inbound/rev/4b74f07a39bbb11f4c139363164f3d4e22693a58
Bug 1308876 - Fix ColumnSet to reflow a non-dirty column when the block-size has shrunk and the column might need to push some children to the next column.  r=dholbert

https://hg.mozilla.org/integration/mozilla-inbound/rev/67c036d54d3eec8dcf4c8deee44695c2564180c4
Bug 1308876 - Reflow all kids when column-fill is auto and height has changed.  r=dholbert

https://hg.mozilla.org/integration/mozilla-inbound/rev/c89b504be5d71ce88c32a1ff299424876e28c40b
Bug 1308876 - Prevent tables from trying to do incremental reflow when fragmented, since they can't.  r=dholbert

https://hg.mozilla.org/integration/mozilla-inbound/rev/d2fc16d299606d8f78de50c3f24633cb8e53d5f6
Bug 1308876 - Avoid initiating special-height reflow as a result of new paginated non-dirty reflows.  r=dholbert

https://hg.mozilla.org/integration/mozilla-inbound/rev/b6f51232167c4476a877ed45ed849c62e93005fa
Bug 1308876 - Mark child frames as dirty before starting reflow of the parent, so that if we reflow a child twice, it's only dirty the first time.  r=dholbert

https://hg.mozilla.org/integration/mozilla-inbound/rev/14f1496b0b4d456f2c1373aaae222c4343634190
Bug 1308876 - Preserve behavior of ignoring parent dirty bit for column groups.  r=dholbert

https://hg.mozilla.org/integration/mozilla-inbound/rev/b2cdc72e57b8d45f63dfb7333a7b23adc4354af9
Bug 1308876 - Add crashtest that used to hang Firefox.  r=dholbert
Backed out for reftest failures, at least on Windows 7, e.g. layout/reftests/text-overflow/xulscroll.html:

https://hg.mozilla.org/integration/mozilla-inbound/rev/201b30adaf89493e1fdd200a4ce9172bb905a33d
https://hg.mozilla.org/integration/mozilla-inbound/rev/a4223b9ef404f56c25387a222ec0e6aab391fd00
https://hg.mozilla.org/integration/mozilla-inbound/rev/066424e1eda7fe5d96501129666e3107d0bd1416
https://hg.mozilla.org/integration/mozilla-inbound/rev/4776aab7f2a0eac146cb5dfd63b551833d625069
https://hg.mozilla.org/integration/mozilla-inbound/rev/593419bb3a541f3a31fd67e0a40c78744bf3cec6
https://hg.mozilla.org/integration/mozilla-inbound/rev/8680875aed18707c2220f0b9326c10f60466b199
https://hg.mozilla.org/integration/mozilla-inbound/rev/c46619040aacd36a123840ebd28e91592d39d630
https://hg.mozilla.org/integration/mozilla-inbound/rev/39524720b249922ac73005b1d8e96bf547458d88
https://hg.mozilla.org/integration/mozilla-inbound/rev/de1f4277bea252407bba8b379c89ca9271c6966f
https://hg.mozilla.org/integration/mozilla-inbound/rev/2481971ba300c559c9639da1556d25b588fdad4d

Push with failure: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=b2cdc72e57b8d45f63dfb7333a7b23adc4354af9&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=110206020&repo=mozilla-inbound

> REFTEST TEST-UNEXPECTED-FAIL | file:///Z:/task_1498607645/build/tests/reftest/tests/layout/reftests/text-overflow/xulscroll.html == http://localhost:49243/1498608091790/202/text-overflow/xulscroll-ref.html | image comparison, max difference: 255, number of differing pixels: 394
Flags: needinfo?(dbaron)
when this landed we saw a performance improvment:
== Change summary for alert #7558 (as of June 27 2017 23:19 UTC) ==

Improvements:

  4%  quantum_pageload_amazon summary windows7-32 opt e10s     3,142.00 -> 3,021.21

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=7558


and when we backed out, we saw a regression:
== Change summary for alert #7576 (as of June 28 2017 01:00 UTC) ==

Regressions:

  4%  quantum_pageload_amazon summary windows7-32 opt e10s     3,007.67 -> 3,131.75

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=7576



In this test we have a recent (last month) copy of amazon.com page that we replay via mitmproxy (for https) and measure the onload event.

Looking forward to this landing and improving our pageload time.
(In reply to David Baron :dbaron: ⌚️UTC-7 from comment #55)
> Created attachment 8885945 [details]
> variant of layout/reftests/bugs/371561-1.html that shows Android reftest
> failure on all platforms
> 
> ...by adding 'overflow:hidden' on the root element.

I think this might be bug 667079.
https://hg.mozilla.org/integration/mozilla-inbound/rev/e3741d5b60f182cddec58733e2d17cdf4dfe586b
Bug 1308876 - Mark lines dirty when we abort their reflow due to page-break-inside:avoid.  r=mats

https://hg.mozilla.org/integration/mozilla-inbound/rev/ef306e4124f8b66fe87524159954ddce301e4cf5
Bug 1308876 - Don't continue reflow after deciding we need to try again due to page-break-inside:avoid.  r=mats

https://hg.mozilla.org/integration/mozilla-inbound/rev/28292dffc5d03ca35993cc3c64b7505c9f87061c
Bug 1308876 - Remove assertion that starts firing more when we mark frames dirty less and thus call PrepareResizeReflow more.  r=mats

https://hg.mozilla.org/integration/mozilla-inbound/rev/66edf6c444e7c88572d1dc58aac6ee6788ffb03f
Bug 1308876 - Fix ColumnSet to reflow a non-dirty column when the block-size has shrunk and the column might need to push some children to the next column.  r=dholbert

https://hg.mozilla.org/integration/mozilla-inbound/rev/6dd31665f62e98246873c409ed1b006f511e57a2
Bug 1308876 - Reflow all kids when column-fill is auto and height has changed.  r=dholbert

https://hg.mozilla.org/integration/mozilla-inbound/rev/98af0eaefed7239a427cbd12ef42b2d8f0e50269
Bug 1308876 - Prevent tables from trying to do incremental reflow when fragmented, since they can't.  r=dholbert

https://hg.mozilla.org/integration/mozilla-inbound/rev/395b6c53e42b064c4463d5156df5baa23d98dc5d
Bug 1308876 - Avoid initiating special-height reflow as a result of new paginated non-dirty reflows.  r=dholbert

https://hg.mozilla.org/integration/mozilla-inbound/rev/1e3130e96f03b7e3bff220e05c5e06b54f0c285f
Bug 1308876 - Mark child frames as dirty before starting reflow of the parent, so that if we reflow a child twice, it's only dirty the first time.  r=dholbert

https://hg.mozilla.org/integration/mozilla-inbound/rev/7d4d90362c1f80efff9e674f0d9962d1aa13ef37
Bug 1308876 - Preserve behavior of ignoring parent dirty bit for column groups.  r=dholbert

https://hg.mozilla.org/integration/mozilla-inbound/rev/d6bf703c5deaf1e328babd03d5e68ff2a4ffe10e
Bug 1308876 - Add crashtest that used to hang Firefox.  r=dholbert
and after landing we see the pageload time again!

== Change summary for alert #7910 (as of July 13 2017 02:38 UTC) ==

Improvements:

  5%  quantum_pageload_amazon summary windows7-32 opt e10s     3,163.44 -> 3,014.67

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=7910


thanks for the work
we also see improvements on android (real phones):
== Change summary for alert #7890 (as of July 13 2017 02:38 UTC) ==

Improvements:

  7%  remote-nytimes summary android-4-4-armv7-api15 opt      2,299.63 -> 2,131.05
  4%  remote-nytimes summary android-7-1-armv8-api15 opt      1,566.79 -> 1,497.16
  4%  remote-nytimes summary android-4-2-armv7-api15 opt      3,920.74 -> 3,766.30
  4%  remote-tp4m summary android-6-0-armv8-api15 opt         222.85 -> 214.35

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=7890
Perhaps it's worth a drop more explanation what the performance improvement here is.  The performance improvement comes from the primary patch in this bug, https://hg.mozilla.org/mozilla-central/rev/1e3130e96f03 .

The underlying problem (dating back to the reflow branch, bug 300030) relates to the flags that frames use to track whether they need to redo layout (i.e., whether they are "dirty").  (This is separate from how we track dirtiness for intrinsic sizes.)  We use a pair of flags:

  NS_FRAME_IS_DIRTY says that this frame and all of its descendants need to be reflowed

  NS_FRAME_HAS_DIRTY_CHILDREN says that this frame needs to be reflowed because either (a) one of its children has one of the two flags set or (b) one of its children was removed.

Processing NS_FRAME_HAS_DIRTY_CHILDREN require that the process of doing layout iterate all the children, but the children can potentially decide to optimize away going through their children.  On the other hand, processing NS_FRAME_IS_DIRTY requires that we propagate the NS_FRAME_IS_DIRTY flag to the children so that they, in turn, reflow all of *their* children, etc.

The problem is that sometimes the process of doing reflow requires that a frame reflow some or all of its children more than once.  For example, it might do a layout at one size, and based on the result, decide that the children need different sizes for some reason.

With the code prior to this patch, the NS_FRAME_IS_DIRTY bit was propagated to the children for all reflows.  However, we only really need to propagate it to the children once; the second reflow is only a resize (or repositioning and refragmenting) and can be optimized away as the frame class sees fit.  The caller that set NS_FRAME_IS_DIRTY only required that all frames in the subtree be reflowed once because something had changed that required their reflow.

So this patch speeds up cases where frames reflow their children multiple times.  This is actually quite common.  For example, a page that is too short to require a scrollbar, and doesn't have overflow:hidden on the root or the body, will reflow its contents first assuming that there will be a scrollbar, and then again once the width of the scrollbar is removed and there's more room for them.  This patch causes the second reflow to always be treated as a pure resize (with significant optimization) rather than being a full everything-dirty reflow of the subtree whenever the first reflow was.

In certain pathological cases where there's a deep frame tree where *each* frame reflows its child twice, the original problem can lead to O(2^depth) performance, as in the original testcase in this bug.  That's pretty rare.  But it's much more common to have this occur in a few places.

There is a tradeoff, though.  Propagating NS_FRAME_IS_DIRTY to children at the beginning of the parent's reflow (rather than the beginning of the child's reflow) means we're doing an extra pass over the frame list, which probably means using more memory bandwidth.  But it seems to be worth it in order to optimize away the extra reflows.
(Reporter)

Comment 63

2 years ago
Well when i opened this bug i couldn't imagine that it'll endup with such performances improvement. Seems to me that is an outstanding technical issue, so maybe that is not the place for doing that but i just wanted to thank you all at mozilla for the amazing work and for building better softwares every day.

Updated

2 years ago
Depends on: 1406291

Updated

2 years ago
Depends on: 1406356

Updated

2 years ago
Depends on: 1406050

Updated

2 years ago
Depends on: 1406163

Updated

2 years ago
Depends on: 1406619
Depends on: 1393841
No longer blocks: 1412687
Depends on: 1412687
No longer blocks: 1409585
Depends on: 1409585
Depends on: 1428670

Updated

a year ago
Depends on: 1439392

Updated

a year ago
Blocks: 1225037

Updated

a year ago
Blocks: 1003427

Updated

10 months ago
Depends on: 1468072
A new regression appeared since this bug was fixed.
See bug 1468072 for more info.
Flags: needinfo?(dbaron)

Updated

9 months ago
Depends on: 1468654
Depends on: 1470436
Adding dependency on bug 1459937, so we can monitor if it helps with other regressions here.

Note that that bug alone probably won't be enough, but is a first step in fixing a category of bugs, where:
Child frames are pulled from a later frame to an earlier, currently being reflowed, frame; and these children haven't been marked dirty yet (as they would have been if they were already children of the earlier frame when its ReflowInput was first initialized and marked the current children dirty.)
This issue may still be present elsewhere, so all code that pulls "later" frames should be audited.
Depends on: 1474771
Depends on: 1483659
(In reply to Gerald Squelart [:gerald] from comment #67)
> Adding dependency on bug 1459937, so we can monitor if it helps with other
> regressions here.

Unfortunately didn't help with bug 1428670.

Updated

7 months ago
Depends on: 1380830
status-firefox49: affected → ---
status-firefox50: affected → ---
status-firefox-esr45: affected → ---
Depends on: 1517446
You need to log in before you can comment on or make changes to this bug.