Last Comment Bug 502017 - Crash [@ nsHTMLReflowState::GetHypotheticalBoxContainer] with -moz-column, overflow, clear, margin, fixed pos, and float
: Crash [@ nsHTMLReflowState::GetHypotheticalBoxContainer] with -moz-column, ov...
Status: RESOLVED FIXED
[sg:critical?][fixed by bug 67752 on ...
: crash, testcase, verified1.9.0.14, verified1.9.1
Product: Core
Classification: Components
Component: Layout (show other bugs)
: 1.9.1 Branch
: x86 All
: -- normal (vote)
: ---
Assigned To: Daniel Holbert [:dholbert] (largely AFK until June 28)
:
Mentors:
Depends on: ireflow 467493
Blocks:
  Show dependency treegraph
 
Reported: 2009-07-02 11:06 PDT by Daniel Holbert [:dholbert] (largely AFK until June 28)
Modified: 2011-06-13 10:01 PDT (History)
9 users (show)
dveditz: wanted1.9.1.x+
dveditz: blocking1.9.0.14+
dveditz: wanted1.9.0.x+
dholbert: in‑testsuite?
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
.2+
.2-fixed


Attachments
testcase 1 (crashes on loading in 1.9.1 & 1.9.0) (472 bytes, text/html)
2009-07-02 11:06 PDT, Daniel Holbert [:dholbert] (largely AFK until June 28)
no flags Details
backtrace at crash (3.56 KB, text/plain)
2009-07-02 11:10 PDT, Daniel Holbert [:dholbert] (largely AFK until June 28)
no flags Details
frametree at crash (5.70 KB, text/plain)
2009-07-02 11:12 PDT, Daniel Holbert [:dholbert] (largely AFK until June 28)
no flags Details
Backported fix: just the "pull" chunk from bug 67752's patch [checked in to 1.9.0: c26] (3.18 KB, patch)
2009-07-02 18:06 PDT, Daniel Holbert [:dholbert] (largely AFK until June 28)
roc: review+
mbeltzner: approval1.9.1.2+
dveditz: approval1.9.0.14+
Details | Diff | Review
crashtest (as patch) (1.09 KB, patch)
2009-07-08 15:24 PDT, Daniel Holbert [:dholbert] (largely AFK until June 28)
no flags Details | Diff | Review

Description Daniel Holbert [:dholbert] (largely AFK until June 28) 2009-07-02 11:06:07 PDT
Created attachment 386557 [details]
testcase 1 (crashes on loading in 1.9.1 & 1.9.0)

I'm filing this bug for a remaining 1.9.1 crash, with a modified version of bug 467493's testcase. (just adding an empty floated div)  See in particular bug 467493 comment 25.

The attached "testcase 1" crashes in both optimized and debug 1.9.1 builds.  (This testcase is the same as "testcase 3" on bug 467493)

In my debug build, I get these assertions before the crash:
###!!! ASSERTION: Placeholder relationship should have been torn down; see comments in nsPlaceholderFrame.h: '!shell->FrameManager()->GetPlaceholderFrameFor(mOutOfFlowFrame)', file /mozilla/layout/generic/nsPlaceholderFrame.cpp, line 132
###!!! ASSERTION: no placeholder frame: 'nsnull != placeholderFrame', file /mozilla/layout/generic/nsHTMLReflowState.cpp, line 1122

I also get many copies of this warning:
WARNING: bad value: file /mozilla/layout/generic/nsFloatManager.cpp, line 149

Crash report (hasn't loaded yet, but should soon):
http://crash-stats.mozilla.com/report/pending/f7955dc3-a1ff-4e8a-a542-bdc8f2090702
Comment 1 Daniel Holbert [:dholbert] (largely AFK until June 28) 2009-07-02 11:08:26 PDT
testcase 1 crashes on load in this 1.9.1 build:
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1) Gecko/20090624 Firefox/3.5

and it's WFM in this mozilla-central build:
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2a1pre) Gecko/20090701
Minefield/3.6a1pre

Presumably, this used to crash on mozilla-central, even after the float-manager landing fixed bug 467493's other testcases.  A fix range would be great.
Comment 2 Daniel Holbert [:dholbert] (largely AFK until June 28) 2009-07-02 11:10:50 PDT
Created attachment 386559 [details]
backtrace at crash

Here's a backtrace of the crash on testcase 1.
Comment 3 Daniel Holbert [:dholbert] (largely AFK until June 28) 2009-07-02 11:12:02 PDT
Created attachment 386560 [details]
frametree at crash

Here's a dump of the frametree at crash-time. (This is from the same crash occurrence as was used to generate the backtrace in attachment 386559 [details])
Comment 4 Daniel Holbert [:dholbert] (largely AFK until June 28) 2009-07-02 11:24:48 PDT
(In reply to comment #1)
> Presumably, this used to crash on mozilla-central, even after the float-manager
> landing fixed bug 467493's other testcases.  A fix range would be great.

Ok, here's the m-c fix range:
CRASHES ON:
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2a1pre) Gecko/20090505 Minefield/3.6a1pre
WFM on:
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2a1pre) Gecko/20090506 Minefield/3.6a1pre
Comment 5 Daniel Holbert [:dholbert] (largely AFK until June 28) 2009-07-02 11:27:38 PDT
Pushlog for fix range:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=2a82ba52ed30&tochange=7518078cb842

bug 67752 looks like the only likely fix-candidate in that range.
Comment 6 Daniel Holbert [:dholbert] (largely AFK until June 28) 2009-07-02 11:52:16 PDT
I've confirmed that testcase 1 crashes on Mac 1.9.1, too.

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1.1pre) Gecko/20090630 Shiretoko/3.5.1pre
Crash report:
http://crash-stats.mozilla.com/report/index/1c2f4392-268e-4d99-a443-0a0922090702
Comment 7 Daniel Holbert [:dholbert] (largely AFK until June 28) 2009-07-02 14:29:12 PDT
(In reply to comment #5)
> bug 67752 looks like the only likely fix-candidate in that range.

Confirmed that bug 67752 fixed this on mozilla-central.  Changeset 47e935bd7f3b hits the assertions & the crash from comment 0, whereas changeset a08d1947ec5a does not (though it still has lots of "WARNING: bad value" messages).

It doesn't look like bug 67752 is targeted for 1.9.1 (or 1.9.0), though, so it probably won't be able to fix this on the branches...
Comment 8 Boris Zbarsky [:bz] (Out June 25-July 6) 2009-07-02 14:57:09 PDT
It's really weird that bug 67752 changed behavior here.  Are we actually interrupting on the testcase?  Or was it the scrolling changes or some such?

Or maybe it was the changes in nsBlockFrame::ReflowDirtyLines around pulling stuff?  We could port that sort of thing to branch, I think.
Comment 9 Daniel Holbert [:dholbert] (largely AFK until June 28) 2009-07-02 17:15:03 PDT
(In reply to comment #8)
> Or maybe it was the changes in nsBlockFrame::ReflowDirtyLines around pulling
> stuff?

That sounds most likely.
Comment 10 Boris Zbarsky [:bz] (Out June 25-July 6) 2009-07-02 17:21:00 PDT
I think that we should try to backport those.  Do you want to give it a shot or should I?
Comment 11 Daniel Holbert [:dholbert] (largely AFK until June 28) 2009-07-02 17:29:20 PDT
In my build at changeset a08d1947ec5a (that's bug 67752's patch), if I revert bug 67752's changes to nsBlockFrame.* and recompile, the assertions & the crash return.  So it's definitely one of the changes to nsBlockFrame.cpp that fixed this (probably the pulling stuff, as bz suggested).

I'll try to backport that -- looks pretty straightforward.
Comment 12 Daniel Holbert [:dholbert] (largely AFK until June 28) 2009-07-02 18:06:50 PDT
Created attachment 386646 [details] [diff] [review]
Backported fix: just the "pull" chunk from bug 67752's patch [checked in to 1.9.0: c26]

This is just the pulling chunk from bug 67752's nsBlockFrame changes.  It applies cleanly on 1.9.1 and 1.9.0, and it fixes this crash on both branches.  It also fixes the crash on bug 467493's testcases on 1.9.0 branch.
Comment 13 Daniel Holbert [:dholbert] (largely AFK until June 28) 2009-07-08 15:09:38 PDT
Comment on attachment 386646 [details] [diff] [review]
Backported fix: just the "pull" chunk from bug 67752's patch [checked in to 1.9.0: c26]

So, here's a summary what's going on in this backported patch, starting with the final chunk:

>@@ -2076,24 +2079,28 @@ nsBlockFrame::ReflowDirtyLines(nsBlockRe
>       if (!bifLineIter.Next() ||                
>           !bifLineIter.GetLine()->IsDirty()) {
>-        if (IS_TRUE_OVERFLOW_CONTAINER(aState.mNextInFlow))
>-          NS_FRAME_SET_OVERFLOW_INCOMPLETE(aState.mReflowStatus);
>-        else
>-          NS_FRAME_SET_INCOMPLETE(aState.mReflowStatus);
>         skipPull=PR_TRUE;
>       }
>     }
>   }
>+
>+  if (skipPull && aState.mNextInFlow) {
>+    NS_ASSERTION(heightConstrained, "Height should be constrained here\n");
>+    if (IS_TRUE_OVERFLOW_CONTAINER(aState.mNextInFlow))
>+      NS_FRAME_SET_OVERFLOW_INCOMPLETE(aState.mReflowStatus);
>+    else
>+      NS_FRAME_SET_INCOMPLETE(aState.mReflowStatus);
>+  }

This chunk makes sure that, if we've got a next-in-flow and we're going to skip pulling from it, we'll set an INCOMPLETE bit on our reflow status.  Otherwise, our next-in-flow could end up being destroyed while we've still potentially got kids with mNextContinuation pointers into it, and that's bad news.

Previously, we were only setting the INCOMPLETE bit inside of a clause that required a whole bunch of conditions to be satisfied. (In the testcase attached here, the |aState.mReflowState.mFlags.mNextInFlowUntouched| condition was the one that failed and made us skip  setting the INCOMPLETE bit.)

>-  // We can skip trying to pull up the next line if there is no next
>-  // in flow or we were told not to or we know it will be futile, i.e.,
>+  // We can skip trying to pull up the next line if our height is constrained
>+  // (so we can report being incomplete) and there is no next in flow or we
>+  // were told not to or we know it will be futile, i.e.,
[snip]
>-  PRBool skipPull = willReflowAgain;
>-  if (aState.mNextInFlow &&
>+  PRBool heightConstrained =
>+    aState.mReflowState.availableHeight != NS_UNCONSTRAINEDSIZE;
>+  PRBool skipPull = willReflowAgain && heightConstrained;

This chunk makes sure we only set |skipPull| if our height is constrained.  Otherwise (if we have an unconstrained height), we may not be safe to return an INCOMPLETE reflow status (which would end up getting set in the above-described chunk of the patch, if |skipPull| were true).


>-  if (aState.mNextInFlow &&
>+  if (!skipPull && heightConstrained && aState.mNextInFlow &&
>       (aState.mReflowState.mFlags.mNextInFlowUntouched &&

This chunk adds two conditions to the "if" check here -- !skipPull && heightConstrained.

Now that the INCOMPLETE-setting has been moved out of this "if" clause, this clause just does one thing -- it turns on |skipPull| in certain situations.  It doesn't do anything else.  So, there's no point in entering it if |skipPull| is already turned on. (hence the first added condition)  We also don't want to enter it if our height is unconstrained, because then we definitely don't want to turn on skipPull (because we can't return an incomplete reflow status, as described above).  (hence the second added condition)
Comment 14 Daniel Holbert [:dholbert] (largely AFK until June 28) 2009-07-08 15:24:22 PDT
Created attachment 387549 [details] [diff] [review]
crashtest (as patch)

Here's testcase 1 as a crashtest, for landing on all branches after this is fixed on all branches.  I've just re-confirmed that this crashes pre-patch, and doesn't crash post-patch, using an up-to-date 1.9.1 checkout.

(I generated this crashtest patch using a 1.9.1 build -- it may need minor bitrot-fixing in "crashtest.list" on 1.9.0 and on mozilla-central)
Comment 15 Daniel Holbert [:dholbert] (largely AFK until June 28) 2009-07-08 22:51:33 PDT
Comment on attachment 386646 [details] [diff] [review]
Backported fix: just the "pull" chunk from bug 67752's patch [checked in to 1.9.0: c26]

Requesting approval to land this backport on the 1.9.1 & 1.9.0 stable branches.

Fixes a security-sensitive crash, has baked on mozilla-central, and is a fairly minimal change. (See comment 13 for an overview of what it does.)

This also will fix bug 467493 (a related crash) on the 1.9.0 branch, as mentioned in comment 12. (That bug is WFM on 1.9.1, because the float manager works around it.)
Comment 16 Daniel Veditz [:dveditz] 2009-07-10 11:32:04 PDT
Comment on attachment 386646 [details] [diff] [review]
Backported fix: just the "pull" chunk from bug 67752's patch [checked in to 1.9.0: c26]

Approved for 1.9.0.13, a=dveditz for release-drivers
Comment 17 Daniel Holbert [:dholbert] (largely AFK until June 28) 2009-07-10 12:54:34 PDT
Attachment 386646 [details] [diff] checked in to 1.9.0 branch:
Checking in layout/generic/nsBlockFrame.cpp;
/cvsroot/mozilla/layout/generic/nsBlockFrame.cpp,v  <--  nsBlockFrame.cpp
new revision: 3.963; previous revision: 3.962
done
Comment 18 Samuel Sidler (old account; do not CC) 2009-07-10 13:22:25 PDT
Comment on attachment 386646 [details] [diff] [review]
Backported fix: just the "pull" chunk from bug 67752's patch [checked in to 1.9.0: c26]

1.9.0.13 will be in sync with 1.9.1.2, so we'll take this patch then.
Comment 19 Samuel Sidler (old account; do not CC) 2009-07-10 13:23:37 PDT
I'm leaving this as wanted1.9.1.x, but since we don't yet have our new blocking1.9.1 flag, I can't make it block 1.9.1.2 yet. It does though.
Comment 20 Daniel Holbert [:dholbert] (largely AFK until June 28) 2009-07-10 16:01:59 PDT
Backed out of 1.9.0 -- this appears to have somehow caused a test failure in test_bug343416.xul, on Mac.  (It went orange for two consecutive cycles, with a test-failure that doesn't appear to be known-sporadic.)
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.0/1247255968.1247259491.3895.gz
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.0/1247260655.1247264163.22911.gz

Backout:
Checking in layout/generic/nsBlockFrame.cpp;
/cvsroot/mozilla/layout/generic/nsBlockFrame.cpp,v  <--  nsBlockFrame.cpp
new revision: 3.964; previous revision: 3.963
done

Log from failing test:
{
*** 669 INFO Running chrome://mochikit/content/chrome/widget/test/test_bug343416.xul...
*** 670 INFO PASS | nsIIdleService should exist and be implemented on all tier 1 platforms.
*** 671 ERROR FAIL | Getting the idle Time should not fail in normal circumstances on any tier 1 platform. |  | chrome://mochikit/content/chrome/widget/test/test_bug343416.xul
*** 672 INFO PASS | The nsIIdleService should allow us to add an observer.
*** 673 INFO PASS | The nsIIdleService should allow us to add the same observer again.
*** 674 INFO PASS | The nsIIdleService should allow us to remove the observer just once.
*** 675 ERROR FAIL | Getting the idle time should not fail in normalcircumstances on any tier 1 platform. |  | chrome://mochikit/content/chrome/widget/test/test_bug343416.xul
*** 676 ERROR FAIL | The idle time should have increased by roughly the amount of time it took for the timeout to fire. |  | chrome://mochikit/content/chrome/widget/test/test_bug343416.xul
*** 677 INFO PASS | We set a timeout and it should have fired by now.
*** 678 ERROR FAIL | We added a listener and it should have been called by now. |  | chrome://mochikit/content/chrome/widget/test/test_bug343416.xul
*** 679 INFO PASS | We added a listener and we should be able to remove it.
}
Comment 21 Daniel Holbert [:dholbert] (largely AFK until June 28) 2009-07-10 16:08:19 PDT
Link to source of failing test:
http://mxr.mozilla.org/mozilla-central/source/widget/tests/test_bug343416.xul
Comment 22 Boris Zbarsky [:bz] (Out June 25-July 6) 2009-07-10 16:29:06 PDT
I seriously doubt this patch should have affected that test...
Comment 23 Daniel Holbert [:dholbert] (largely AFK until June 28) 2009-07-10 19:16:45 PDT
Me too, but with two consecutive orange cycles, best diagnostic is to back out.

Having said that, the failing test *stayed orange* in the post-backout cycle:
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.0/1247266920.1247270435.27327.gz

So this checkin couldn't be responsible.  I just pinged #build to see if someone can reboot (or otherwise fix) the still-orange tinderbox.
Comment 24 Nick Thomas [:nthomas] 2009-07-11 20:31:12 PDT
Rebooted bm-xserve21 to see if that resolves the Fx3.0 unit issue.
Comment 25 Daniel Holbert [:dholbert] (largely AFK until June 28) 2009-07-12 00:44:34 PDT
The reboot seems to have fixed the issue -- the cycle before the reboot was orange with the nsIIdleService failure, and the four cycles after the reboot have all been green.
Comment 26 Daniel Holbert [:dholbert] (largely AFK until June 28) 2009-07-12 22:26:53 PDT
Re-landed patch on 1.9.0.x (CVS Trunk):
Checking in layout/generic/nsBlockFrame.cpp;
/cvsroot/mozilla/layout/generic/nsBlockFrame.cpp,v  <--  nsBlockFrame.cpp
new revision: 3.965; previous revision: 3.964
done
Comment 27 Mike Beltzner [:beltzner, not reading bugmail] 2009-07-21 17:54:03 PDT
Comment on attachment 386646 [details] [diff] [review]
Backported fix: just the "pull" chunk from bug 67752's patch [checked in to 1.9.0: c26]

a=beltzner, please land on mozilla-1.9.1
Comment 28 Samuel Sidler (old account; do not CC) 2009-07-27 10:46:21 PDT
dholbert: Can you please land this on 1.9.1 asap?
Comment 29 Daniel Holbert [:dholbert] (largely AFK until June 28) 2009-07-27 11:03:15 PDT
Was just about to.  Landed.
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/63785ea38770

status1.9.1 --> .2fixed
Comment 30 [On PTO until 6/29] 2009-07-27 13:50:02 PDT
Verified fixed in 1.9.0 using my debug build (Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.0.13pre) Gecko/2009072711 GranParadiso/3.0.13pre). Verified crash and assertion in comment 0 from build in early July. Both are fixed now.
Comment 31 juan becerra [:juanb] 2009-07-30 19:43:03 PDT
Verified using test case in comment #0. 3.5 crashed on load. 3.5.2 did not. Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2

(3.5 and 3.5.2 behaved the same on XP, no crash).

Note You need to log in before you can comment on or make changes to this bug.