Last Comment Bug 411835 - Crash [@ GetChildListNameFor] with -moz-column, position:absolute
: Crash [@ GetChildListNameFor] with -moz-column, position:absolute
Status: RESOLVED FIXED
[sg:critical?][depends on bug 468563,...
: crash, regression, testcase
Product: Core
Classification: Components
Component: Layout (show other bugs)
: Trunk
: All All
: P2 critical (vote)
: mozilla1.9.2a1
Assigned To: Mats Palmgren (:mats)
:
:
Mentors:
Depends on: 468563 ZDI-CAN-852
Blocks: stirdom randomstyles 379349 412243 429881 463785 508473
  Show dependency treegraph
 
Reported: 2008-01-10 18:34 PST by Jesse Ruderman
Modified: 2013-02-23 17:10 PST (History)
18 users (show)
roc: wanted1.9.1+
roc: blocking1.9-
roc: wanted1.9.0.x+
mats: in‑testsuite+
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
beta1-fixed
.16+
.16-fixed


Attachments
testcase (333 bytes, text/html)
2008-01-10 18:34 PST, Jesse Ruderman
no flags Details
Frame dump #1 (11.00 KB, text/html)
2008-01-11 01:34 PST, Mats Palmgren (:mats)
no flags Details
wip (5.32 KB, patch)
2008-01-11 01:39 PST, Mats Palmgren (:mats)
no flags Details | Diff | Splinter Review
NS_FRAME_EXTERNAL_REFERENCE hack (8.20 KB, patch)
2008-01-11 15:35 PST, Mats Palmgren (:mats)
no flags Details | Diff | Splinter Review
wip2, what I suggested in comment 9 (11.79 KB, patch)
2008-01-11 17:44 PST, Mats Palmgren (:mats)
no flags Details | Diff | Splinter Review
crashtest.diff (includes test for bug 463785) (2.64 KB, patch)
2008-11-16 17:42 PST, Mats Palmgren (:mats)
no flags Details | Diff | Splinter Review
Patch rev. 4 (7.30 KB, patch)
2008-11-16 18:39 PST, Mats Palmgren (:mats)
fantasai.bugs: review-
roc: review+
roc: superreview+
Details | Diff | Splinter Review

Description Jesse Ruderman 2008-01-10 18:34:45 PST
Created attachment 296464 [details]
testcase

Loading the testcase makes Firefox crash [@ GetChildListNameFor] dereferencing 0xdddddddd.
Comment 1 Boris Zbarsky [:bz] (still a bit busy) 2008-01-10 19:39:51 PST
We're under DeletingFrameSubtree, and the first frame in the destroyQueue is already dead.  So when we go to make the RemoveFrame() call, all sorts of bad stuff happens.

We need to fix this for 1.9, imo.
Comment 2 Mats Palmgren (:mats) 2008-01-11 01:34:17 PST
Created attachment 296503 [details]
Frame dump #1

A restyle event leads up to DeletingFrameSubtree() on 0x12f4428 (lime).
Traversing that, and its continuations, we find placeholders for
(red) and (blue) which are added to the destroy queue.
We're destroying frames in reverse order from that list so we
destroy (blue) first, which is means (red) is already destroyed
when we get to it on the destroy queue.

We shouldn't have added these frames at all to the destroy queue since
they are descendants of the frame we originally wanted to remove (lime).
That, in itself, isn't normally a problem that would lead to a crash
as long as we find all the placeholders, and in the right order.
(ie. if (red) were last on destroy queue it would have been destroyd
and remove from the frame tree by the time we destroy (blue))

What also worries me is the hypothetical case where (blue) is *not* a 
descendant of (lime), then we would recurse into it but since
DoDeletingFrameSubtree() does not traverse continuations we would fail
to find the placeholder for (red) and simply not process it, which
could lead to crashes. (We only traverse continuations for the original
frame, in DeletingFrameSubtree()).
I don't know if this case can occur though.

https://bugzilla.mozilla.org/show_bug.cgi?id=379349#c3 says:
#  The overflow frames are still linked up as next-in-flows, 
#  but might not be contiguous in a breadth-first traversal.
Does that break the assumption on order in the destroy queue?
Comment 3 Mats Palmgren (:mats) 2008-01-11 01:36:04 PST
Given the frame tree I think this is a regression from bug 379349
which probably broke the frame tree assumptions we made in bug 310638.
Comment 4 Mats Palmgren (:mats) 2008-01-11 01:39:36 PST
Created attachment 296505 [details] [diff] [review]
wip

This fixes the testcase but I'm not sure if it's enough...
Comment 5 Mats Palmgren (:mats) 2008-01-11 01:59:05 PST
Hmm, maybe "nsTArray<nsWeakFrame> destroyQueue" would be a more robust solution...
Comment 6 Boris Zbarsky [:bz] (still a bit busy) 2008-01-11 05:21:24 PST
> nsTArray<nsWeakFrame> destroyQueue

That's dangerous.  It'll only work if you preallocate the array length.  If it has to resize, it'll memmove nsWeakFrames, which breaks them (since they hold pointers to each other).
Comment 7 Mats Palmgren (:mats) 2008-01-11 15:30:00 PST
Ok, I see what you mean, thanks for pointing it out.
Comment 8 Mats Palmgren (:mats) 2008-01-11 15:35:14 PST
Created attachment 296609 [details] [diff] [review]
NS_FRAME_EXTERNAL_REFERENCE hack

Here's a hack that uses NS_FRAME_EXTERNAL_REFERENCE to get a
notification to the frame constructor which then can clear out
the frame in the destroy queue.  I'm not seriously suggesting
this as the fix, just something I wanted to explore, just in case...
(it does fix the crash though)
Comment 9 Mats Palmgren (:mats) 2008-01-11 15:50:32 PST
I think the current code relies on the fact that we skip all child
lists that contain out-of-flows, and we expect to find all relevant
placeholders instead.  Also, it does not process next-in-flows other
than for the top frame because we expected to find all those within
the subtrees of the top frame continuations.  After bug 379349, the
first assumption is clearly false and I think the second assumption
might be false too.

My current thoughts on a real fix is this:
 1. continue to process OverflowContainers-list but skip all
    out-of-flows on this list.
    (same for excessOverflowContainersList? are there more?)
 2. process next-continuations for all out-of-flows, ie these two calls:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/layout/base/nsCSSFrameConstructor.cpp&rev=1.1450&root=/cvsroot&mark=9343-9345,9348-9351#9278
    should be changed to call a new (mutual recursive) function that
    calls DoDeletingFrameSubtree() on 'outOfFlowFrame' continuations.

Does this sound like the right approach?

For the testcase this means we would not find the placeholder for (red)
when processing (cyan), since (brown) is an out-of-flow on a
OverflowContainers-list.  We will find the placeholder for (blue) when
processing the last-in-flow for (lime), we will not add (blue) to the
destroy queue but recurse on its next-continuations, including (brown)
and this time will find the placeholder for (red) but again it's within
the continuation subtrees of (lime) so it won't be added to the destroy
queue either, but we will still process its children.

For the hypothetical case I was thinking of in comment 2 where (blue)
escapes the subtree we would add it to the destroy queue and recurse on
its next-continuations where we find (red) when processing (brown), but
it's within the subtree of (blue) so it won't be added to the destroy
queue, but we will still process its children.
Comment 10 Mats Palmgren (:mats) 2008-01-11 17:44:49 PST
Created attachment 296632 [details] [diff] [review]
wip2, what I suggested in comment 9

... although now that I think about it we should probably skip float
continuations since we're supposed to have placeholder continuations
for those IIRC.

BTW, is my assumption that the [excess]overflowContainersList lists
can contain both in-flow and out-of-flow frames correct?
(if it's only OOF frames we could just skip these lists)
Comment 11 fantasai 2008-01-14 18:49:20 PST
Yes. The [excess]overflowContainersList can contain in-flow frames and abspos OOF frames only. It doesn't contain floats directly.

Abspos OOF frames never have placeholder continuations; they only have one placeholder for the first-in-flow. Fixed frames should never split. Floats have placeholder continuations, but we'd like to change that eventually.
Comment 12 Damon Sicore (:damons) 2008-02-12 14:52:28 PST
Mats, any progress on your work in progress?  :)
Comment 13 Jesse Ruderman 2008-09-07 14:14:24 PDT
Still crashes on trunk.
Comment 14 Brandon Sterne (:bsterne) 2008-11-06 11:29:34 PST
This is on our "Top Security Bugs" list and it appears to have stalled a bit.  It would be great if we can get some traction here.
Comment 15 Mats Palmgren (:mats) 2008-11-08 15:13:21 PST
The "wip2" patch also seems to fix bug 463785.  I'll try to make a patch
for review in the next couple of days, sorry for the delay.
Comment 16 Mats Palmgren (:mats) 2008-11-16 17:42:27 PST
Created attachment 348484 [details] [diff] [review]
crashtest.diff (includes test for bug 463785)
Comment 17 Mats Palmgren (:mats) 2008-11-16 18:39:37 PST
Created attachment 348490 [details] [diff] [review]
Patch rev. 4

Walk child frame next-in-flows that are overflow containers.
Skip [excess]overflowContainersList to avoid processing them more than once.
(I'm assuming overflow containers only lives on those lists and nowhere else)
Comment 18 fantasai 2008-11-21 23:16:03 PST
Comment on attachment 348490 [details] [diff] [review]
Patch rev. 4

+  // The invariant that "continuing frames should be found as part of the
+  // walk over the top-most frame's continuing frames" does not hold for
+  // out-of-flow overflow containers, so we need to walk them too.
...
+    // The above is true for normal child next-in-flows but not overflow
+    // containers which we do walk because they *can* escape the subtree
+    // we're deleting.

I believe these comments are wrong. OOF overflow containers /will/ be found if their first-in-flow's ancestor's continuations are all found. I think the problem here is the loop in DoDeletingFrameSubtree. It loops over the GetNextSibling() links and it assumes for each frame it processes that its continuations will be processed next. That assumption that breaks down when overflow containers are involved because continuations are no longer contiguous. I conclude that the comments need some revising.

I think the code in the patch is correct. I'm not convinced this is the best way to fix the problem as described above: I suspect some kind of loop over continuations in general would also solve the problem and would be easier to adapt for OOF handling in the future when float continuations don't have placeholders, etc. However we're not in that world yet, and float-placeholder relationships are tricky, so this is just an idea for consideration; I won't r- you on that.
Comment 19 Brandon Sterne (:bsterne) 2008-12-04 10:33:16 PST
Hi Mats, can you please address fantasai's review comments?  We would like to get this one closed out as it is one of our "Top Security Bugs".
Comment 20 Mats Palmgren (:mats) 2008-12-07 16:49:15 PST
Comment on attachment 348490 [details] [diff] [review]
Patch rev. 4

(In reply to comment #18)
> OOF overflow containers /will/ be found if
> their first-in-flow's ancestor's continuations are all found.

Yes, that's reasonable, but the crash occurs when the first-in-flow's
ancestor is not part of the walk at all. I think the first frame dump in
bug 463785 (attachment 347096 [details]) illustrates this.
The first-in-flow's ancestor (DIV 0x180c710) of the first-in-flow OOF
overflow container (TD 0x1901698/blue) isn't involved at all.

The comment above refers to the initial frame to DeletingFrameSubtree
(TR 0x1901000/green) and its continuations.

> I think the code in the patch is correct.

Since we agree on the code change I want to push the patch as is
(to start the baking for 1.9.1) and make comment improvements,
if necessary, as a follow-up patch, ok?
Comment 21 Robert O'Callahan (:roc) (email my personal email if necessary) 2008-12-07 17:27:52 PST
Why not just fix the comments now? Surely it won't take long.
Comment 22 Mats Palmgren (:mats) 2008-12-07 17:36:29 PST
I believe the comments are factually correct, so I don't know how to fix them.
Comment 23 Robert O'Callahan (:roc) (email my personal email if necessary) 2008-12-07 17:44:21 PST
Comment on attachment 348490 [details] [diff] [review]
Patch rev. 4

Yes, I think you're right.
Comment 24 Mats Palmgren (:mats) 2008-12-07 22:20:19 PST
http://hg.mozilla.org/mozilla-central/rev/f6aa0df95deb

I'm holding the crashtest until Firefox 3.0.x is fixed.

-> FIXED
Comment 25 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2008-12-26 07:19:01 PST
Mats, is it ok if I land this on 1.9.1 for you?
Comment 26 Mats Palmgren (:mats) 2008-12-26 07:52:29 PST
I think we should fix/bake bug 468563 first and land that at the same time.
I'll do that in a day or two.
Comment 27 Robert O'Callahan (:roc) (email my personal email if necessary) 2009-01-27 11:19:11 PST
It's been much longer than a day or two and this bug needs to land *now* to make beta3 :-(.
Comment 28 Mike Beltzner [:beltzner, not reading bugmail] 2009-01-30 15:33:46 PST
If this is truly a P1 blocker, and is truly dependent on bug 468563, then that other bug should also be a P1 blocker. If not, then we should either remove the dependency and land this one, or make it not a P1.
Comment 29 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2009-01-30 16:20:43 PST
Changing this back to P3, which is where it was before dveditz changed it last week.
Comment 30 Daniel Veditz [:dveditz] 2009-03-16 15:16:38 PDT
I'm not clear why this is blocking1.9.1-minus, but we appear to have a patch so why don't you just request 1.9.1 approval and land it?
Comment 31 Robert O'Callahan (:roc) (email my personal email if necessary) 2009-03-16 20:16:48 PDT
Comment on attachment 348490 [details] [diff] [review]
Patch rev. 4

Yeah, I can't see any reason not to take this on 1.9.1
Comment 32 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2009-03-16 21:05:31 PDT
What about comment 26, which is a regression from this bug?
Comment 33 Robert O'Callahan (:roc) (email my personal email if necessary) 2009-03-16 21:08:50 PDT
Comment on attachment 348490 [details] [diff] [review]
Patch rev. 4

You're right, I misunderstood.
Comment 34 Daniel Veditz [:dveditz] 2010-08-18 14:33:09 PDT
This was fixed on the trunk well before the 1.9.2 branch, marking it fixed for that branch.
Comment 35 Reed Loden [:reed] (use needinfo?) 2010-11-21 14:46:18 PST
1.9.1 fix pushed in bug 468563.
Comment 37 Gregory Szorc [:gps] 2013-02-23 17:10:02 PST
https://hg.mozilla.org/mozilla-central/rev/e86e7050f14b

Note You need to log in before you can comment on or make changes to this bug.