Closed Bug 391178 Opened 17 years ago Closed 17 years ago

Crash [@ nsCSSFrameConstructor::FindFrameWithContent] with XUL trees, position:fixed

Categories

(Core :: XUL, defect, P2)

x86
All
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: jruderman, Assigned: dholbert)

References

Details

(Keywords: crash, testcase, verified1.8.1.15, Whiteboard: [sg:critical?][dbaron-1.9:Rs])

Crash Data

Attachments

(5 files, 7 obsolete files)

Thread 0 Crashed reading memory at 0xdddddddd:

0   nsCSSFrameConstructor::FindFrameWithContent(nsFrameManager*, nsIFrame*, nsIContent*, nsIContent*, nsFindFrameHint*) + 372 (nsCSSFrameConstructor.cpp:10773)
1   nsCSSFrameConstructor::FindPrimaryFrameFor(nsFrameManager*, nsIContent*, nsIFrame**, nsFindFrameHint*) + 237 (nsCSSFrameConstructor.cpp:10865)
2   nsFrameManager::GetPrimaryFrameFor(nsIContent*, int) + 538 (nsFrameManager.cpp:394)
3   PresShell::GetPrimaryFrameFor(nsIContent*) const + 42 (nsPresShell.cpp:4688)
4   nsBoxObject::GetFrame(int) + 102 (nsBoxObject.cpp:141)
5   nsTreeBoxObject::GetTreeBody() + 53 (nsTreeBoxObject.cpp:116)
6   nsTreeBoxObject::GetColumns(nsITreeColumns**) + 17 (nsTreeBoxObject.cpp:234)
7   nsTreeColFrame::InvalidateColumns() + 96 (nsTreeColFrame.cpp:228)
8   nsTreeColFrame::Destroy() + 17 (nsTreeColFrame.cpp:103)
9   nsFrameList::DestroyFrames() + 48 (nsFrameList.cpp:68)
10  nsContainerFrame::Destroy() + 73 (nsContainerFrame.cpp:255)
...
Flags: blocking1.9?
Attached file testcase
Whiteboard: [sg:critical?]
jan is probably not a good assignee for this bug.  I wonder who else could take a crack?
Assignee: Jan.Varga → damon
Assignee: damon → dsicore
Stealing bug from damon.
Assignee: dsicore → dholbert
Here's what's going wrong.  (Sorry that this post is a bit long.)

When we remove "display: fixed" from the div, that makes us destroy the div's  frame subtree.  During this process, we hit 
  nsTreeColFrame::Destroy
   nsTreeColFrame::InvalidateColumns
    nsITreeBoxObject::GetColumns
     nsTreeBoxObject::GetTreeBody
      nsITreeBoxObject::GetFrame
       PresShell::GetPrimaryFrameFor
        nsFrameManager::GetPrimaryFrameFor
         nsCSSFrameConstructor::FindPrimaryFrameFor

This lowest function call is trying to find the primary frame for "aContent", which is the nsTreeColFrame's TreeBoxObject.

(I'm not sure exactly what a TreeBoxObject is, but all that matters here is that it's a direct child of the div that's being destroyed.

Within that lowest function call (FindPrimaryFrameFor), we hit these lines:
 10878   nsCOMPtr<nsIContent> parentContent = aContent->GetParent();
 10879   if (parentContent) {
 10880     parentFrame = aFrameManager->GetPrimaryFrameFor(parentContent, -1);

Here, parentContent is the div that's being removed, and line 10880 tries to look up its "primary frame".

So, here's the kicker -- the div's true primary frame is the nsAreaFrame that we're halfway through destroying.  This frame has already been removed from the frame tree, so I think we don't want to be able to find it here -- we'd _like_ the variable "parentFrame" to end up being null. (** see note at end of this post)

HOWEVER, we actually _do_ find the div's area frame, even though it's been stitched out of the tree.  We find it via its placeholder, which _isn't_ yet stitched out. (here in the code: 
http://mxr.mozilla.org/seamonkey/source/layout/base/nsCSSFrameConstructor.cpp#10769
)

As a result, we keep going through this call to FindPrimaryFrameFor, using a half-destroyed "parentFrame", and we pass that parent frame to "FindFrameWithContent" at line 10883.  This makes us walk parentFrame's subtree, and we segfault on a destroyed frame's invalid pointer. (with mContent ==  0xdddddddd)

*** Note from above: In a reversed test case where the div *starts* with no style and is later set to "position: fixed", the parentFrame variable _does_ end up being null at this point in the code, and there's no crash.  That's why I'm hypothesizing that we might like parentFrame to be null in the original test case, too...
Status: NEW → ASSIGNED
One solution is probably to always add tree body frames to the primary frame map.
That sounds like a good idea.
(In reply to comment #5)
> One solution is probably to always add tree body frames to the primary frame
> map.

Thanks for the suggestion!  I looked into this, and as it turns out, the tree body frame actually is already put in the primary frame map, but it gets removed from the map before the crashy part of the code.

Specifically, in nsCSSFrameConstructor::ContentRemoved,

 - line 9565: we call ::DeletingFrameSubtree, which walks the to-be-destroyed subtree and removes all its mappings from the primary frame map

 - line 9583: we call frameManager->RemoveFrame, which encompasses all of the Destroy() calls and has the crash.

Any other ideas?
Attached patch possible fix (obsolete) — Splinter Review
Here's one simple fix: just clear placeholderFrame.mOutOfFlowFrame before destroying the fixed area frame's subtree.

This fixes the test case.  Haven't tested it on much else yet (but will do so soon).

This solution might be dangerous -- roc said in IRC that this area of code is known to be fragile and has had issues before.
Attachment #276058 - Attachment is patch: true
Attachment #276058 - Attachment mime type: text/x-patch → text/plain
(oops -- ignore the first chunk of that patch -- that part has no effect, just adds an unused variable.  The second chunk is what matters.)
Attached patch possible fix (obsolete) — Splinter Review
Attachment #276058 - Attachment is obsolete: true
I think we should avoid calling GetPlaceholderFrameFor at all during frame destruction.

In this case, we could have nsTreeColFrame hold a strong reference to the nsITreeColumns, and call InvalidateColumns on it directly.
(In reply to comment #11)
>In this case, we could have nsTreeColFrame hold a strong reference to the
>nsITreeColumns, and call InvalidateColumns on it directly.
That might be tricky under the current ownership model because it's still the nsTreeBodyFrame that creates the nsITreeColumns so that you might find you're moving the frame hunt to a different critical path, but it should be possible to make the nsTreeBoxObject own the columns instead. At least on trunk I think all the critical code already checks for mView which is also owned by the box.
Hmm, that's not so easy, because the nsTreeBodyFrame wants an nsTreeColumns...
(In reply to comment #8)
> Created an attachment (id=276058) [details]
> possible fix
> 
> Here's one simple fix: just clear placeholderFrame.mOutOfFlowFrame before
> destroying the fixed area frame's subtree.
> 
> This fixes the test case.  Haven't tested it on much else yet (but will do so
> soon).
> 
> This solution might be dangerous -- roc said in IRC that this area of code is
> known to be fragile and has had issues before.
> 

That last patch passed the layout reftests... Do you guys think it's too risky?  It's certainly a simple solution, and it makes conceptual sense to me, based on the rest of the way we do frame destruction. 
(That is to say, our general strategy seems to be to break all ties from the frame tree to the pruned branch before we do any frame destruction.  And currently, placeholder frames violate this strategy.)

I guess I'm OK with it if Boris and David are. Maybe you could find the bugs that touched the placeholder cleanup code and recheck their testcases?
Does blockframe not rely on the float placeholders when tearing down the float?  See nsBlockFrame::RemoveFloat.  I thought that was the reason the ordering here was as it is?
D'oh, looks like bz's right -- we get this chain of calls:

  nsBlockFrame::RemoveFloat(nsIFrame* aFloat)
   nsLineBox::RemoveFloat(nsIFrame* aFrame)
    nsFloatCacheList::Find(nsIFrame* aOutOfFlowFrame)
   
which iterates through the line's placeholders and says

     (fc->mPlaceholder->GetOutOfFlowFrame() == aOutOfFlowFrame)

And actually, it looks like the XXX comment near nsCSSFrameConstructor.cpp:9581 basically says that :)  My bad -- I'd misread it before, and had thought we were trying to preserve the pointer in the *other* direction.
Flags: blocking1.9? → blocking1.9+
OS: Mac OS X → All
Whiteboard: [sg:critical?] → [sg:critical?][dbaron-1.9:Rs]
Interesting... this seems to have become WORKSFORME.  Does anyone else still see the crash?

Gonna see if I can figure out what fixed it.
In a current trunk build, I'm getting this assertion failure instead of the crash:

###!!! ASSERTION: Null out-of-flow for placeholder?: 'outOfFlow', file /scratch/work/builds/trunk.07-10-24.10-57/mozilla/layout/base/../generic/nsPlaceholderFrame.h, line 175
(In reply to comment #19)
> In a current trunk build, I'm getting this assertion failure instead of the
> crash:
> 
> ###!!! ASSERTION: Null out-of-flow for placeholder?

... which is already filed as bug 399715, which this bug blocks.
Attached file testcase2
I'm still crashing with this testcase after 100ms using current trunk build.
Attached patch fix (obsolete) — Splinter Review
This patch fixes the crash in testcase2.  The problem is that nsTreeBoxObject::GetTreeBody is walking the frame tree during frame destruction, which is bad.

The idea of this patch is to clear the nsTreeBoxObject's mContent pointer during destruction, to thwart its ability to traverse the frame tree in GetTreeBody.

Here's what happens, after this patch:
 - The added call to Clear clears the TreeBoxObject's mContent pointer. (via nsBoxObject::Clear)
 - In GetTreeBody(), if we don't already have mTreeBody cached (and we don't in this testcase), we now won't be able to look it up, because nsBoxObject::GetFrame() is going to return null.
    ** GetFrame will return null because mContent is null, and it uses mContent to look up the frame.
  - This makes GetTreeBody() just return null without doing any damage.

One slightly messy bit -- I do have to cast the nsITreeBoxObject to a nsTreeBoxObject in order to access Clear(), because it's not part of the nsITreeBoxObject interface.  There's probably a cleaner way to do this.
Attachment #276059 - Attachment is obsolete: true
Attachment #287443 - Flags: review?(roc)
+      ((nsTreeBoxObject*)treeBoxObject)->Clear(); // clears its mContent pointer

This is OK, just use a static_cast. These interfaces are bogus anyway.

My question is, why does nsTreeColFrame::Destroy need to call InvalidateColumns at all? With your patch, it's not really going to do anything. So are we going to leak or something?
Hrm, good question.  I'd originally been thinking that the treeBoxObject's mTreeBody would still be around in cases where it was non-null, but I guess Clear() clears that as well.  So you're right, InvalidateColumns wouldn't do anything when called by Destroy, after my patch.


From exploring the source, it looks like all that nsTreeColFrame::InvalidateColumns() does is:
  - Get the TreeBoxObject
  - In GetColumns:
    * Try to look up the TreeBody
    * Try to get a nsCOM reference to the TreeBody's mColumns
  - If the above succeeded, we call InvalidateColumns on the returned nsITreeColumns, which just does this:

 552 NS_IMETHODIMP
 553 nsTreeColumns::InvalidateColumns()
 554 {
 555   NS_IF_RELEASE(mFirstColumn);
 556   return NS_OK;
 557 }

Note that nsTreeColumns::~nsTreeColumns is similar:
 373 nsTreeColumns::~nsTreeColumns()
 374 {
 ...   ...
 378   NS_IF_RELEASE(mFirstColumn);
 379 }


I'm still not totally familiar with how NS_IF_RELEASE / NS_IF_ADDREF are used and how the whole tree-object structure fits together, but it looks to me like we'll effectively do the work of InvalidateColumns() when (if?) the nsTreeColumns object is destroyed.

So, the InvalidateColumns in nsTreeColFrame::Destroy() may be redundant.
My understanding is that under normal circumstances, i.e. when you're removing a column from a tree, rather than whatever Jesse managed to do to the frame tree, you need to invalidate the columns so that the next time the tree paints it won't try to paint deleted columns.
Well then, shouldn't whatever removes the column invalidate the columns, instead of relying on frame destruction to do it? A frame might not have been constructed for some reason.
(In reply to comment #26)
>A frame might not have been constructed for some reason.
Then no invalidation should be necessary. (Not that this justifies the code.)
I see.

Seems to me then that the simplest way to fix this is to have nsTreeColFrame carry a strong ref to nsITreeColumns and call InvalidateColumns directly on that.

In the bigger picture, how bad would it be for performance if we stopped caching the columnn list and just build or iterate it every time we need it?
(In reply to comment #28)
>Seems to me then that the simplest way to fix this is to have nsTreeColFrame
>carry a strong ref to nsITreeColumns and call InvalidateColumns directly on that.
I think the difficulty here might be that multiple nsTreeColumns object can be created during the lifetime of an nsITreeColFrame, since they are currently created by the nsTreeBodyFrame object which is the <treechildren>'s frame.
On the other hand I don't think that nsTreeColFrame can outlive nsTreeBoxObject so it may be possible to rewrite nsTreeBoxObject to cache the nsTreeColumns.
This fix is similar to the previous one, in that it aims to make  nsTreeColFrame::InvalidateColumns' frame-tree-walking behavior depend on whether we're destructing frames.

This new patch makes use of the "GetCachedTreeBody" function added in Bug 397954, to cleanly avoid the tree-walking behavior within nsTreeBoxObject::GetTreeBody.
Attachment #287443 - Attachment is obsolete: true
Attachment #289406 - Flags: superreview?(roc)
Attachment #289406 - Flags: review?(roc)
Attachment #287443 - Flags: review?(roc)
'better fix' fixes: 
 - testcase2 on this bug (no crash or assertions)
 - testcase on Bug 366583 (no more ASSERTION: Null out-of-flow for placeholder?)
 - testcase on Bug 399715 (no more ASSERTION: Null out-of-flow for placeholder?)
Attachment #289406 - Flags: superreview?(roc)
Attachment #289406 - Flags: superreview+
Attachment #289406 - Flags: review?(roc)
Attachment #289406 - Flags: review+
'better fix' checked in:

Checking in nsTreeColFrame.cpp;
/cvsroot/mozilla/layout/xul/base/src/tree/src/nsTreeColFrame.cpp,v  <--  nsTreeColFrame.cpp
new revision: 1.52; previous revision: 1.51
done
Checking in nsTreeColFrame.h;
/cvsroot/mozilla/layout/xul/base/src/tree/src/nsTreeColFrame.h,v  <--  nsTreeColFrame.h
new revision: 1.31; previous revision: 1.30
done
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Flags: in-testsuite?
verified fixed using Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9b4pre) Gecko/2008020819 Minefield/3.0b4pre and the testcases from this bug. 

No crash on testcases -> Verified fixed
Status: RESOLVED → VERIFIED
This bug affects branch.  Branch crashes in the same way that trunk used to -- [@ nsCSSFrameConstructor::FindFrameWithContent] dereferencing 0xdddddddd.
Flags: wanted1.8.1.x+
Flags: blocking1.8.1.14?
I get no crashes on testcase / testcase2, using this build:
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.13) Gecko/20080325 Ubuntu/8.04 (hardy) Firefox/2.0.0.13

Jesse, what version of branch did you see this crash on?
I tested with a debug build from last night.
Attached patch branch patch v1 (obsolete) — Splinter Review
Trunk patch seems to apply cleanly on branch, with a little fuzz.  This patch is the result of doing just that, and then diffing to get a no-fuzz-needed version.

Jesse, could you apply this to your debug build and see if this 
 a) compiles
 b) fixes the branch crash?
Comment on attachment 311950 [details] [diff] [review]
branch patch v1

Nevermind, this branc patch needs a bit more work -- it doesn't compile in its current state, because there's no nsTreeBoxObject.h header and no "GetCachedTreeBody" function on branch.
Attachment #311950 - Attachment is obsolete: true
Flags: blocking1.8.1.15? → blocking1.8.1.15+
Per blocking1.8.1.15 status, I'll get a branch patch for this ASAP.

My current behavior on Mac OS X, using a branch checkout from today, is:

 -'testcase': (attachment 275548 [details])
   * hang, and then segfaults within 30 sec.
   * No assertion failures
   * These warnings:
WARNING: Write failed (non-fatal), file nsInputStreamTee.cpp, line 84
WARNING: NS_ENSURE_TRUE(aContent) failed, file nsFrameManager.cpp, line 339

 - 'testcase2' (attachment 286091 [details])
   * No hang/crash
   * These warnings & assertions are displayed:
WARNING: Write failed (non-fatal), file nsInputStreamTee.cpp, line 84
WARNING: waaah!, file nsXULPrototypeDocument.cpp, line 862
###!!! ASSERTION: wrong kind of child frame: 'aIsBlock == f->GetStyleDisplay()->IsBlockLevel()', file nsLineBox.cpp, line 70
###!!! ASSERTION: frame was not removed from primary frame map before destruction or was readded to map after being removed: 'Not Reached', file nsFrameManager.cpp, line 734
###!!! ASSERTION: wrong kind of child frame: 'aIsBlock == f->GetStyleDisplay()->IsBlockLevel()', file nsLineBox.cpp, line 70
Branch OS X Crash is:

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0xdddddddd
0x11fe728a in nsCSSFrameConstructor::FindFrameWithContent ()

Attaching backtrace.
Attached patch branch patch v2 (obsolete) — Splinter Review
Ok, this updated branch patch addresses the issues from comment 38:
 - Splits out nsTreeBoxObject class declaration into nsTreeBoxObject.h
 - Adds the (trivial) GetCachedTreeBody implementation, taken from Bug
397954's patch.

However, this sadly doesn't fix the segfault / assertions / warnings mentioned in comment 39 :(
Fixed my debug build a bit (per bug 401767 comment 14), and now I've got this more-detailed backtrace for the crash.
Attachment #319455 - Attachment is obsolete: true
Comment on attachment 319468 [details] [diff] [review]
branch patch v2

> NS_IMETHODIMP
> nsTreeColFrame::Init(nsPresContext*  aPresContext,
> ...
>-  InvalidateColumns();
>+  InvalidateColumns(PR_FALSE);
>   return rv;
> }
> 
> NS_IMETHODIMP                                                                   
> nsTreeColFrame::Destroy(nsPresContext* aPresContext)                          
> {
>   InvalidateColumns(); 

Aha -- it looks like applying the trunk patch with fuzz got me into trouble here -- the 'InvalidateColumns(PR_FALSE);' call ended up in Init(), when it actually belongs in Destroy().

I'm trying a build with that fixed right now.
Attached patch branch patch v3 (obsolete) — Splinter Review
This patch fixes the mistake mentioned in previous comment.  It does indeed fix the crash on 'testcase'.
Attachment #319468 - Attachment is obsolete: true
FWIW, after applying the patch, I still get the assertions and warnings mentioned in comment 39, except that this scary assertion....

> ###!!! ASSERTION: frame was not removed from primary frame map before
> destruction or was readded to map after being removed: 'Not Reached', file
> nsFrameManager.cpp, line 734

... is now gone, and replaced with this warning:

WARNING: cannot get parent, file /Users/dholbert/builds/branch/mozilla/widget/src/mac/nsWindow.cpp, line 2380
Attached patch branch patch v3bSplinter Review
This patch is identical to the last -- I just cleaned up some trailing whitespace in nsTreeColFrame::Destroy around the patched region.
Attachment #319485 - Attachment is obsolete: true
Attachment #319487 - Flags: superreview?(roc)
Attachment #319487 - Flags: review?(roc)
Attachment #319487 - Flags: superreview?(roc)
Attachment #319487 - Flags: superreview+
Attachment #319487 - Flags: review?(roc)
Attachment #319487 - Flags: review+
Comment on attachment 319487 [details] [diff] [review]
branch patch v3b

Thanks for the r+sr, roc!

Requesting approval for inclusion in 1.8.1.15.
Attachment #319487 - Flags: approval1.8.1.15?
Comment on attachment 319487 [details] [diff] [review]
branch patch v3b

Approved for 1.8.1.15, a=dveditz for release-drivers
Attachment #319487 - Flags: approval1.8.1.15? → approval1.8.1.15+
Branch patch checked in.

Checking in src/nsBoxObject.h;
/cvsroot/mozilla/layout/xul/base/src/nsBoxObject.h,v  <--  nsBoxObject.h
new revision: 1.15.14.2; previous revision: 1.15.14.1
done
Checking in src/tree/src/nsTreeBoxObject.cpp;
/cvsroot/mozilla/layout/xul/base/src/tree/src/nsTreeBoxObject.cpp,v  <--  nsTreeBoxObject.cpp
new revision: 1.45.32.6; previous revision: 1.45.32.5
done
Checking in src/tree/src/nsTreeBoxObject.h;
/cvsroot/mozilla/layout/xul/base/src/tree/src/nsTreeBoxObject.h,v  <--  nsTreeBoxObject.h
new revision: 1.2.16.1; previous revision: 1.2
done
Checking in src/tree/src/nsTreeColFrame.cpp;
/cvsroot/mozilla/layout/xul/base/src/tree/src/nsTreeColFrame.cpp,v  <--  nsTreeColFrame.cpp
new revision: 1.32.24.2; previous revision: 1.32.24.1
done
Checking in src/tree/src/nsTreeColFrame.h;
/cvsroot/mozilla/layout/xul/base/src/tree/src/nsTreeColFrame.h,v  <--  nsTreeColFrame.h
new revision: 1.19.24.2; previous revision: 1.19.24.1
done
I can't repro the crash with either testcase using Windows XP or OS X 10.5.3. It just sits there in all instances.
Oh, I forgot: This is specifically using 2.0.0.14 in order to verify the crash before verifying the fix for 2.0.0.15.
IIRC, for branch, this only crashed in debug builds, and crashed on Mac but not on Linux.  (so, not sure if it affected Windows)
Ah, good to know.

Do you have a debug branch build that you could verify this with? I think Tomcat builds debug for QA but just Windows.
(In reply to comment #53)
> Do you have a debug branch build that you could verify this with? I think
> Tomcat builds debug for QA but just Windows.
> 
Also Mac and Linux, but only trunk builds. But maybe martijn or bob should be also to verify this bug, i think they build also 1.8 branch builds.
Martijn or Bob, can either of you verify this in a debug 1.8 mac build?
verified fixed 1.8.1.15 mac os x 10.5.3 no crash on either testcase.
(In reply to comment #56)
> verified fixed 1.8.1.15 mac os x 10.5.3 no crash on either testcase.

Bob, Just to confirm -- did you test this in a debug build?  (see comment 52 thru 55)
yes, debug.
Group: security
Component: XP Toolkit/Widgets: Trees → XUL
QA Contact: xptoolkit.trees → xptoolkit.widgets
I checked in "testcase" and "testcase2" (adding "reftest-wait", since they use JS & minor setTimeouts):  http://hg.mozilla.org/mozilla-central/rev/25a5167ee32d
Flags: in-testsuite? → in-testsuite+
Crash Signature: [@ nsCSSFrameConstructor::FindFrameWithContent]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: