Closed Bug 393936 Opened 15 years ago Closed 13 years ago

[PATCH] Crash [@ nsCellMapColumnIterator::GetNextFrame] with display: table-footer-group, binding, contenteditable and image

Categories

(Core :: ImageLib, defect, P3)

defect

Tracking

()

VERIFIED FIXED
mozilla1.9.2a1

People

(Reporter: martijn.martijn, Assigned: joe)

References

(Blocks 1 open bug)

Details

(5 keywords)

Crash Data

Attachments

(5 files, 5 obsolete files)

See upcoming testcase, which crash current trunk builds on load or reload.

http://crash-stats.mozilla.com/report/index/a03be961-54fc-11dc-92fe-001a4bd43ed6
0  	nsCellMapColumnIterator::GetNextFrame(int*, int*)  	 mozilla/layout/tables/nsCellMap.cpp:2801
1 	BasicTableLayoutStrategy::ComputeColumnIntrinsicWidths(nsIRenderingContext*) 	mozilla/layout/tables/BasicTableLayoutStrategy.cpp:295
2 	BasicTableLayoutStrategy::ComputeIntrinsicWidths(nsIRenderingContext*) 	mozilla/layout/tables/BasicTableLayoutStrategy.cpp:576
3 	BasicTableLayoutStrategy::GetMinWidth(nsIRenderingContext*) 	mozilla/layout/tables/BasicTableLayoutStrategy.cpp:69
4 	nsTableFrame::TableShrinkWidthToFit(nsIRenderingContext*, int) 	mozilla/layout/tables/nsTableFrame.cpp:1658
5 	nsTableFrame::ComputeAutoSize(nsIRenderingContext*, nsSize, int, nsSize, nsSize, nsSize, int) 	mozilla/layout/tables/nsTableFrame.cpp:1690
6 	nsFrame::ComputeSize(nsIRenderingContext*, nsSize, int, nsSize, nsSize, nsSize, int) 	mozilla/layout/generic/nsFrame.cpp:2966
7 	nsTableFrame::ComputeSize(nsIRenderingContext*, nsSize, int, nsSize, nsSize, nsSize, int) 	mozilla/layout/forms/nsFieldSetFrame.cpp:429
8 	nsHTMLReflowState::InitConstraints(nsPresContext*, int, int, nsMargin const*, nsMargin const*) 	mozilla/layout/generic/nsHTMLReflowState.cpp:1820
9 	nsHTMLReflowState::Init(nsPresContext*, int, int, nsMargin const*, nsMargin const*) 	mozilla/layout/generic/nsHTMLReflowState.cpp:315
10 	nsTableOuterFrame::InitChildReflowState(nsPresContext&, nsHTMLReflowState&) 	mozilla/layout/tables/nsTableOuterFrame.cpp:468
11 	nsTableOuterFrame::OuterReflowChild(nsPresContext*, nsIFrame*, nsHTMLReflowState const&, void*, nsHTMLReflowMetrics&, int, nsSize&, nsMargin&, unsigned int&) 	mozilla/layout/tables/nsTableOuterFrame.cpp:1133
12 	nsTableOuterFrame::Reflow(nsPresContext*, nsHTMLReflowMetrics&, nsHTMLReflowState const&, unsigned int&) 	mozilla/layout/tables/nsTableOuterFrame.cpp:1255
13 	nsBlockReflowContext::ReflowBlock(nsRect const&, int, nsCollapsingMargin&, int, int, nsMargin&, nsLineBox*, nsHTMLReflowState&, unsigned int&) 	mozilla/layout/generic/nsBlockReflowContext.cpp:338
14 	nsBlockFrame::ReflowBlockFrame(nsBlockReflowState&, nsLineList_iterator, int*) 	mozilla/layout/generic/nsBlockFrame.cpp:2922
etc.

This regressed between 2007-06-26 and 2007-06-28. And because the contenteditable attribute is necessary for the crash, I think this is somehow a regression from bug 237964.
Attached file testcase
Flags: blocking1.9?
Attached file stack
###!!! ASSERTION: reflowing in the middle of frame construction
OS: Windows XP → All
Can we make one of the calls on that stack (in the #39 -- #51 range)
asynchronous off an event instead?
Uh...  Cancel() must not notify anything sync.  In particular, the removal from the loadgroup MUST be async per the nsIRequest contract (though we should document this more clearly in nsIRequest.idl).
Component: Layout: Tables → ImageLib
QA Contact: layout.tables → imagelib
Attached patch Possible patch (obsolete) — Splinter Review
Compiled, but untested (don't have a fully compiling build at the moment).
So I just tested this, and it doesn't quite work.  Unfortunately, mListener is a weak pointer and consumers rely on Cancel() nulling it out.  There are at least some cases where the Cancel() call is actually in mListener's destructor.  In these cases we really shouldn't be sending OnStopRequest to the listener either.

Perhaps there should be a status code that such callsites can use to indicate "cancel and never talk to me again"?  Feels hacky, but this whole setup with having to hold the imgIRequest which has a pointer back to us has this little leak problem if all the refs are made strong.  :(

In any case, with just that patch we crash in OnStartRequest on the testcase because mListener is pointing to deleted memory.
Flags: blocking1.9? → blocking1.9+
Priority: -- → P2
Still crashing for me, using a 2007-11-01 trunk build.
Stuart, would you be OK with either introducing a status code to pass to cancel(), as described in comment 6, or an API which we could use to tell the imgRequestProxy to drop its listener reference instead of overloading cancel()?
Assignee: nobody → pavlov
Flags: wanted1.9.0.x+
Flags: tracking1.9+
Flags: blocking1.9-
Priority: P2 → --
Blocks: 428263
This testcase now triggers

###!!! ASSERTION: Someone forgot to block scripts: 'aIsSafeToFlush == nsContentUtils::IsSafeToRunScript()', file /Users/jruderman/trunk/mozilla/layout/base/nsPresShell.cpp, line 4532

in addition to the three assertions it has triggered for a long time

###!!! ASSERTION: reflowing in the middle of frame construction: 'mPresContext->mLayoutPhaseCount[eLayoutPhase_FrameC] == 0', file ../../dist/include/layout/nsPresContext.h, line 948

###!!! ASSERTION: Why no rowgroups?: 'mOrigCells == 0', file /Users/jruderman/trunk/mozilla/layout/tables/nsCellMap.h, line 632

###!!! ASSERTION: Bogus mOrigCells?: 'mCurMapRow < mCurMapRelevantRowCount', file /Users/jruderman/trunk/mozilla/layout/tables/nsCellMap.cpp, line 2928

Still crashing for me, using:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b1pre) Gecko/20080905031348 Minefield/3.1b1pre
Flags: blocking1.9.1?
Assignee: pavlov → joe
Flags: blocking1.9.1? → blocking1.9.1+
Priority: -- → P1
Since we absolutely must make removal from the loadgroup async, this is an updated version of Boris' patch that includes a new function imgIRequest::CancelAndForgetObserver().

I think I fixed all callers of Cancel() who needed to be CancelAndForgetObserver instead, but there are a couple of calls to SetFrame(nsnull) in nsImageFrame and nsBulletFrame that I can't fix since I don't understand the class hierarchy. Ideally, that would be a simple call to CancelAndForgetObserver in nsImageFrame::Destroy() instead of removing the observer from the image loader and setting the frame to null, but I don't know how things are structured to do that.
Attachment #278728 - Attachment is obsolete: true
Attachment #341971 - Flags: review?(bzbarsky)
Also, I should mention that patch fixes the crash in this bug. I'm running through mochitests and reftests to make sure I haven't broken anything else.
Attached patch Enhanced patch with -U 8 (obsolete) — Splinter Review
Attachment #341971 - Attachment is obsolete: true
Attachment #341977 - Flags: review?(bzbarsky)
Attachment #341971 - Flags: review?(bzbarsky)
So as I mentioned on irc, we need to keep the SetFrame(nsnull) calls that are in place now.  Just replacing Cancel() with CancelAndForgetObserver() sounds reasonable to me.
Attached patch Re-add SetFrame(nsnull) calls (obsolete) — Splinter Review
Attachment #341977 - Attachment is obsolete: true
Attachment #342282 - Flags: superreview?
Attachment #342282 - Flags: review?(bzbarsky)
Attachment #341977 - Flags: review?(bzbarsky)
Attachment #342282 - Flags: superreview? → superreview?(pavlov)
FYI, this passes all mochitests and reftests.
Attachment #342282 - Flags: review?(bzbarsky) → review+
Comment on attachment 342282 [details] [diff] [review]
Re-add SetFrame(nsnull) calls

>+NS_IMETHODIMP imgRequestProxy::CancelAndForgetObserver(nsresult aStatus)
>+  NullOutListener();
>+}

return NS_OK?  Or return the return value of Cancel()?  In any case, as written this won't compile on Windows: gotta return _something_.

With that fixed, looks good.
Attached patch Address review comments (obsolete) — Splinter Review
Carrying over r+.
Attachment #342288 - Flags: superreview?(pavlov)
Attachment #342288 - Flags: review+
Attachment #342282 - Attachment is obsolete: true
Attachment #342282 - Flags: superreview?(pavlov)
Attachment #342288 - Flags: superreview?(pavlov) → superreview+
From my tests, this no longer crashes in current Firefox 3.1 nightlies.
Pushed to 3.1 in http://hg.mozilla.org/mozilla-central/rev/6bedb1e92dd0
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Crap - I forgot to reopen this. It bounced out immediately because this patch caused refcount leaks. I'm currently debugging that.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The previous patch had a nice subtle leak that occurred because CancelAndForgetObserver() called Cancel() before calling NullOutListener() (to avoid code duplication). Unfortunately, the important part of Cancel(), it calling mOwner->RemoveProxy(), was done asynchronously, so by the time it get called, mListener was null, and so OnStopRequest was never called on the listener.
Attachment #342288 - Attachment is obsolete: true
Attachment #351252 - Flags: superreview?(pavlov)
Attachment #351252 - Flags: review?(bzbarsky)
Hold on.  Which listener needed the OnStopRequest call?  See comment 6.
Hm, that's not entirely clear. (It might be nsGIFDecoder2, in at least one case, but it's not clear to me where its mObserver gets nulled out.) 

This patch does the "simple" thing, which is simply make it possible to do both an asynchronous and a synchronous cancel, with the mechanics of that cancel being the same. This isn't quite the same as what you mentioned in comment 6 (AIUI), since we do talk to the listener again, we just do it synchronously.
But the point is that a number of the callers of this method are calling it in a destructor.  You _can't_ talk to them safely.  I know the old code sort of did, but that seems bogus to me...
I agree. In fact, almost all callers of CancelAndForgetObserver() are directly _inside_ the destructor.

However, I'm not sure that I feel 100% comfortable making that change, because it involves some pretty significant changes to the callers. In the end, I think it'd end up making CancelAndForgetObserver() entirely obsolete.

If you really want that change, I can do it, but it doesn't feel like the safest thing to do. This way maintains the previous behaviour for the places that need it, and fixes Cancel() for the places that didn't know about imgRequestProxy's broken Cancel() behaviour.
Ok, fair enough.  I guess this is at least not making things any worse.

Will try to review tomorrow.
Attachment #351252 - Flags: review?(bzbarsky) → review+
Summary: Crash [@ nsCellMapColumnIterator::GetNextFrame] with display: table-footer-group, binding, contenteditable and image → [PATCH] Crash [@ nsCellMapColumnIterator::GetNextFrame] with display: table-footer-group, binding, contenteditable and image
Comment on attachment 351252 [details] [diff] [review]
Fix refcount leaks

Looks fine, but add more documentation for the new method and for the old cancel() explaining what they do and why you shouldn't use the new one in new code.
Attachment #351252 - Flags: superreview?(pavlov) → superreview+
Pushed to mozilla-central: http://hg.mozilla.org/mozilla-central/rev/5f6b82c5c67e

After some baking on m-c, I'll push to 1.9.1.
Status: REOPENED → RESOLVED
Closed: 13 years ago13 years ago
Resolution: --- → FIXED
had to back this due to leaks on tinderbox
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Pushed to mozilla-central: http://hg.mozilla.org/mozilla-central/rev/e87d4b42a4b2

After some baking, I'll push to 1.9.1.
Status: REOPENED → RESOLVED
Closed: 13 years ago13 years ago
Resolution: --- → FIXED
I can't verify the fix, since builds before the fix were already not crashing with the testcase.
Depends on: 471101
Hardware: x86 → All
Target Milestone: --- → mozilla1.9.2a1
Verified with:

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b3pre) Gecko/20090114 Shiretoko/3.1b3pre Ubiquity/0.1.4 ID:20090114020333

Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20090113 Minefield/3.2a1pre ID:20090113020320

Can we please get this checked-in as a crashtest?
Status: RESOLVED → VERIFIED
Flags: in-testsuite?
Blocks: 469880
Does that include the fix for bug 471101?

Also, you presumably want Joe to review this too.
Attachment #358377 - Flags: superreview+
Comment on attachment 358377 [details] [diff] [review]
for 1.9.0 with imgIRequest_MOZILLA_1_9_0_BRANCH

Looks good, but you definitely need the fix to bug 471101 too. sr=joe so long as you do that.
Comment on attachment 358377 [details] [diff] [review]
for 1.9.0 with imgIRequest_MOZILLA_1_9_0_BRANCH

Looks ok if you don't change the CID and backport the other bug.
Attachment #358377 - Flags: review?(bzbarsky) → review+
Depends on: 493240
Depends on: 508057
Alexander,

What ended up happening with your 1.9.0 patch? Did Linux distributions accept it/ship it?
asac: Ditto Joe's question. See also regression fix bug 508057 that would be required if you did ship this patch.
Crash Signature: [@ nsCellMapColumnIterator::GetNextFrame]
You need to log in before you can comment on or make changes to this bug.