Closed Bug 53969 Opened 24 years ago Closed 24 years ago

crashes after printing and moving the mouse into a text frame

Categories

(Core :: Layout, defect, P1)

defect

Tracking

()

VERIFIED FIXED

People

(Reporter: dbaron, Assigned: waterson)

References

Details

(Keywords: crash, topcrash, Whiteboard: [nsbeta3-][rtm++] FIXED ON TRUNK and BRANCH)

Attachments

(7 files)

One of the top crashes on talkback reports (around #10 or so) is a crash in
nsTableFrame::GetRowGroupFrame with the top of the stack like this (from
2000091806):

nsTableFrame::GetRowGroupFrame  [d:\builds\seamonkey\mozilla\layout\html\table\s
rc\nsTableFrame.cpp  line 1127]

nsTableFrame::ReflowMappedChildren
[d:\builds\seamonkey\mozilla\layout\html\table\src\nsTableFrame.cpp  line 3086]

nsTableFrame::ResizeReflowPass2
[d:\builds\seamonkey\mozilla\layout\html\table\src\nsTableFrame.cpp  line 2066]

nsTableFrame::Reflow
[d:\builds\seamonkey\mozilla\layout\html\table\src\nsTableFrame.cpp  line 1710]

See http://www.mozilla.org/projects/seamonkey/reports/ns6analysis.html for lots
of data on this crash.
Looks like either a bad or null frame is being passed in to the function and it
is crashing here:

http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/layout/html/table/src/nsTableFrame.cpp&rev=3.396&mark=1125#1115
There are also one stack in the talkback data listed under 0x00000000 as the
stack signature (from build 2000091409), where the stack is just:

0x00000000
(everything I pasted in above)
Many of the user comments in the talkback reports mention printing, particularly
printing in mail.
Summary: crashes in nsTableFrame::GetRowGroupFrame → crashes while printing [ @ nsTableFrame::GetRowGroupFrame]
Marking nsbeta3+.
Status: NEW → ASSIGNED
Whiteboard: [nsbeta3+]
Adding me to CC. This bug is easy to reproduce. Just print out this page.
Crashes for me with WinNT 2000092120. In the crash analysis somebody mentioned
that he crashed while printing out a bug. (And did not file a bug????) The print
out appeared at our network printer completely. It seems that after a printout
the redraw of the page fails.
I've added a patch to fix the bug, I think. I can't test it right now because my 
build has some other problems. There may be other problems later, but the one 
that dbaron pointed out should be fixed.

Don, we need to find out why this only seems to happen when printing.
Whiteboard: [nsbeta3+] → [nsbeta3+]fix-in-hand-maybe
I applied chris's patch. It does not work, now the second argument is a null 
pointer.
adding [@ nsTableFrame::GetRowGroupFrame] for topcrash tracking.
Summary: crashes while printing [ @ nsTableFrame::GetRowGroupFrame] → crashes while printing [@ nsTableFrame::GetRowGroupFrame]
Bernd, I'm not seeing a crash in GetRowGroupFrame (on 
http://bugzilla.mozilla.org/enter_bug.cgi?product=Browser) after applying the 
patch on my 9/22 build (besides, there is already a check for null on the 2nd 
parameter). I do get a crash in other code after printing when the mouse goes 
into the browser. dcone mentioned that there is a recent bug assigned to joki in 
which mousing over a text field after printing causes a crash. I'm CCing him and 
we should probably wait for him to fix that bug before proceeding with this one. 
I am seeing the crash on the following URL

http://bugzilla.mozilla.org/show_bug.cgi?id=53969

The crash happens at 
http://lxr.mozilla.org/seamonkey/source/layout/html/table/src/nsTableFrame.cpp#1
125
Inside the handling routine for IFrameTypeIn.
The problem is: the virtual function pointer of aframe is null. Calling the 
GetFrameType method produces the crash.

Don, I just tried this again on my 9/22 build and am getting a serious assertion 
(but only after moving the mouse into the browser area after printing this 
page). The stack below is different than what Bernd reported, which supports the 
theory that there may be some evil corruption going on (or as Kevin suggests, 
maybe references to frames are being held after the frames are destroyed). In 
the assertion below, the content does not have a document. I continued past all 
of the assertions and never saw table code on the stack. I think this is either 
your problem or joki's. 



nsDebug::Assertion(const char * 0x01478a34, const char * 0x01478a30, const char 
* 0x01478a00, int 4170) line 256 + 13 bytes
nsDebug::WarnIfFalse(const char * 0x01478a34, const char * 0x01478a30, const 
char * 0x01478a00, int 4170) line 358 + 21 bytes
nsTextFrame::Reflow(nsTextFrame * const 0x00e0c66c, nsIPresContext * 0x017a9790, 
nsHTMLReflowMetrics & {...}, const nsHTMLReflowState & {...}, unsigned int & 
1228400) line 4170 + 38 bytes
nsLineLayout::ReflowFrame(nsIFrame * 0x00e0c66c, nsIFrame * * 0x0012c8a8, 
unsigned int & 1228400, nsHTMLReflowMetrics * 0x00000000, int & 0) line 921
Assignee: karnaze → dcone
Status: ASSIGNED → NEW
I'm upgrading the priority of this bug because currently, PDT is moving all
P3-P5 nsbeta3+ bugs to nsbeta3- and nominate for rtm if needed.  This bug sounds
like it should be addressed before ship.

Do we think that this bugs needs to be fixed for nsbeta3?  It sounds like under
a certain case, we crash when printing?
Priority: P3 → P2
Not holding PR3 for this. While I trust "topcrash", I've printed a ton of mail 
and bug reports and not crashed. Marking nsbeta3- but adding rtm keyword
Keywords: rtm
Whiteboard: [nsbeta3+]fix-in-hand-maybe → [nsbeta3-]fix-in-hand-maybe
This is a duplicate of bug 53219.. has the same stack trace.. 


*** This bug has been marked as a duplicate of 53219 ***
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → DUPLICATE
Whiteboard: [nsbeta3-]fix-in-hand-maybe → [nsbeta3-]
How is this the same stack trace?
when me an Karnaze looked at this.. when he would move the cursor over the 
textfield.. we got the same stacktrace that was in the other bug.  The 
stacktraces to vary depending on what you do after the print.. but the common 
thing is that if you print and do nothing (don't move the cursor over the page), 
no crash.. if you start moving the cursor over the page.. all kinds of bad 
things happen.  Joki has this bug .. I am going to go back to previous builds 
and find out when this regression was introduced... 
So is this a dup of #53219?
*** Bug 54068 has been marked as a duplicate of this bug. ***
This bug isn't a duplicate of bug 53912 because this crash still occurs after
fixing that bug.

Re-opening...
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
I am really impressed by the PDT ignorance to this bug, basing the judgement on 
private impression instead of facts (there are only 27 bugs listed in todays 
crash data), but it is your product and your reputation.
Adding rtm+, waiting for PDT approval.

Bernd, thanks for reminding (the rest of) us that this is important.
Whiteboard: [nsbeta3-] → [nsbeta3-][rtm+]
This is the same stack trace as 53967.  Something unstable happened.. and it was 
introduced between 9/19/00 and 9/20/00.  My regression builds point to this time 
frame when printing then moving the cursor over the page causes crashes.. this 
is a duplicate of that bug.  

*** This bug has been marked as a duplicate of 53967 ***
Status: REOPENED → RESOLVED
Closed: 24 years ago24 years ago
Resolution: --- → DUPLICATE
Sorry wrong bug number.. reopening this.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
The simplified test case is just a <input value=foo type=text> inside a 
document. Print the document and then put the mouse inside the text control and 
the fun begins. 
Thanks Chris...
Assignee: dcone → waterson
Status: REOPENED → NEW
Summary: crashes while printing [@ nsTableFrame::GetRowGroupFrame] → crashes after printing and moving the mouse into a text frame
Status: NEW → ASSIGNED
Here is one way to fix the problem. Instead of storing the
nsIAnonymousContentCreator-generated content in the document (well, the binding
manager, which is 1-1 with the document), we store it in the pres shell. Note
that nsBindingManager::ChangeDocumentFor() now iterates through aOldDocument's
shells to update each shell's anonymous content.
This is P1. You crash after every form you try to print.
Priority: P2 → P1
PDT agrees [rtm need info] until code reviews are available. Definitely want a
fix for this topcrash though.
Whiteboard: [nsbeta3-][rtm+] → [nsbeta3-][rtm need info]
If this bug is not TABLE specific, please change to appropriate component.
Component: HTMLTables → Layout
The changes make sense to me.  It seems a little strange to be going through the 
pres shells in nsBindingManager::ChangeDocumentFor, but then again, it means the 
code only has to be in one place (rather than two).

Is it possible for some of a document's pres shells to be destroyed before the 
document is?  If so, should you be setting the anonymous content to null in the 
pres shell destructor too?  (Should the setting to null then be done through a 
method on the pres shell that makes sure it's only done once?)

(While we're here, is |nsIAnonymousContentCreator| deprecated?  Should it be 
marked as such in layout/html/base/src/nsIAnonymousContentCreator.h?  And should 
we remove my |SetDocumentForAnonymousContent| hack plus the one remaining call to 
it in nsDocument.cpp that should have been removed with the fix for bug 50999?)
> Is it possible for some of a document's pres shells to be destroyed before
> the document is?

Yes, and this patch doesn't account for that, I don't think. For example, let's
print the document. Now we'll create a PresShell for printing, insert some
anonymous content in it, and then kill the PresShell. At no time will we
ChangeDocumentFor(...) with a null document.

> If so, should you be setting the anonymous content to null in the
> pres shell destructor too? (Should the setting to null then be done
> through a method on the pres shell that makes sure it's only done once?)

I'll give that a try.

I'll think about the rest of your question (removing the
nsIAnonymousContentCreator hacks) tomorrow... :-)
The above patch incorporates dbaron's feedback:

1. nsPresShell::~nsPresShell now enumerates the anonymous content left in the
table and ensures that SetDocument(nsnull) is called on each. N.B. that this may
duplicate some of the work done by ChangeDocumentFor() when the document is
destroyed; however, if an element's mDocument is already null, the SetDocument()
implementation's bail early (so we're basically wasting an extra function call).
Nevertheless, this is necessary to handle the case where a PresShell dies before
the document does: in theory, the PresShell may have triggered anonymous content
to be generated with a script object.

2. We no longer need the SetDocumentForAnonymousContent() method on
nsIAnonymousContentCreator, so I removed it, and the one call made by it.

dbaron, care to give this another look? The patch is pretty much the same as
last time, modulo addition of ClearDocumentEnumerator() in nsPresShell.cpp and
gutting of nsIAnonymousContentCreator::SetDocumentForAnonymousContent().
*** Bug 55130 has been marked as a duplicate of this bug. ***
Am I correct that this solution means all anonymous content should get a 
SetDocument(null) twice (once when the pres shell owning the frame owning the 
content goes away, and once when the "owning" content gets SetDocument(null))?  
Keeping all anonymous content around until the pres shell goes away seems like 
it could be bad for memory usage, but I don't see a better way (and I've 
proposed that we should do something similar for other content...).

Is it possible for there to be problems with 
content->[JS]->document->presShell->content circular references?  (Would the 
added code *ever* break such circular references where it didn't before?  i.e., 
was my previous "pres shell destructor" suggestion bad?)  Who owns the pres 
shell?  Would it be better to do the pres shell's half of the SetDocument(null) 
work only when the document releases the pres shell instead of from the pres 
shell's dtor?
My comment about circular references doesn't make much sense -- if the document 
still owns the pres shell then we're fine, because the iteration in the 
content's half of the work should get it.  But would it be better to be more 
tolerant of leaked pres shells (i.e., not leak ten times more along with it...)?
Does that fix also solve the original problem reported by dbaron? Or did 
this bug morph underneath? See bug 55162 for another manifestation of the 
original bug.
I don't think this bug has morphed.  The problem causing the crash (based on 
what Chris Waterson's fix does) was in the way we were unrooting (root meaning a 
root for JS GC) anonymous content that was attached to frames (which is needed 
to prevent huge leaks - see bug 45676 and bug 42895, although hyatt's 
more recent changes to create event.originalTarget may have "fixed" those cases 
by not causing JS objects to be created for anonymous content in the tooltip 
code).  My original fix (for bug 45676) was very slow (and also didn't account 
for reframes at all) -- see bug 50999.  Waterson's fix for bug 50999 just needs 
to be modified to correctly account for multiple sets of frames (each set owned 
by a PresShell) on the same document.

These problems are caused by attaching anonymous content to frames (any frame 
implementing nsIAnonymousContentCreator does this).  Things get somewhat messy 
once you start creating multiple frame trees (e.g., for printing).  That's why I 
mentioned deprecating nsIAnonymousContentCreator above (in favor of XBL 
anonymous content).

Hopefully that explanation is at least related to the truth...
> Am I correct that this solution means all anonymous content should get a
> SetDocument(null) twice (once when the pres shell owning the frame owning the
> content goes away, and once when the "owning" content gets SetDocument(null))?
This will only happen in the "longest lived" pres shell. (Of course, that means
it will happen in every pres shell that's used to view a web page!) But, as I
noted earlier, SetDocument() implementations will short circuit much of the work
if mDocument == aDocument, so the second iteration will waste a traversal of a
shallow DOM tree, but not much else.

> Keeping all anonymous content around until the pres shell goes away
> seems like it could be bad for memory usage,

Perhaps we can explore bounding the lifetime of the anonymous content to the
lifetime of the frame that created it in the future.

> But would it be better to be more tolerant of leaked pres
> shells (i.e., not leak ten times more along with it...)?

One problem, one bug, please! :-)

> Does that fix also solve the original problem reported by dbaron? Or
> did this bug morph underneath?

No, this doesn't fix the table crasher. Yes, the bug has morphed, but is still
valid. I think that bug 55162 should be dup'd to bug 54829, which covers the bad
table row group dereference.

>This will only happen in the "longest lived" pres shell. (Of course, that means

Right...

>> But would it be better to be more tolerant of leaked pres
>> shells (i.e., not leak ten times more along with it...)?
>
>One problem, one bug, please! :-)

What I meant was that it might be better to put the code you were adding to the 
pres shell's destructor at the place where the document releases the pres shell.
Ah, I see. That's a good idea. I'll work up another patch.
I incorporated dbaron's suggestion to add out-of-band signalling to the
PresShell to nuke the document pointers: there's now
nsIPresShell::ReleaseAnonymousContent() that's called from nsDocument::- and
nsXULDocument::SetScriptGlobalObject(nsnull). This method is implemented such
that it will drop the mAnonymousContentTable after being called, avoiding
multiple traversals through the anonymous content to call SetDocument().
I'm guessing that, for OS/2, |ClearDocumentEnumerator| needs |PR_CALLBACK|, 
i.e., |static PRBool PR_CALLBACK|.

The behavior of |PresShell::SetAnonymousContentFor| when given a null 
|aAnonymousElements| seems odd - you remove the anonymous content from the table 
without calling |SetDocument|.  Perhaps you should either:
 * change |if (oldAnonymousElements && aAnonymousElements)| to |if 
(oldAnonymousElements)|
 * (my preference) make null |aAnonymousElements| an assertion+failure return, 
since you have |ReleaseAnonymousContent| now.

My comments about resistance to leaked |PresShell| don't make any sense at all 
-- if we leak a pres shell we leak the entire document anyway (my guess at the 
ownership model was backwards -- I should have looked).  So I guess some of the 
current changes aren't as useful as I thought, but I don't think there's 
anything wrong with them.  Or does the stuff I got you to add do anything at 
all?  (I'm tired... I can't tell.  Sorry to change my mind so much.)
Fixed the PR_CALLBACK stuff: good catch. mkaply will thank you, even though he
doesn't know it yet.

You're right about allowing null for aAnonymousItems. Bogus. Vestigial junk from
an earlier incarnation. Fixed that.

I actually like the consistency of unrooting the anonymous content at the same
time we unroot real content. Although I'm almost positive that it's probably not
strictly necessary (as indicated in the comment), it certainly feels tidier.

I'll attach another patch.
r=dbaron (assuming you've run some basic leak tests, e.g., bloat tests and a 
little mousing around in the browser)
dbaron: yep.
a=hyatt
PDT, returning to + status. You've ++'d it before. Still want it?
Whiteboard: [nsbeta3-][rtm need info] → [nsbeta3-][rtm+]
PDT marking [rtm need info]. We'd still like a fix for a topcrash (don't think
it was ever ++) but this patch is really big. Please validate this on the trunk
for a day and come back after that was successful.
Whiteboard: [nsbeta3-][rtm+] → [nsbeta3-][rtm need info]
The size of the patch is deceptive.  Most of it is moving code from one place to 
another or removing unused code.
PDT: this is a top crash, and while the patch is long, I can verify what david
says:  "number of lines changed" here is deceptive, because the patch largely
moves functionality from one place to another, and obsoletes some methods which
are correctly removed.  It isn't 0-risk, but the risk is not high either.  After
a day or two of validation on the trunk and maybe some extra testing help from
QA and/or dev staff, we should take this fix on the branch.
I agree with buster, this is a big deal.. printing will not work in 90% of the 
cases if this is not fixed.  
I filed a bunch of printing bugs on the beta and some of
them got funneled into this bug.  We really should fix this
for rtm, I think we look pretty bad without this fix.
Fix checked in to trunk. Marking rtm+ again for re-eval.
Whiteboard: [nsbeta3-][rtm need info] → [nsbeta3-][rtm+] FIXED ON TRUNK
I agree that we need this. I just filed 55910 on a bunch of printing issues, 
which are most likely this bug. For one, after printing, scroll bars are totally 
hosed.
checked into the embedding branch.
marking rtm++
Whiteboard: [nsbeta3-][rtm+] FIXED ON TRUNK → [nsbeta3-][rtm++] FIXED ON TRUNK
fixed on branch
Status: ASSIGNED → RESOLVED
Closed: 24 years ago24 years ago
Resolution: --- → FIXED
Status Whiteboard says 'FIXED ON TRUNK' - comments says fixed on branch. I 
assume sice it is an rtm++ bug, it was fixed on the branch. Please clarify 
prior to verification. Thanks
Both.
Whiteboard: [nsbeta3-][rtm++] FIXED ON TRUNK → [nsbeta3-][rtm++] FIXED ON TRUNK and BRANCH
Verified fixed on Win, Mac and Linux with 10_11 branch build. Added vtrunk 
keyword.
Keywords: vtrunk
I didn't get crash on Win NT nov_1 nor Mac oct_24 nor Linux nov_2 trunk builds
by "printing and moving the mouse into a text frame", printing in mail.
So I'm marking this bug as VERIFIED and removing vtrunk keyword.


Status: RESOLVED → VERIFIED
Keywords: vtrunk
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: