Closed Bug 56713 Opened 25 years ago Closed 25 years ago

Bulletproof common crash in [@ nsXULDocument::ResumeWalk]

Categories

(Core :: XUL, defect, P3)

defect

Tracking

()

RESOLVED FIXED

People

(Reporter: jrgmorrison, Assigned: waterson)

Details

(Keywords: crash, topcrash, Whiteboard: [rtm-] r=hyatt, sr=brendan)

Crash Data

Attachments

(3 files)

Talkback is reporting a common crash for this stack trace in current builds. (I don't have any steps to reproduce this, but some comments indicate that this is happening on startup): nsXULDocument::ResumeWalk [d:\builds\seamonkey\mozilla\rdf\content\src\nsXULDocument.cpp, line 5131] nsXULDocument::ParserObserver::OnStopRequest [d:\builds\seamonkey\mozilla\rdf\content\src\nsXULDocument.cpp, line 6474] nsParser::OnStopRequest [d:\builds\seamonkey\mozilla\htmlparser\src\nsParser.cpp, line 2401] nsJARChannel::OnStopRequest [d:\builds\seamonkey\mozilla\netwerk\protocol\jar\src\nsJARChannel.cpp, line 703] nsOnStopRequestEvent::HandleEvent [d:\builds\seamonkey\mozilla\netwerk\base\src\nsAsyncStreamListener.cpp, line 302] nsStreamListenerEvent::HandlePLEvent [d:\builds\seamonkey\mozilla\netwerk\base\src\nsAsyncStreamListener.cpp, line 106] PL_HandleEvent [d:\builds\seamonkey\mozilla\xpcom\threads\plevent.c, line 577] PL_ProcessPendingEvents [d:\builds\seamonkey\mozilla\xpcom\threads\plevent.c, line 513] _md_EventReceiverProc [d:\builds\seamonkey\mozilla\xpcom\threads\plevent.c, line 1055] nsAppShellService::Run [d:\builds\seamonkey\mozilla\xpfe\appshell\src\nsAppShellService.cpp, line 408] main1 [d:\builds\seamonkey\mozilla\xpfe\bootstrap\nsAppRunner.cpp, line 1038] main [d:\builds\seamonkey\mozilla\xpfe\bootstrap\nsAppRunner.cpp, line 1215] WinMain [d:\builds\seamonkey\mozilla\xpfe\bootstrap\nsAppRunner.cpp, line 1233] WinMainCRTStartup() KERNEL32.DLL + 0x7903 (0x77e57903) The code at that location is this: 5124 // content that we can simply build the delegates 5125 // and attach them to the parent node. 5126 rv = CreateElement(protoele, getter_AddRefs(child)); 5127 if (NS_FAILED(rv)) return rv; 5128 5129 // ...and append it to the content model. 5130 rv = element->AppendChildTo(child, PR_FALSE); <-- Boom! 5131 if (NS_FAILED(rv)) return rv; 5132 On bug 35568, Seth proposed a simple bulletproofing fix to assert and bail out 'if (!element)'. We could probably do the same fix for this crash. Index: nsXULDocument.cpp =================================================================== RCS file: /cvsroot/mozilla/rdf/content/src/nsXULDocument.cpp,v retrieving revision 1.352 diff -u -r1.352 nsXULDocument.cpp --- nsXULDocument.cpp 2000/10/09 03:08:28 1.352 +++ nsXULDocument.cpp 2000/10/15 07:24:59 @@ -5126,6 +5126,9 @@ rv = CreateElement(protoele, getter_AddRefs(child)); if (NS_FAILED(rv)) return rv; + NS_ASSERTION(element,"element is null"); + if (!element) return NS_ERROR_FAILURE; + // ...and append it to the content model. rv = element->AppendChildTo(child, PR_FALSE); if (NS_FAILED(rv)) return rv; (or, there was some discussion on that other bug that perhaps this should be done earlier, at the point where mContextStack.Peek fails to get the element.) Nominate for RTM to avoid what Talkback says is a common crash.
Keywords: crash, rtm
Hrm. There are only two calls to mContextStack.Push(), and the only one that would allow a null element to be inserted into the stack (well, w/o some violent error occuring) is at line 5003, in nsXULDocument::PrepareToWalk(). Need to look at this a bit more and try to deduce what's going wrong...
adding topcrash keyword and [@ nsXULDocument::ResumeWalk] for tracking.
Keywords: topcrash
putting on [rtm need info] radar. It looks to be the #1 crash now. However the last build crash date was 10/13 so this might have been fixed. I've seen crashes here in the past when my xul/js/dtd files are messed up. One of the reports mentions choosing region in the profile manager.
Whiteboard: [rtm need info]
There are some (7) crashes for builds of the 18th and 19th, although they are from the same two users (3 for one, 4 for the other). In once case, they were using a package called "ultraclassic", and in the other, it only says "starting app". Would it be a bad thing to just do the null check on the branch, and leave the trunk alone (until there's a better idea of what circumstances can really cause this to happen).
Attached patch bulletproofingSplinter Review
It turns out that the bulletproofing that sspitzer and I put in for the text node stuff caused this crash :-(. I've attached a "diff -wu" that puts "good" bulletproofing in for the text node case, and should fix the bug. By the way, this bug is caused by removing entities missing in the DTD file (e.g., remove "browserCmd.label" from navigator.dtd). Here's what happens. - When we're parsing an overlay that's missing a DTD, the parser generates a handy little error file in XML, which we try to parse as XUL. The XML file looks something like <parsererror> Your file has an error blah blah blah <sourcetext> - Since we're parsing in overlay mode, we've got no place to hook up the "Your file has an error blah blah blah" text. We trip the old bulletproofing code an immediately return an error code out to the caller. But, we do this *without* cleaning up mContextStack! - The next overlay that we try to load pushes it's "root" element onto the stack, and we wind up getting confused, thinking we actually should have a "real" element to hook up content to. The fix is to apply the same criteria to text node hookup that we do to element hookup: we can only hook up a node if we're a) parsing the master document, which'll always have a root element, or b) parsing an overlay, and have already passed the "first ply" of elements.
hyatt, please r=; brendan, sr=. thanks!
r=hyatt
sr=brendan@mozilla.org on the code. I asked for a comments-only change to make the inductive proof clear. I think a new patch should be attached just for the record. If the PDT doesn't care about the comments, the current patch can go into branch and trunk, and the trunk can get comment improvements right away in a separate revision. /be
Fix checked in on trunk.
Status: NEW → ASSIGNED
Whiteboard: [rtm need info] → [rtm+] r=hyatt, sr=brendan
Curious question: In the final patch, there is an assertion added towards the middle/end of the patch. In the deleted code, the same assertion was "backed up" by a non-debug test that would return. If you thought the assertion was significant enough to add, why don't you back-stop the assertion now? ...or is there inline code doing this (outside the diff?) or is there no nice way to eject? Since this is a "bullet proofing at the last minute," it seems like it might be reasonable to put in such a back stop (if only on the branch?). Again, I'm just curious, and *not* insisting on a change to the patch.
I added the assertion back in my patch on 10/27/00 after discussing w/ brendan. The situation is this. I believe that we can prove that the code, *as it stands now*, will never trigger that assertion: the assert and panic return in PrepareToWalk() provides the "basis case" for mContextStack being in a correct state; we can do induction from there. So...the assertion is left in the code for "the next guy": someone who comes and twiddles with mContextStack and possibly violates this condition. But...we can put the belt-and-braces hard return back in if that would make you feel better.
This bug is in candidate limbo. We will reconsider this fix once we have a candidate in hand, but we can't take this fix before then. Please check into the trunk ASAP.
Backing off this fix. It's probably wonderful, but the topcrash isn't showing up anymore, so we'll just leave it on the trunk for now.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Whiteboard: [rtm+] r=hyatt, sr=brendan → [rtm-] r=hyatt, sr=brendan
Component: XP Toolkit/Widgets: XUL → XUL
QA Contact: jrgmorrison → xptoolkit.widgets
Crash Signature: [@ nsXULDocument::ResumeWalk]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: