Closed Bug 333050 Opened 18 years ago Closed 13 years ago

intermittent crashes going back to tinderbox page after using popup iframe [@ nsDocLoader::QueryInterface]

Categories

(Core :: DOM: Navigation, defect)

x86
Linux
defect
Not set
critical

Tracking

()

RESOLVED WORKSFORME
mozilla1.8.1beta2

People

(Reporter: dbaron, Unassigned)

Details

(Keywords: crash, regression)

Attachments

(2 files)

I've been seeing intermittent crashes for months (since around when fastback landed, or stabilized) while using tinderbox.  In particular, the crash usually happens after I:
 1. load tinderbox
 2. click on one of the things (name / "L") that causes a popup in an iframe
 3. click on one of the links in that popup
 4. go back
although sometimes (as in the case I'm debugging now), the steps seem to be:
 1-2. same as above
 3. let page meta-refresh, probably after closing popup
The steps may require having the same tinderbox page loaded in multiple windows (independently meta-refreshing).

I have not yet found steps to reliably reproduce this, but I've been seeing it for months, and it's clearly not getting fixed, so I'm thinking some code inspection may be in order.

(More data coming, in the form of attachments, which will take me some time to produce.)
(In reply to comment #0)
> although sometimes (as in the case I'm debugging now), the steps seem to be:
>  1-2. same as above
>  3. let page meta-refresh, probably after closing popup

Ignore this bit -- I was misinterpreting the log.  And my memory is certainly that I usually crash when going back from a bonsai page (reached through a who.cgi popup) to a tinderbox page.
That has the earmarks of us calling QI on a deleted docshell, which would be Very Bad (like "probably exploitable" bad).  :(

bryner, are you going to have a chance to look at this in time for 1.8.0.3?
Severity: normal → critical
Flags: blocking1.9a1?
Flags: blocking1.8.1?
Flags: blocking1.8.0.3?
FWIW, this is quite hard to reproduce.  I probably do these actions a few times a day on average, and I crash this way about once every week or two.
(But when I say I crash once every week or two, that's probably over 50% of the total crashes I experience when not testing other people's crash bugs.)
Flags: blocking1.8.0.3? → blocking1.8.0.3+
>  2. click on one of the things (name / "L") that causes a popup in an iframe

For me, the name popups are iframes, but the L popups are just absolutely positioned divs.
Keywords: crash
See also bug 319551, which (sometimes?) has the same stack signature, and is easier to reproduce (at least for me).
Depends on: 319551
I think the trick to reproducing this crash is to click on the tinderbox page to close the popup as the bonsai page is loading.
I just crashed in a slightly different way (from DocumentViewerImpl::Destroy) trying to reproduce this -- I think because the tinderbox page didn't go in the fastback cache for some reason:

#5  <signal handler called>
#6  0x00010001 in ?? ()
#7  0x03ac64c7 in ns_if_addref<nsIDocShellTreeItem*> (expr=0x90815c4)
    at ../../../dist/include/xpcom/nsISupportsUtils.h:114
#8  0x03af6176 in nsSHEntry::ChildShellAt (this=0x8e2d310, aIndex=0,
    aShell=0xbffc6724)
    at /builds/reflow/mozilla/docshell/shistory/src/nsSHEntry.cpp:574
#9  0x0728cd47 in DocumentViewerImpl::Destroy (this=0x8c1f5f0)
    at /builds/reflow/mozilla/layout/base/nsDocumentViewer.cpp:1502
#10 0x0728b2f7 in DocumentViewerImpl::Show (this=0x8e7e298)
    at /builds/reflow/mozilla/layout/base/nsDocumentViewer.cpp:1828
#11 0x072a63f1 in nsPresContext::EnsureVisible (this=0x8f84740,
    aUnsuppressFocus=0)
    at /builds/reflow/mozilla/layout/base/nsPresContext.cpp:1319
#12 0x072b4e0a in PresShell::UnsuppressAndInvalidate (this=0x907a158)
    at /builds/reflow/mozilla/layout/base/nsPresShell.cpp:4393
#13 0x072b5023 in PresShell::UnsuppressPainting (this=0x907a158)
    at /builds/reflow/mozilla/layout/base/nsPresShell.cpp:4441
#14 0x072aa798 in PresShell::sPaintSuppressionCallback (aTimer=0x8f42878,
    aPresShell=0x907a158)
    at /builds/reflow/mozilla/layout/base/nsPresShell.cpp:2574
#15 0x0021f9fd in nsTimerImpl::Fire (this=0x8f42878)
    at /builds/reflow/mozilla/xpcom/threads/nsTimerImpl.cpp:400
...<the usual>

Why is nsSHEntry::mChildShells holding weak pointers rather than refcounting?  Especially given the comment in DocumentViewerImpl::Destroy:
    // Do the same for our children.  Note that we need to get the child
    // docshells from the SHEntry now; the docshell will have cleared them.
So, in the reflow branch tree where I see that (which is more likely because of the branchpoint), when I make the obvious change to nsCOMArray, I actually end up fixing the immediate crash, but after going back, getting into a weird state, and reloading, I crash in nsGlobalWindow::ClearAllTimeouts.
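To illustrate the weak-versus-refcounted question raised above, here is a minimal standalone sketch. It is not Gecko code: `Shell`, `WeakEntry`, and `OwningEntry` are hypothetical names, and `std::shared_ptr` / `std::weak_ptr` stand in for XPCOM refcounting. A raw weak pointer would simply dangle once the child is destroyed; the sketch uses `std::weak_ptr` only so that the dead child can be observed safely.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for a child docshell; not the real Gecko class.
struct Shell {
    int id;
    explicit Shell(int i) : id(i) {}
};

// Entry that does NOT keep its children alive (analogous to weak pointers).
class WeakEntry {
public:
    void AddChild(const std::shared_ptr<Shell>& s) { children_.push_back(s); }

    // Returns nullptr if the index is out of range or the child is gone.
    std::shared_ptr<Shell> ChildAt(std::size_t i) const {
        if (i >= children_.size()) {
            return nullptr;
        }
        return children_[i].lock();
    }

private:
    std::vector<std::weak_ptr<Shell>> children_;
};

// Entry that DOES keep its children alive (analogous to a refcounting array).
class OwningEntry {
public:
    void AddChild(const std::shared_ptr<Shell>& s) { children_.push_back(s); }

    std::shared_ptr<Shell> ChildAt(std::size_t i) const {
        if (i >= children_.size()) {
            return nullptr;
        }
        return children_[i];
    }

private:
    std::vector<std::shared_ptr<Shell>> children_;
};

// The only other owner drops the child; the weak entry can no longer reach it.
bool ChildSurvivesInWeakEntry() {
    WeakEntry entry;
    auto shell = std::make_shared<Shell>(1);
    entry.AddChild(shell);
    shell.reset();  // e.g. the docshell clears its children
    return entry.ChildAt(0) != nullptr;
}

// Same sequence, but the entry holds a strong reference, so the child survives.
bool ChildSurvivesInOwningEntry() {
    OwningEntry entry;
    auto shell = std::make_shared<Shell>(1);
    entry.AddChild(shell);
    shell.reset();
    return entry.ChildAt(0) != nullptr;
}
```

Under these assumptions the owning variant avoids the use-after-free, but, as the comment above notes, keeping the child alive only moves the problem if other code still expects it to be torn down.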
Doesn't look like this makes the current release, bumping to the next.
Flags: blocking1.8.0.5?
Flags: blocking1.8.0.4-
Flags: blocking1.8.0.4+
Assignee: nobody → bryner
Flags: blocking1.8.0.5? → blocking1.8.0.5+
FWIW, just hit this on a trunk build from within the past few days; hadn't seen it for a while, but I think I'd been pretty careful to avoid keeping tinderbox windows up.  This time it was just while loading the bonsai query, not while going back to the tinderbox page.  But perhaps I'd previously gone back to get to the tinderbox page from which I was getting the bonsai query.
I do have one idea here -- RestoreWindowState, which is called before we reattach the child docshells, has the potential to fire events.  If those events removed the child document, then we'd end up with a dangling pointer.
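The re-entrancy hazard suggested above can be sketched in isolation. This is an assumption-laden illustration, not the actual RestoreWindowState code path: `Document`, `Child`, and `ChildAliveAfterRestore` are hypothetical names, and `std::weak_ptr` stands in for a non-owning saved pointer so the dead child can be detected instead of dereferenced.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical stand-in for a child docshell.
struct Child {
    int id;
    explicit Child(int i) : id(i) {}
};

// Hypothetical stand-in for the document that owns its children.
struct Document {
    std::vector<std::shared_ptr<Child>> owned;
};

// Models "save a non-owning pointer, run a step that may fire events,
// then reattach the saved child". Returns true if the saved child is
// still alive when the reattach step would use it.
bool ChildAliveAfterRestore(bool handlerRemovesChild) {
    Document doc;
    doc.owned.push_back(std::make_shared<Child>(1));

    // Non-owning reference saved before the restore step.
    std::weak_ptr<Child> saved = doc.owned.front();

    // The restore step may run arbitrary event handlers.
    std::function<void(Document&)> handler;
    if (handlerRemovesChild) {
        handler = [](Document& d) { d.owned.clear(); };
    } else {
        handler = [](Document&) {};
    }
    handler(doc);

    // With a raw pointer this check is impossible and the reattach step
    // would touch freed memory; with weak_ptr the loss is observable.
    return saved.lock() != nullptr;
}
```

If this is roughly what happens, either holding a strong reference across the restore step or re-validating the child afterwards would close the window.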
No longer depends on: 319551
Isn't going to make the 1.8.0.5 train at this point, not fixed on trunk or 1.8 for regression testing. Not going to bother bumping to the next .0.x release.
Flags: blocking1.8.0.5+ → blocking1.8.0.5-
We really do want to make sure we get this in on 1.8.0.x once we have a fix... dangling pointers are bad.
Flags: blocking1.8.0.6?
No longer depends on: 319551
Bryner, do you have any time to take a look at this for 1.8.1?
Flags: blocking1.8.1? → blocking1.8.1+
I can try, I've just never been able to reproduce it.  The patch from bug 319551 may solve the immediate crash but there's still something bad happening.
Target Milestone: --- → mozilla1.8.1beta2
Between the 2006-06-27-04-trunk and 2006-06-29-04-trunk build the behavior changed:  in the latter build, the popup automatically goes away when the new page starts loading.

I can reproduce the crash pretty easily in 2006-06-26-04-trunk (without the patch from bug 319551).  In 2006-06-27-04-trunk (with the patch, but without the behavior change), I don't crash, but the browser goes into a state where it's just spinning in a loading state showing the old (forward) page when going back (easy to get out of, though, since hitting stop shows the page I went back to).  In 2006-06-29-04-trunk (with the behavior change), I don't see either problem.

(And note that comment 0 was incorrect in mentioning the "L" popups as Jesse pointed out in comment 7; it's only the name popups that cause the problem.)
Filed bug 343169 on the latter regression.
Changing to blocking1.8.1- since it shouldn't be crashing anymore on the branch (although I'll try to test that tonight).
Flags: blocking1.8.1+ → blocking1.8.1-
Not making 1.8.0.7, no patch, maybe WFM now.
Flags: blocking1.8.0.7? → blocking1.8.0.7-
Flags: blocking1.9a1? → blocking1.9-
Keywords: crash → regression
Whiteboard: [wanted-1.9]
Flags: wanted1.9+
Whiteboard: [wanted-1.9]
Reassigning my bugs, since I'm not actually working on them.
Assignee: bryner → nobody
Per crash-stats, there are none currently on trunk (brief spike on 3/24-25),
but there are some crashes on the 2.0.0.14 branch (not a topcrash).

TB44916586 for example
Stack Signature	 nsDocLoader::QueryInterface 90983893
Product ID	Firefox2
Build ID	2008020121
Trigger Time	2008-05-08 17:23:47.0
Platform	Win32
Operating System	Windows NT 5.1 build 2600
Module	FIREFOX.EXE + (0038a427)
URL visited	
User Comments	
Since Last Crash	20387 sec
Total Uptime	78957 sec
Trigger Reason	Access violation
Source File, Line No.	c:/builds/tinderbox/Fx-Mozilla1.8-Release/WINNT_5.2_Depend/mozilla/uriloader/base/nsDocLoader.cpp, line 230
Stack Trace 	
nsDocLoader::QueryInterface  [mozilla/uriloader/base/nsDocLoader.cpp, line 230]
nsDocShell::QueryInterface  [mozilla/docshell/base/nsDocShell.cpp, line 404]
nsWebShell::QueryInterface  [mozilla/docshell/base/nsWebShell.cpp, line 236]
nsQueryInterfaceWithError::operator()  [mozilla/xpcom/build/nsCOMPtr.cpp, line 69]
nsGetInterface::operator()  [mozilla/xpcom/build/nsIInterfaceRequestorUtils.cpp, line 52]
nsCOMPtr_base::assign_from_helper  [mozilla/xpcom/build/nsCOMPtr.cpp, line 150]
nsGlobalWindow::GetTop  [mozilla/dom/src/base/nsGlobalWindow.cpp, line 2080]
XPTC_InvokeByIndex  [mozilla/xpcom/reflect/xptcall/src/md/win32/xptcinvoke.cpp, line 102]
XPCWrappedNative::CallMethod  [mozilla/js/src/xpconnect/src/xpcwrappednative.cpp, line 2169]
XPC_WN_GetterSetter  [mozilla/js/src/xpconnect/src/xpcwrappednativejsops.cpp, line 1487]
js_Invoke  [mozilla/js/src/jsinterp.c, line 1379]
js_InternalInvoke  [mozilla/js/src/jsinterp.c, line 1473]
js_InternalGetOrSet  [mozilla/js/src/jsinterp.c, line 1544]
js_NativeGet  [mozilla/js/src/jsobj.c, line 3469]
js_Interpret  [mozilla/js/src/jsinterp.c, line 4036]
js_Execute  [mozilla/js/src/jsinterp.c, line 1638]
JS_EvaluateUCScriptForPrincipals  [mozilla/js/src/jsapi.c, line 4298]
nsJSContext::EvaluateString  [mozilla/dom/src/base/nsJSEnvironment.cpp, line 1100]
nsScriptLoader::EvaluateScript  [mozilla/content/base/src/nsScriptLoader.cpp, line 813]
nsScriptLoader::ProcessRequest  [mozilla/content/base/src/nsScriptLoader.cpp, line 711]
nsScriptLoader::DoProcessScriptElement  [mozilla/content/base/src/nsScriptLoader.cpp, line 644]
nsScriptLoader::ProcessScriptElement  [mozilla/content/base/src/nsScriptLoader.cpp, line 396]
nsHTMLScriptElement::MaybeProcessScript  [mozilla/content/html/content/src/nsHTMLScriptElement.cpp, line 663]
nsHTMLScriptElement::BindToTree  [mozilla/content/html/content/src/nsHTMLScriptElement.cpp, line 456]
nsGenericElement::AppendChildTo  [mozilla/content/base/src/nsGenericElement.cpp, line 2876]
HTMLContentSink::ProcessSCRIPTTag  [mozilla/content/html/document/src/nsHTMLContentSink.cpp, line 4177]
HTMLContentSink::AddLeaf  [mozilla/content/html/document/src/nsHTMLContentSink.cpp, line 3043]
CNavDTD::AddLeaf  [mozilla/parser/htmlparser/src/CNavDTD.cpp, line 3579]
CNavDTD::HandleDefaultStartToken  [mozilla/parser/htmlparser/src/CNavDTD.cpp, line 1283]
CNavDTD::HandleStartToken  [mozilla/parser/htmlparser/src/CNavDTD.cpp, line 1668]
CNavDTD::HandleToken  [mozilla/parser/htmlparser/src/CNavDTD.cpp, line 955]
CNavDTD::BuildModel  [mozilla/parser/htmlparser/src/CNavDTD.cpp, line 458]
nsParser::BuildModel  [mozilla/parser/htmlparser/src/nsParser.cpp, line 2169]


TB44575343
Stack Signature	 nsDocLoader::QueryInterface f556741e
Product ID	Firefox2
Build ID	2008040413
Trigger Time	2008-04-30 14:42:41.0
Platform	Win32
Operating System	Windows NT 5.1 build 2600
Module	FIREFOX.EXE + (0038b4fb)
URL visited	
User Comments	
Since Last Crash	0 sec
Total Uptime	68678 sec
Trigger Reason	Access violation
Source File, Line No.	c:/builds/tinderbox/Fx-Mozilla1.8-Release/WINNT_5.2_Depend/mozilla/uriloader/base/nsDocLoader.cpp, line 236
Stack Trace 	
nsDocLoader::QueryInterface  [mozilla/uriloader/base/nsDocLoader.cpp, line 236]
nsDocLoader::GetAsDocLoader  [mozilla/uriloader/base/nsDocLoader.cpp, line 273]
nsDocLoader::AddDocLoaderAsChildOfRoot  [mozilla/uriloader/base/nsDocLoader.cpp, line 285]
nsDocShell::Init  [mozilla/docshell/base/nsDocShell.cpp, line 348]
nsWebShellConstructor  [mozilla/docshell/build/nsDocShellModule.cpp, line 89]
CallCreateInstance  [mozilla/xpcom/build/nsComponentManagerUtils.cpp, line 171]
nsWebShellWindow::Initialize  [mozilla/xpfe/appshell/src/nsWebShellWindow.cpp, line 229]
nsAppShellService::JustCreateTopWindow  [mozilla/xpfe/appshell/src/nsAppShellService.cpp, line 361]
nsAppShellService::CreateHiddenWindow  [mozilla/xpfe/appshell/src/nsAppShellService.cpp, line 177]
nsAppStartup::CreateHiddenWindow  [mozilla/toolkit/components/startup/src/nsAppStartup.cpp, line 141]
XRE_main  [mozilla/toolkit/xre/nsAppRunner.cpp, line 2695]
main  [mozilla/browser/app/nsBrowserApp.cpp, line 61]
kernel32.dll + 0x16fd7 (0x7c816fd7)

Component: History: Session → Document Navigation
QA Contact: history.session → docshell
dbaron, are you still hitting this?  If not, should we close the bug?
crash keyword missing
Keywords: crash
Crash Signature: [@ nsDocLoader::QueryInterface]
I don't see this anymore. Resolving as Works For Me. There is another similar signature and Marcia will log that as a new bug.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WORKSFORME