Closed Bug 679626 Opened 8 years ago Closed 7 years ago

crash in nsIDOMWindowInternal::GetDocShell with signature pointing to nsASDOMWindowEnumerator::GetNext. ref-counting issue?

Categories

(Core :: Widget, defect, critical)

x86
All
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla18
Tracking Status
firefox-esr17 + fixed

People

(Reporter: Usul, Assigned: Bienvenu)

References

()

Details

(Keywords: crash, regression, topcrash+, Whiteboard: [tbird topcrash][gs][v6 regression])

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-a302892d-e96d-46ac-943d-5f7fb2110813 .
============================================================= 
0 	xul.dll 	nsASDOMWindowEnumerator::GetNext 	xpfe/appshell/src/nsAppShellWindowEnumerator.cpp:261
1 	xul.dll 	MarkWindowList 	content/base/src/nsCCUncollectableMarker.cpp:177
2 	xul.dll 	nsCCUncollectableMarker::Observe 	content/base/src/nsCCUncollectableMarker.cpp:224
3 	xul.dll 	nsObserverList::NotifyObservers 	xpcom/ds/nsObserverList.cpp:130
4 	xul.dll 	nsObserverService::NotifyObservers 	xpcom/ds/nsObserverService.cpp:182
5 	xul.dll 	nsCycleCollector::PrepareForCollection 	xpcom/base/nsCycleCollector.cpp:2524
6 	xul.dll 	nsCycleCollectorRunner::Collect 	xpcom/base/nsCycleCollector.cpp:3371
7 	xul.dll 	nsCycleCollector_collect 	xpcom/base/nsCycleCollector.cpp:3451
8 	xul.dll 	nsJSContext::CycleCollectNow 	dom/base/nsJSEnvironment.cpp:3277
9 	xul.dll 	CCTimerFired 	dom/base/nsJSEnvironment.cpp:3317
10 	xul.dll 	nsTimerImpl::Fire 	xpcom/threads/nsTimerImpl.cpp:424
11 	xul.dll 	nsTimerEvent::Run 	xpcom/threads/nsTimerImpl.cpp:520
12 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:618
13 	xul.dll 	NS_ProcessNextEvent_P 	objdir-tb/mozilla/xpcom/build/nsThreadUtils.cpp:245
14 	xul.dll 	mozilla::ipc::MessagePump::Run 	ipc/glue/MessagePump.cpp:134
15 	xul.dll 	MessageLoop::RunInternal 	ipc/chromium/src/base/message_loop.cc:218
16 	xul.dll 	MessageLoop::RunHandler 	ipc/chromium/src/base/message_loop.cc:202
17 	xul.dll 	MessageLoop::Run 	ipc/chromium/src/base/message_loop.cc:176
18 	xul.dll 	nsBaseAppShell::Run 	widget/src/xpwidgets/nsBaseAppShell.cpp:189
19 	xul.dll 	nsAppShell::Run 	widget/src/windows/nsAppShell.cpp:249
20 	xul.dll 	nsAppStartup::Run 	toolkit/components/startup/nsAppStartup.cpp:222
21 	xul.dll 	XRE_main 	toolkit/xre/nsAppRunner.cpp:3686
22 	thunderbird.exe 	NS_internal_main 	mail/app/nsMailApp.cpp:104
23 	thunderbird.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:106
24 	thunderbird.exe 	__tmainCRTStartup 	objdir-tb/mozilla/memory/jemalloc/crtsrc/crtexe.c:591
25 	kernel32.dll 	BaseThreadInitThunk 	
26 	ntdll.dll 	__RtlUserThreadStart 	
27 	ntdll.dll 	_RtlUserThreadStart
Product: Thunderbird → Core
QA Contact: general → general
This has been around since at least version 3.0
EXCEPTION_ACCESS_VIOLATION_READ
0x0

about 1/3 of Thunderbird and Firefox crashes are startup.

example Firefox 7.0a bp-61cfaa98-af66-410a-aca5-ed65d2110801
example rare Mac crash bp-f1e7b34d-ad2b-4801-a4c2-52dca2110106
Crash Signature: [@ nsASDOMWindowEnumerator::GetNext(nsISupports**)] → [@ nsASDOMWindowEnumerator::GetNext(nsISupports**)] [@ @0x0 | nsASDOMWindowEnumerator::GetNext(nsISupports**)]
OS: Mac OS X → All
Whiteboard: [tbird crash]
xref 
Bug 616506 - crash [@ nsAppShell::ProcessNextNativeEvent ] on ARM 
Bug 574727 - Crash in [@ nsAppShell::ProcessNextNativeEvent ] (linux)
Bug 524971 - Crash Report [@ nsAppShell::ProcessNextNativeEvent(int)]
Component: General → Widget
QA Contact: general → general
#3 crash for thunderbird version 6. And, it would seem, something regressive occurred for Thunderbird, because it went from obscurity in v5 to topcrash in v6


The exact opposite is true for Firefox - this crash almost doesn't exist in FF5 and 6
Keywords: regression
Whiteboard: [tbird crash] → [tbird topcrash]
http://getsatisfaction.com/mozilla_messaging/topics/version_6_0_crashing echoes this topcrash starting in version 6
Whiteboard: [tbird topcrash] → [tbird topcrash][gs]
ludo, bienvenu, did we have a theory as to what this might be?  
I thought we had an idea, but there's nothing documented here.

The uptick in crash rate might be in the time frame of 20110714224222 builds [1]. 

Is it just coincidence that Bug 681632 is also has a large uptick in version 6?


https://crash-stats.mozilla.com/report/list?product=Thunderbird&query_search=signature&query_type=exact&query=nsASDOMWindowEnumerator%3A%3AGetNext%28nsISupports**%29&reason_type=contains&date=8%2F14%2F2011%2010%3A43%3A39&range_value=16&range_unit=weeks&hang_type=any&process_type=any&do_query=1&admin=1&signature=nsASDOMWindowEnumerator%3A%3AGetNext%28nsISupports**%29
Could be the cycle collector behavior changed...though the core issue probably is some sort of ref-counting issue.
or they could be the same issue - I suspect that's the case.
#1 cash in Thunderbird v9
Keywords: topcrash
Bienvenu, 

To diagnose this do we need a special build to handle what Ted describes at bug 681632 comment 6?

bug 681632 is almost certainly related - lots of reporters see both crashes.

Unfortunately most crashes happen unattended by the user, or they can't reproduce (crash goes away after a couple). But still #1 crash for recent versions.
Blocks: 681632
(In reply to Wayne Mery (:wsmwk) from comment #9)
> Bienvenu, 
> 
> To diagnose this do we need a special build to handle what Ted describes at
> bug 681632 comment 6?
No, as Ted says, from the code, we can tell it's crashing in nsIDOMWindowInternal::GetDocShell.  We really need some way of reproducing this.
(In reply to David :Bienvenu from comment #10)
> (In reply to Wayne Mery (:wsmwk) from comment #9)
> > Bienvenu, 
> > 
> > To diagnose this do we need a special build to handle what Ted describes at
> > bug 681632 comment 6?
> No, as Ted says, from the code, we can tell it's crashing in
> nsIDOMWindowInternal::GetDocShell. 
My skimming missed this - Thanks for the reminder.

> We really need some way of reproducing this.
still searching for a reporter :(
Blocks: 694646
Summary: crash nsASDOMWindowEnumerator::GetNext → crash in nsIDOMWindowInternal::GetDocShell with signature pointing to nsASDOMWindowEnumerator::GetNext
Whiteboard: [tbird topcrash][gs] → [tbird topcrash][gs][v6 regression]
Blocks: 681921
FWIW, there are no thunderbird or firefox crashes with nsIDOMWindowInternal::GetDocShell as top frame.
Component: Widget → General
Keywords: testcase-wanted
Product: Core → Thunderbird
QA Contact: general → general
Summary: crash in nsIDOMWindowInternal::GetDocShell with signature pointing to nsASDOMWindowEnumerator::GetNext → crash in nsIDOMWindowInternal::GetDocShell with signature pointing to nsASDOMWindowEnumerator::GetNext. ref-counting issue?
recap comment from bienvenu, in a conversation about bp-f203b130-a9be-4c8c-9171-d08592120424 
"I'm not convinced this is pop3 related, from the log and stack traces, tbh. The pop3 protocol object is destroyed, the sockets are closed, and the crash occurs way downstream."
bienvenu, if (perhaps big "if") we find someone who can reliably reproduce, do you think there is a fair chance of getting a regression range?
(In reply to Wayne Mery (:wsmwk) from comment #14)
> bienvenu, if (perhaps big "if") we find someone who can reliably reproduce,
> do you think there is a fair chance of getting a regression range?

Not really, no, and even a regression range might not be helpful. But if we do find someone who can reproduce it reliably, we can certainly drill down on potential causes.
what I would propose then, is that you compose a short message to "reporters of this crash" to ask for badly needed volunteers, outlining what we hope to accomplish and in broad terms what might be asked of them. And then have ludo email it via socorro.
bp-845180fe-c34e-4acd-ad28-ae8a52120120 wrote crashing stopped after disabling lightning.  (OTOH many (most?) don't have lightning installed.)
Keywords: topcrashtopcrash+
Attached patch possible fixSplinter Review
This patch attempts to ignore windows where we couldn't get a doc shell when enumerating the windows. I don't know if this is the right thing to do, but it seems better than crashing...
Assignee: nobody → mozilla
Attachment #644665 - Flags: review?(jonas)
Comment on attachment 644665 [details] [diff] [review]
possible fix

Sorry, I don't really know this code well enough to review. I'm also not a docshell peer.

Maybe try bsmedberg?
Attachment #644665 - Flags: review?(jonas) → review?(benjamin)
Comment on attachment 644665 [details] [diff] [review]
possible fix

So the basic notion here is that GetDOMWindow(mCurrentPosition->mWindow is returning a null window? GetDOMWindow returns an nsresult but we never check it (seems to me it could just return null). It's not clear to me whether it's safe to just skip the error cases or whether we really should be figuring out why the docshell getinterface call is failing.
Attachment #644665 - Flags: review?(benjamin) → review?(bzbarsky)
Comment on attachment 644665 [details] [diff] [review]
possible fix

The simplest reason the getInterface call would fail is that docShell is null, I would think.  Which could just happen if a window is closed, right?

So this looks reasonable to me.
Attachment #644665 - Flags: review?(bzbarsky) → review+
(In reply to Boris Zbarsky (:bz) from comment #21)
> The simplest reason the getInterface call would fail is that docShell is
> null, I would think.  Which could just happen if a window is closed, right?

The best I can understand it, I think that could be the case.

I'll land this as David isn't as active at the moment.
Moving to a more appropriate core component, although I'm not quite sure which one that is...
Assignee: mozilla → mbanner
Component: General → Widget
Product: Thunderbird → Core
Checked in:

https://hg.mozilla.org/integration/mozilla-inbound/rev/6c2462765e83
Assignee: mbanner → mozilla
Target Milestone: --- → mozilla18
https://hg.mozilla.org/mozilla-central/rev/6c2462765e83
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
It's #11 top crasher in TB 17.0.
Feel free to nominate a patch for ESR17 as desired, if this fix is low risk.
Comment on attachment 644665 [details] [diff] [review]
possible fix

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: eliminate Thunderbird topcrash
User impact if declined: significant numbers of users continue crashing
Fix Landed on Version: 18
Risk to taking this patch (and alternatives if risky): <standard8>
String or UUID changes made by this patch: none

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.

Note: since the checkin there are no crashes for Thunderbird 18 beta and other development branches.  And past branch history suggests we should have seen some.  There are no regressions cited against the bug report. OTOH, we don't have any user testcases, nor crashing users who specifically tested. (attempting to contact some today)

I can't speak about firefox as to whether there are regressions or increase/decrease of crashes. But there are only 3 FF18 crashes in the past month. one example is bp-d9c346f2-ef72-4697-a1f5-40a2f2121222 TB18 beta 20121219074241
Attachment #644665 - Flags: approval-mozilla-esr17?
standard8, can you comment further as to risk?
There should be minimal risk here as this is effectively just adding a null check.
Comment on attachment 644665 [details] [diff] [review]
possible fix

Thanks - that's manageable risk if just a NULL check.
Attachment #644665 - Flags: approval-mozilla-esr17? → approval-mozilla-esr17+
You need to log in before you can comment on or make changes to this bug.