Closed Bug 679626 Opened 14 years ago Closed 12 years ago

crash in nsIDOMWindowInternal::GetDocShell with signature pointing to nsASDOMWindowEnumerator::GetNext. ref-counting issue?

Categories

(Core :: Widget, defect)

x86
All
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla18
Tracking Status
firefox-esr17 + fixed

People

(Reporter: Usul, Assigned: Bienvenu)

References

()

Details

(Keywords: crash, regression, topcrash+, Whiteboard: [tbird topcrash][gs][v6 regression])

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is report bp-a302892d-e96d-46ac-943d-5f7fb2110813 . ============================================================= 0 xul.dll nsASDOMWindowEnumerator::GetNext xpfe/appshell/src/nsAppShellWindowEnumerator.cpp:261 1 xul.dll MarkWindowList content/base/src/nsCCUncollectableMarker.cpp:177 2 xul.dll nsCCUncollectableMarker::Observe content/base/src/nsCCUncollectableMarker.cpp:224 3 xul.dll nsObserverList::NotifyObservers xpcom/ds/nsObserverList.cpp:130 4 xul.dll nsObserverService::NotifyObservers xpcom/ds/nsObserverService.cpp:182 5 xul.dll nsCycleCollector::PrepareForCollection xpcom/base/nsCycleCollector.cpp:2524 6 xul.dll nsCycleCollectorRunner::Collect xpcom/base/nsCycleCollector.cpp:3371 7 xul.dll nsCycleCollector_collect xpcom/base/nsCycleCollector.cpp:3451 8 xul.dll nsJSContext::CycleCollectNow dom/base/nsJSEnvironment.cpp:3277 9 xul.dll CCTimerFired dom/base/nsJSEnvironment.cpp:3317 10 xul.dll nsTimerImpl::Fire xpcom/threads/nsTimerImpl.cpp:424 11 xul.dll nsTimerEvent::Run xpcom/threads/nsTimerImpl.cpp:520 12 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:618 13 xul.dll NS_ProcessNextEvent_P objdir-tb/mozilla/xpcom/build/nsThreadUtils.cpp:245 14 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:134 15 xul.dll MessageLoop::RunInternal ipc/chromium/src/base/message_loop.cc:218 16 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:202 17 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:176 18 xul.dll nsBaseAppShell::Run widget/src/xpwidgets/nsBaseAppShell.cpp:189 19 xul.dll nsAppShell::Run widget/src/windows/nsAppShell.cpp:249 20 xul.dll nsAppStartup::Run toolkit/components/startup/nsAppStartup.cpp:222 21 xul.dll XRE_main toolkit/xre/nsAppRunner.cpp:3686 22 thunderbird.exe NS_internal_main mail/app/nsMailApp.cpp:104 23 thunderbird.exe wmain toolkit/xre/nsWindowsWMain.cpp:106 24 thunderbird.exe __tmainCRTStartup objdir-tb/mozilla/memory/jemalloc/crtsrc/crtexe.c:591 25 kernel32.dll BaseThreadInitThunk 26 ntdll.dll __RtlUserThreadStart 27 ntdll.dll _RtlUserThreadStart
Product: Thunderbird → Core
QA Contact: general → general
This has been around since at least version 3.0 EXCEPTION_ACCESS_VIOLATION_READ 0x0 about 1/3 of Thunderbird and Firefox crashes are startup. example Firefox 7.0a bp-61cfaa98-af66-410a-aca5-ed65d2110801 example rare Mac crash bp-f1e7b34d-ad2b-4801-a4c2-52dca2110106
Crash Signature: [@ nsASDOMWindowEnumerator::GetNext(nsISupports**)] → [@ nsASDOMWindowEnumerator::GetNext(nsISupports**)] [@ @0x0 | nsASDOMWindowEnumerator::GetNext(nsISupports**)]
OS: Mac OS X → All
Whiteboard: [tbird crash]
xref Bug 616506 - crash [@ nsAppShell::ProcessNextNativeEvent ] on ARM Bug 574727 - Crash in [@ nsAppShell::ProcessNextNativeEvent ] (linux) Bug 524971 - Crash Report [@ nsAppShell::ProcessNextNativeEvent(int)]
Component: General → Widget
QA Contact: general → general
#3 crash for thunderbird version 6. And, it would seem, something regressive occurred for Thunderbird, because it went from obscurity in v5 to topcrash in v6 The exact opposite is true for Firefox - this crash almost doesn't exist in FF5 and 6
Keywords: regression
Whiteboard: [tbird crash] → [tbird topcrash]
Whiteboard: [tbird topcrash] → [tbird topcrash][gs]
ludo, bienvenu, did we have a theory as to what this might be? I thought we had an idea, but there's nothing documented here. The uptick in crash rate might be in the time frame of 20110714224222 builds [1]. Is it just coincidence that Bug 681632 is also has a large uptick in version 6? https://crash-stats.mozilla.com/report/list?product=Thunderbird&query_search=signature&query_type=exact&query=nsASDOMWindowEnumerator%3A%3AGetNext%28nsISupports**%29&reason_type=contains&date=8%2F14%2F2011%2010%3A43%3A39&range_value=16&range_unit=weeks&hang_type=any&process_type=any&do_query=1&admin=1&signature=nsASDOMWindowEnumerator%3A%3AGetNext%28nsISupports**%29
Could be the cycle collector behavior changed...though the core issue probably is some sort of ref-counting issue.
or they could be the same issue - I suspect that's the case.
#1 cash in Thunderbird v9
Keywords: topcrash
Bienvenu, To diagnose this do we need a special build to handle what Ted describes at bug 681632 comment 6? bug 681632 is almost certainly related - lots of reporters see both crashes. Unfortunately most crashes happen unattended by the user, or they can't reproduce (crash goes away after a couple). But still #1 crash for recent versions.
Blocks: 681632
(In reply to Wayne Mery (:wsmwk) from comment #9) > Bienvenu, > > To diagnose this do we need a special build to handle what Ted describes at > bug 681632 comment 6? No, as Ted says, from the code, we can tell it's crashing in nsIDOMWindowInternal::GetDocShell. We really need some way of reproducing this.
(In reply to David :Bienvenu from comment #10) > (In reply to Wayne Mery (:wsmwk) from comment #9) > > Bienvenu, > > > > To diagnose this do we need a special build to handle what Ted describes at > > bug 681632 comment 6? > No, as Ted says, from the code, we can tell it's crashing in > nsIDOMWindowInternal::GetDocShell. My skimming missed this - Thanks for the reminder. > We really need some way of reproducing this. still searching for a reporter :(
Blocks: 694646
Summary: crash nsASDOMWindowEnumerator::GetNext → crash in nsIDOMWindowInternal::GetDocShell with signature pointing to nsASDOMWindowEnumerator::GetNext
Whiteboard: [tbird topcrash][gs] → [tbird topcrash][gs][v6 regression]
Blocks: 681921
FWIW, there are no thunderbird or firefox crashes with nsIDOMWindowInternal::GetDocShell as top frame.
Component: Widget → General
Keywords: testcase-wanted
Product: Core → Thunderbird
QA Contact: general → general
Summary: crash in nsIDOMWindowInternal::GetDocShell with signature pointing to nsASDOMWindowEnumerator::GetNext → crash in nsIDOMWindowInternal::GetDocShell with signature pointing to nsASDOMWindowEnumerator::GetNext. ref-counting issue?
recap comment from bienvenu, in a conversation about bp-f203b130-a9be-4c8c-9171-d08592120424 "I'm not convinced this is pop3 related, from the log and stack traces, tbh. The pop3 protocol object is destroyed, the sockets are closed, and the crash occurs way downstream."
bienvenu, if (perhaps big "if") we find someone who can reliably reproduce, do you think there is a fair chance of getting a regression range?
(In reply to Wayne Mery (:wsmwk) from comment #14) > bienvenu, if (perhaps big "if") we find someone who can reliably reproduce, > do you think there is a fair chance of getting a regression range? Not really, no, and even a regression range might not be helpful. But if we do find someone who can reproduce it reliably, we can certainly drill down on potential causes.
what I would propose then, is that you compose a short message to "reporters of this crash" to ask for badly needed volunteers, outlining what we hope to accomplish and in broad terms what might be asked of them. And then have ludo email it via socorro.
bp-845180fe-c34e-4acd-ad28-ae8a52120120 wrote crashing stopped after disabling lightning. (OTOH many (most?) don't have lightning installed.)
Keywords: topcrashtopcrash+
Attached patch possible fixSplinter Review
This patch attempts to ignore windows where we couldn't get a doc shell when enumerating the windows. I don't know if this is the right thing to do, but it seems better than crashing...
Assignee: nobody → mozilla
Attachment #644665 - Flags: review?(jonas)
Comment on attachment 644665 [details] [diff] [review] possible fix Sorry, I don't really know this code well enough to review. I'm also not a docshell peer. Maybe try bsmedberg?
Attachment #644665 - Flags: review?(jonas) → review?(benjamin)
Comment on attachment 644665 [details] [diff] [review] possible fix So the basic notion here is that GetDOMWindow(mCurrentPosition->mWindow is returning a null window? GetDOMWindow returns an nsresult but we never check it (seems to me it could just return null). It's not clear to me whether it's safe to just skip the error cases or whether we really should be figuring out why the docshell getinterface call is failing.
Attachment #644665 - Flags: review?(benjamin) → review?(bzbarsky)
Comment on attachment 644665 [details] [diff] [review] possible fix The simplest reason the getInterface call would fail is that docShell is null, I would think. Which could just happen if a window is closed, right? So this looks reasonable to me.
Attachment #644665 - Flags: review?(bzbarsky) → review+
(In reply to Boris Zbarsky (:bz) from comment #21) > The simplest reason the getInterface call would fail is that docShell is > null, I would think. Which could just happen if a window is closed, right? The best I can understand it, I think that could be the case. I'll land this as David isn't as active at the moment.
Moving to a more appropriate core component, although I'm not quite sure which one that is...
Assignee: mozilla → mbanner
Component: General → Widget
Product: Thunderbird → Core
Assignee: mbanner → mozilla
Target Milestone: --- → mozilla18
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
It's #11 top crasher in TB 17.0.
Feel free to nominate a patch for ESR17 as desired, if this fix is low risk.
Comment on attachment 644665 [details] [diff] [review] possible fix [Approval Request Comment] If this is not a sec:{high,crit} bug, please state case for ESR consideration: eliminate Thunderbird topcrash User impact if declined: significant numbers of users continue crashing Fix Landed on Version: 18 Risk to taking this patch (and alternatives if risky): <standard8> String or UUID changes made by this patch: none See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info. Note: since the checkin there are no crashes for Thunderbird 18 beta and other development branches. And past branch history suggests we should have seen some. There are no regressions cited against the bug report. OTOH, we don't have any user testcases, nor crashing users who specifically tested. (attempting to contact some today) I can't speak about firefox as to whether there are regressions or increase/decrease of crashes. But there are only 3 FF18 crashes in the past month. one example is bp-d9c346f2-ef72-4697-a1f5-40a2f2121222 TB18 beta 20121219074241
Attachment #644665 - Flags: approval-mozilla-esr17?
standard8, can you comment further as to risk?
There should be minimal risk here as this is effectively just adding a null check.
Comment on attachment 644665 [details] [diff] [review] possible fix Thanks - that's manageable risk if just a NULL check.
Attachment #644665 - Flags: approval-mozilla-esr17? → approval-mozilla-esr17+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: