Closed Bug 1502338 Opened 6 years ago Closed 2 years ago

Crash in nsAccessibilityService::GetRootDocumentAccessible

Categories

(Core :: Disability Access APIs, defect, P5)

60 Branch
All
Windows
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox-esr60 - wontfix
firefox63 --- unaffected
firefox64 --- unaffected
firefox65 --- unaffected
firefox66 --- unaffected
firefox67 --- unaffected

People

(Reporter: philipp, Unassigned)

Details

(5 keywords)

Crash Data

This bug was filed from the Socorro interface and is report bp-b16f80ee-baa3-428a-a57a-6339d0181025. ============================================================= Top 10 frames of crashing thread: 0 xul.dll nsAccessibilityService::GetRootDocumentAccessible accessible/base/nsAccessibilityService.cpp:461 1 xul.dll nsBaseWidget::GetRootAccessible widget/nsBaseWidget.cpp:1945 2 xul.dll nsWindow::GetAccessible widget/windows/nsWindow.cpp:7493 3 xul.dll mozilla::a11y::LazyInstantiator::ResolveRootAccWrap accessible/windows/msaa/LazyInstantiator.cpp:284 4 xul.dll mozilla::a11y::LazyInstantiator::MaybeResolveRoot accessible/windows/msaa/LazyInstantiator.cpp:335 5 xul.dll mozilla::a11y::LazyInstantiator::QueryService accessible/windows/msaa/LazyInstantiator.cpp:756 6 oleacc.dll IsComboEx 7 oleacc.dll WrapObject 8 oleacc.dll AccessibleObjectFromWindow 9 oleacc.dll AccessibleObjectFromEvent ============================================================= this crash signature is most common prevalent among esr users. it is the #8 top browser crash, responsible for 1% of browser crashes in 60.3.0esr so far. the most common accessibility client in those reports seem to be various versions of JAWS.
Some investigation with WinDBG. Stack: 0:000> kp # ChildEBP RetAddr 00 (Inline) -------- xul!mozilla::detail::WeakReference<nsDocShell>::get [z:\build\build\src\obj-firefox\dist\include\mozilla\weakptr.h @ 168] 01 (Inline) -------- xul!mozilla::WeakPtr<nsDocShell>::operator class nsDocShell * [z:\build\build\src\obj-firefox\dist\include\mozilla\weakptr.h @ 276] 02 (Inline) -------- xul!nsIDocument::GetDocShell [z:\build\build\src\dom\base\nsdocument.cpp @ 12552] 03 0021ca44 5f5f450c xul!nsAccessibilityService::GetRootDocumentAccessible(class nsIPresShell * aPresShell = 0x18421000, bool aCanCreate = true)+0x17 [z:\build\build\src\accessible\base\nsaccessibilityservice.cpp @ 461] 04 0021ca5c 5f63c08e9 xul!nsBaseWidget::GetRootAccessible(void)+0x45 [z:\build\build\src\widget\nsbasewidget.cpp @ 1945] 05 0021ca70 5fc76bca xul!nsWindow::GetAccessible(void)+0x86 [z:\build\build\src\widget\windows\nswindow.cpp @ 7493] 06 0021ca88 5fc76d0d xul!mozilla::a11y::LazyInstantiator::MaybeResolveRoot(void)+0x9b [z:\build\build\src\accessible\windows\msaa\lazyinstantiator.cpp @ 335] 07 0021ca98 72c74afe xul!mozilla::a11y::LazyInstantiator::QueryService(struct _GUID * aServiceId = 0x72ca19f8 --- memory read error at address 0x72ca19f8 ---, struct _GUID * aServiceIid = 0x72c72438 {00000000-0000-0000-C000-000000000046}, void ** aOutInterface = 0x0021cac0)+0x21 [z:\build\build\src\accessible\windows\msaa\lazyinstantiator.cpp @ 756] 08 0021cab8 72c72b33 oleacc!AccWrap_Base::AlreadyWrapped+0x3f 09 0021cac8 72c72b02 oleacc!WrapObject+0xf 0a 0021caec 72c98c0b oleacc!AccessibleObjectFromWindow+0x59 0b 0021cb28 6d979946 oleacc!AccessibleObjectFromEvent+0x5c WARNING: Stack unwind information not available. Following frames may be wrong. 0c 0021cfc8 6d97caa1 jhook+0x29946 0d 0021d020 6d97f938 jhook+0x2caa1 0e 0021e7e0 75d530bc jhook+0x2f938 0f 0021e818 77b46aee user32!__ClientCallWinEventProc+0x2a 10 0021e84c 75d4f27d ntdll!KiUserCallbackDispatcher+0x2e 11 0021e850 5e2fe825 user32!NtUserShowWindow+0xc 12 0021e874 5e2fd77a xul!nsWindow::Show(bool bState = true)+0xe3 [z:\build\build\src\widget\windows\nswindow.cpp @ 1617] 13 0021e890 5e308639 xul!nsXULWindow::SetVisibility(bool aVisibility = true)+0x5b [z:\build\build\src\xpfe\appshell\nsxulwindow.cpp @ 839] 14 0021e8d4 5e30851c xul!nsXULWindow::OnChromeLoaded(void)+0x108 [z:\build\build\src\xpfe\appshell\nsxulwindow.cpp @ 1150] 15 0021e8e4 5df30e58 xul!nsWebShellWindow::OnStateChange(class nsIWebProgress * aProgress = 0x0dbb1c14, class nsIRequest * aRequest = 0x008ee5b0, unsigned int aStateFlags = <Value unavailable error>, nsresult aStatus = 0n-2141388767 (No matching enumerant))+0x52 [z:\build\build\src\xpfe\appshell\nswebshellwindow.cpp @ 662] ... The line in nsAccessibilityService::GetRootDocumentAccessible which triggers the crash is: 461 nsCOMPtr<nsIDocShellTreeItem> treeItem(documentNode->GetDocShell()); That suggests documentNode might be invalid: 459 nsIDocument* documentNode = aPresShell->GetDocument(); 0:000> .frame 3 03 0021ca44 5f5f450c xul!nsAccessibilityService::GetRootDocumentAccessible+0x17 [z:\build\build\src\accessible\base\nsaccessibilityservice.cpp @ 461] 0:000> dv this = 0x0c5510d0 aPresShell = 0x18421000 aCanCreate = true documentNode = 0xe5e5e5e5 ... Some research suggests 0xe5e5e5e5 is a magic value used to indicate freed memory. So I guess the PresShell was freed? How could that even happen while we're showing the window? I guess some reentry could be at play here, but this doesn't seem to be related to remote content in any way.
Keywords: sec-high
Aaron, I know you're probably not the right person to ask, but I'm hoping for some inspiration here. Or perhaps you know of someone else I can ask? 1. Does my analysis in comment 1 make sense; i.e. that the PresShell was destroyed? 2. Is this supposed to be possible during nsWindow::Show? I do see this comment in nsXULWindow::OnChromeLoaded: 1132 // At this point the window may have been closed already during Show() or 1133 // SyncAttributesToWidget(), so nsXULWindow::Destroy may already have been 1134 // called. Take care! I don't quite follow the relationship between XUL windows, Widgets and PresShell. Still, it seems like this might be connected. nsWindow::GetAccessible does check whether the nsWindow has already been destroyed, and obviously it hasn't (because it doesn't bale early). Appreciate your thoughts.
Flags: needinfo?(aklotz)
I think your analysis makes sense. Unfortunately my understanding of all of all things *Shell is pretty poor myself. I'm going to ni? mconley to see if he has any thoughts on where to look.
Flags: needinfo?(aklotz) → needinfo?(mconley)
Hm. Thanks for reporting, Jamie. Your reasoning seems sound to me. Do we know how easy is it to reproduce this? Was this from bringing in the stack dump, or have we been able to capture this one in a "lab" setting? If we have, do we know if mozilla::PresShell::Destroy is somehow being called on the aPresShell before this occurs? If so, what's the callstack? I'm not actually aware of any potential for re-entry here... maybe mats some ideas here?
Flags: needinfo?(mconley) → needinfo?(mats)
Group: core-security → layout-core-security
(In reply to Mike Conley (:mconley) (:⚙️) from comment #4) > Do we know how easy is it to reproduce this? Was this from bringing in the > stack dump, or have we been able to capture this one in a "lab" setting? Only the stack dump, I'm afraid. We've not been able to reproduce it.
Crash data for the past 6 months shows: Firefox 60.2.2esr 560 37.7% 382 Firefox 60.2.1esr 541 36.4% 395 Firefox 60.3.0esr 259 17.4% 270 Firefox 60.2.0esr 69 4.6% 58 Firefox 60.0.1 10 0.7% 9 ... It seems odd that almost all reports comes from ESR builds.
Crash Signature: [@ nsAccessibilityService::GetRootDocumentAccessible] → [@ nsAccessibilityService::GetRootDocumentAccessible] [@ nsCOMPtr<T>::nsCOMPtr<T> | nsAccessibilityService::GetRootDocumentAccessible ]
I can think of two possible reasons for that: 1. We support old versions of the JAWS screen reader in ESR 60 by disabling e10s. (We did this because those users shouldn't have been auto updated from ESR 52 in the first place, but they were; see bug 1489605.) Given that non-e10s is no longer a "supported" configuration, this could be a code path that is now broken somehow. That said, I don't see anything in the stack that should be specific to non-e10s. 2. This might happen only with old versions of JAWS. (Since they can only run with ESR 60, we would not see this with newer versions of Firefox.) This might be due to a JAWS bug, but it could just as well be due to a Firefox bug that happens to only be triggered by JAWS.
It does look like these are all related to JAWS: (99.08% in signature vs 00.59% overall) Module "jhook.dll" = true I can't see anything in the stack that makes me think JAWS is doing anything wrong - it's just calling AccessibleObjectFromEvent in response to a win event, which is perfectly normal - but I can't rule it out.
I think various win32 entry points, e.g. ShowWindow, may flush pending WM messages which could cause the window or arbitrary other layout objects to be destroyed. I'm guessing this can happen before the hook for a11y::LazyInstantiator is run. I don't think it's a WM_DESTROY though, because then GetNSWindowPtr(aHwnd) in WinUtils::GetRootAccessibleForHWND should have returned null and it seems we got past that since we called window->GetAccessible(). If the PresShell was destroyed we should always get null here: https://searchfox.org/mozilla-central/rev/7c848ac7630df5baf1314b0c03e015683599efb9/widget/nsBaseWidget.cpp#2009 If mWidgetListener is a nsView, it just returns its ViewManager's pointer: https://searchfox.org/mozilla-central/rev/7c848ac7630df5baf1314b0c03e015683599efb9/view/nsView.cpp#990 which we explicitly null out in PresShell::Destroy: https://searchfox.org/mozilla-central/rev/7c848ac7630df5baf1314b0c03e015683599efb9/layout/base/PresShell.cpp#1313 If it's a nsWebShellWindow::WidgetListenerDelegate: https://searchfox.org/mozilla-central/rev/7c848ac7630df5baf1314b0c03e015683599efb9/xpfe/appshell/nsWebShellWindow.cpp#783 it should have been nulled out here: https://searchfox.org/mozilla-central/rev/7c848ac7630df5baf1314b0c03e015683599efb9/xpfe/appshell/nsXULWindow.cpp#545 nsWebBrowser::WidgetListenerDelegate doesn't implement GetPresShell() so it always returns null: https://searchfox.org/mozilla-central/rev/7c848ac7630df5baf1314b0c03e015683599efb9/toolkit/components/browser/nsWebBrowser.h#74 So I don't really see how we got hold of a destroyed PresShell.
Flags: needinfo?(mats)

This only seems to affect ESR60. Also seems to be stalled?

I don't really have any further ideas here. I understand we do see quite a few of these crashes. However, given that we can't reproduce this ourselves, this only occurs in ESR 60 and it is very likely to be related to older versions of JAWS we won't be supporting past ESR 60, I'm not sure it's worthwhile to invest a large effort in spinning our wheels on this one.

Flags: needinfo?(jteh)
Keywords: stalled
Priority: -- → P5

Removing employee no longer with company from CC list of private bugs.

Severity: critical → S2

I don't see any crashes with these signatures recently. Closing.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME

Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit BugBot documentation.

Keywords: stalled
Group: layout-core-security
You need to log in before you can comment on or make changes to this bug.