Closed Bug 1751943 Opened 3 years ago Closed 2 years ago

Windows Crash in [@ mozilla::a11y::RemoteAccessible::GetCOMInterface]

Categories

(Core :: Disability Access APIs, defect)

Unspecified
Windows
defect

Tracking

()

RESOLVED FIXED
114 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox-esr102 113+ fixed
firefox96 --- wontfix
firefox97 --- wontfix
firefox98 --- wontfix
firefox105 --- wontfix
firefox106 --- wontfix
firefox107 --- wontfix
firefox110 --- wontfix
firefox111 --- wontfix
firefox112 --- wontfix
firefox113 + fixed
firefox114 + fixed

People

(Reporter: aryx, Assigned: Jamie)

References

Details

(Keywords: crash, csectype-uaf, sec-high, Whiteboard: [win:stability][adv-main113+r][adv-ESR102.11+r])

Crash Data

Attachments

(2 files)

This Windows crash signature started with Firefox 94 when Fission got rolled out. 150-300 crashes per release cycle.

Maybe Fission related. (DOMFissionEnabled=1)

Crash report: https://crash-stats.mozilla.org/report/index/e07f79eb-5ab8-48fd-9868-17eaf0220125

Reason: EXCEPTION_ACCESS_VIOLATION_READ

Top 10 frames of crashing thread:

0 xul.dll mozilla::a11y::RemoteAccessible::GetCOMInterface const accessible/ipc/win/RemoteAccessible.cpp:51
1 xul.dll static mozilla::a11y::MsaaAccessible::NativeAccessible accessible/windows/msaa/MsaaAccessible.cpp:758
2 xul.dll mozilla::a11y::DocAccessibleParent::SendParentCOMProxy accessible/ipc/DocAccessibleParent.cpp:928
3 xul.dll mozilla::a11y::DocAccessibleParent::AddChildDoc accessible/ipc/DocAccessibleParent.cpp:681
4 xul.dll mozilla::a11y::DocAccessibleParent::AddChildDoc accessible/ipc/DocAccessibleParent.cpp:744
5 xul.dll mozilla::dom::BrowserParent::RecvPDocAccessibleConstructor dom/ipc/BrowserParent.cpp:1295
6 xul.dll mozilla::dom::PBrowserParent::OnMessageReceived ipc/ipdl/PBrowserParent.cpp:2813
7 xul.dll mozilla::dom::PContentParent::OnMessageReceived ipc/ipdl/PContentParent.cpp:6561
8 xul.dll mozilla::ipc::MessageChannel::DispatchMessage ipc/glue/MessageChannel.cpp:1968
9 xul.dll mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal xpcom/threads/TaskController.cpp:771

A bunch of the crashes have clear UAF patterns

Group: core-security
Group: core-security → dom-core-security
Group: dom-core-security → layout-core-security

WinDBG shows that we crash when we try to Release the pointer previously held in RemoteAccessible::mCOMProxy:

00 (Inline Function) --------`--------     xul!mozilla::RefPtrTraits<IAccessible>::Release(struct IAccessible * aPtr = 0xe5e5e5e5`e5e5e5e5) [/builds/worker/workspace/obj-build/dist/include/mozilla/RefPtr.h @ 50] 
01 (Inline Function) --------`--------     xul!RefPtr<IAccessible>::ConstRemovingRefPtrTraits<IAccessible>::Release(struct IAccessible * aPtr = 0xe5e5e5e5`e5e5e5e5) [/builds/worker/workspace/obj-build/dist/include/mozilla/RefPtr.h @ 381] 
02 (Inline Function) --------`--------     xul!RefPtr<IAccessible>::assign_assuming_AddRef(struct IAccessible * aNewPtr = 0x00000281`b731e478)+0xd [/builds/worker/workspace/obj-build/dist/include/mozilla/RefPtr.h @ 69] 
03 (Inline Function) --------`--------     xul!RefPtr<IAccessible>::operator=(struct already_AddRefed<IAccessible> * aRhs = <Value unavailable error>)+0x17 [/builds/worker/workspace/obj-build/dist/include/mozilla/RefPtr.h @ 206] 
04 0000009e`6c7fca60 00007ff8`0d7aa354     xul!mozilla::a11y::RemoteAccessible::GetCOMInterface(void ** aOutAccessible = 0x0000009e`6c7fcb08)+0xc5 [/builds/worker/checkouts/gecko/accessible/ipc/win/RemoteAccessible.cpp @ 51] 

This suggests that the RemoteAccessible has been freed! What I can't work out is how. The only time a RemoteAccessible is freed is when its ProxyEntry is destroyed, which only happens in DocAccessibleParent::RemoveAccessible. And that method also removes the RemoteAccessible from the mAccessibles hash table, which means that AddChildDoc shouldn't have been able to retrieve it.

I do find bp-90e0064e-2a67-4338-98fc-e12430220317 intriguing. In this crash, we see that AddChildDoc (triggered by a received IPDL message) is re-entering a client COM call. That is never supposed to happen, but apparently, we spin the event loop if the window is closed (via nsThreadManager::SpinEventLoopUntilInternal). I still can't fathom how that could result in this situation; the RemoteAccessible still shouldn't be in the map if it's destroyed. However, even though we can't see it in these stacks, it makes me wonder what would happen if we re-entered IPDL during a COM call to get a proxy triggered by SetParentCOMProxy. For example, could we destroy the OuterDoc we're trying to set a COM proxy for? I guess that would result in this situation.

I don't really know how we can avoid this except maybe for a re-entry guard or similar, but that would still cause obscure failures. And we'd probably need to guard a lot of places.

If this is indeed caused by re-entry, it will go away once we ship Cache the World (bug 1694563), since we don't do cross-process COM with that new architecture and thus can't be re-entered that way.

Depends on: 1737193

As noted in comment 2, I can't reproduce this and I don't specifically know what's causing it. I only have a broad suspicion about what might be occurring, but not specific enough to realistically be actionable. Thus, I'm marking this as stalled for now, but I'll update if I learn anything new.

Assignee: nobody → jteh
Keywords: stalled

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 20 desktop browser crashes on beta

For more information, please visit auto_nag documentation.

Keywords: topcrash
Crash Signature: [@ mozilla::a11y::RemoteAccessible::GetCOMInterface] → [@ mozilla::a11y::MsaaAccessible::GetExistingID] [@ mozilla::a11y::RemoteAccessible::GetCOMInterface]

This is showing up in the early 105 topcrash list.

I just filed bug 1794338 which is a new signature for some of the crashes which appear here but it's not an UAF. In particular it seems like there are two problems under this signature:

  • The oldest one is an UAF and it presents itself with the following stack:
    0 	xul.dll 	mozilla::a11y::RemoteAccessible::GetCOMInterface(void**) const 	accessible/ipc/win/RemoteAccessible.cpp:52
    1 	xul.dll 	mozilla::a11y::MsaaAccessible::NativeAccessible(mozilla::a11y::Accessible*) 	accessible/windows/msaa/MsaaAccessible.cpp:772
    2 	xul.dll 	mozilla::a11y::DocAccessibleParent::SendParentCOMProxy(mozilla::a11y::Accessible*)
    
  • The second one is a NULL-pointer access and the top frames of the stack are the following:
    0 	xul.dll 	mozilla::a11y::RemoteAccessible::GetCOMInterface(void**) const 	accessible/ipc/win/RemoteAccessible.cpp:50
    1 	xul.dll 	mozilla::a11y::MsaaAccessible::NativeAccessible(mozilla::a11y::Accessible*) 	accessible/windows/msaa/MsaaAccessible.cpp:772
    2 	xul.dll 	mozilla::a11y::MsaaAccessible::get_accFocus(tagVARIANT*) 	accessible/windows/msaa/MsaaAccessible.cpp:1277
    

Bug 1794338 is a signature change of the second problem caused by adding support for inlined frames. Over time the non-UAF crashes here should migrate under bug 1794338 and leave only the first problem under this bug. Note that the spike that's been seen here should be entirely made up of NULL-pointer crashes.

Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.

For more information, please visit auto_nag documentation.

Keywords: topcrash
See Also: → 1751945
Whiteboard: [win:stability]

Sorry for removing the keyword earlier but there is a recent change in the ranking, so the bug is again linked to a topcrash signature, which matches the following criterion:

  • Top 20 desktop browser crashes on release

For more information, please visit auto_nag documentation.

Keywords: topcrash

This should be fixed by bug 1751945. Also, anyone who has the new accessibility cache (which we're rolling out to release at present) shouldn't hit this.

Depends on: 1751945
Keywords: stalled
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Group: layout-core-security → core-security-release
Whiteboard: [win:stability] → [fixed by bug 1751945][win:stability]

Urgh. Thanks for flagging.

I'm less concerned about the GetExistingID crashes (like this one) because they don't appear to be a UAF. However, a null check is probably needed somewhere.

The GetCOMInterface crashes (like this one are more worrying, as they're definitely UAF. That suggests the WeakPtr isn't working for some reason.

Flags: needinfo?(jteh)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

This appears to be a new signature for this crash.

Crash Signature: [@ mozilla::a11y::MsaaAccessible::GetExistingID] [@ mozilla::a11y::RemoteAccessible::GetCOMInterface] → [@ mozilla::a11y::MsaaAccessible::GetExistingID] [@ mozilla::a11y::RemoteAccessible::GetCOMInterface] [@ mozilla::a11y::MsaaAccessible::NativeAccessible]

We're still seeing these crashes in 112.0b7.

It's not quite the same though. The first hump of crashes that goes away at the end of November is the mozilla::a11y::MsaaAccessible::GetExistingID signature. There have been a small handful per day since so it's not entirely gone, but they're untroubling null derefs.

The mozilla::a11y::MsaaAccessible::NativeAccessible was all UAF, but happened at a small rate. It definitely "spiked" in Fx 111 with 52% of the crashes with that signature reported in the last 6 months -- but that's only 27 out of 52. That signature is completely GONE in 112 so we definitely fixed a UAF here.

The original mozilla::a11y::RemoteAccessible::GetCOMInterface signature is responsible for the huge hump on the right from the fx 111 release, but it's also the "floor level" prior to 111. In 108, 109, and 110 there were ~500 crashes per major version; in 111 there were 5300 -- 56% of the crashes with this signature in the past 6 months. There ARE a small number of these still happening in Fx 112 but the crashing point is not the same, as Gabriele noted in comment 13. The old builds crash inside the if (!mCOMProxy && mSafeToRecurse) { block, while 112 and 102.10 are crashing after it. older builds have GetCOMInterface called by NativeAccessible while newer builds have GetCOMInterface called directly by AddChildDoc.

I think it would be less confusing to re-close this bug and open a new bug with just the GetCOMInterface crash for version 112 and later.

Probably less confusing, although there's now a patch for this other crash on this bug, so we'd need to move that. I'm not sure it's worth the churn, but happy either way.

Comment on attachment 9325599 [details]
Bug 1751943: Fix doc tree mutation during AddChildDoc.

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: The addition of strong references and shutdown checks unfortunately hints very strongly at a UAF. However, I think this would probably be difficult to exploit given its rarity and the multi-process race conditions involved.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?: all
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?: I think the patch should apply cleanly. If not, it should be easy to construct a backport.
  • How likely is this patch to cause regressions; how much testing does it need?: Unlikely; just adds additional guards.
  • Is Android affected?: No
Attachment #9325599 - Flags: sec-approval?

Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.

For more information, please visit auto_nag documentation.

Keywords: topcrash

Comment on attachment 9325599 [details]
Bug 1751943: Fix doc tree mutation during AddChildDoc.

Approved to land and request uplift

Attachment #9325599 - Flags: sec-approval? → sec-approval+

https://hg.mozilla.org/mozilla-central/rev/84a753fc07c2

I don't think we're going to see much in the way of stability data from this until we uplift, so go ahead and request approval whenever you're comfortable doing so.

Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Flags: needinfo?(jteh)
Resolution: --- → FIXED
Target Milestone: --- → 114 Branch
See Also: → 1829249

Comment on attachment 9325599 [details]
Bug 1751943: Fix doc tree mutation during AddChildDoc.

Beta/Release Uplift Approval Request

  • User impact if declined: UAF, crash.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Just adds object lifecycle guards.
  • String changes made/needed:
  • Is Android affected?: No

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration:
  • User impact if declined: UAF, crash.
  • Fix Landed on Version: 114
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Just adds object lifecycle guards.
Flags: needinfo?(jteh)
Attachment #9325599 - Flags: approval-mozilla-esr102?
Attachment #9325599 - Flags: approval-mozilla-beta?

Comment on attachment 9325599 [details]
Bug 1751943: Fix doc tree mutation during AddChildDoc.

Approved for 113.0b7 and 102.11esr.

Attachment #9325599 - Flags: approval-mozilla-esr102?
Attachment #9325599 - Flags: approval-mozilla-esr102+
Attachment #9325599 - Flags: approval-mozilla-beta?
Attachment #9325599 - Flags: approval-mozilla-beta+
Flags: qe-verify-
Whiteboard: [fixed by bug 1751945][win:stability] → [fixed by bug 1751945][win:stability][adv-main113+r]
Whiteboard: [fixed by bug 1751945][win:stability][adv-main113+r] → [win:stability][adv-main113+r]
Whiteboard: [win:stability][adv-main113+r] → [win:stability][adv-main113+r][adv-ESR102.11+r]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: