Windows Crash in [@ mozilla::a11y::RemoteAccessible::GetCOMInterface]
Categories
(Core :: Disability Access APIs, defect)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox-esr91 | --- | unaffected |
| firefox-esr102 | 113+ | fixed |
| firefox96 | --- | wontfix |
| firefox97 | --- | wontfix |
| firefox98 | --- | wontfix |
| firefox105 | --- | wontfix |
| firefox106 | --- | wontfix |
| firefox107 | --- | wontfix |
| firefox110 | --- | wontfix |
| firefox111 | --- | wontfix |
| firefox112 | --- | wontfix |
| firefox113 | + | fixed |
| firefox114 | + | fixed |
People
(Reporter: aryx, Assigned: Jamie)
References
Details
(Keywords: crash, csectype-uaf, sec-high, Whiteboard: [win:stability][adv-main113+r][adv-ESR102.11+r])
Crash Data
Attachments
(2 files)
|
48 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta+
RyanVM
:
approval-mozilla-esr102+
tjr
:
sec-approval+
|
Details | Review |
|
77.03 KB,
image/png
|
Details |
This Windows crash signature started with Firefox 94 when Fission got rolled out. 150-300 crashes per release cycle.
Maybe Fission related. (DOMFissionEnabled=1)
Crash report: https://crash-stats.mozilla.org/report/index/e07f79eb-5ab8-48fd-9868-17eaf0220125
Reason: EXCEPTION_ACCESS_VIOLATION_READ
Top 10 frames of crashing thread:
0 xul.dll mozilla::a11y::RemoteAccessible::GetCOMInterface const accessible/ipc/win/RemoteAccessible.cpp:51
1 xul.dll static mozilla::a11y::MsaaAccessible::NativeAccessible accessible/windows/msaa/MsaaAccessible.cpp:758
2 xul.dll mozilla::a11y::DocAccessibleParent::SendParentCOMProxy accessible/ipc/DocAccessibleParent.cpp:928
3 xul.dll mozilla::a11y::DocAccessibleParent::AddChildDoc accessible/ipc/DocAccessibleParent.cpp:681
4 xul.dll mozilla::a11y::DocAccessibleParent::AddChildDoc accessible/ipc/DocAccessibleParent.cpp:744
5 xul.dll mozilla::dom::BrowserParent::RecvPDocAccessibleConstructor dom/ipc/BrowserParent.cpp:1295
6 xul.dll mozilla::dom::PBrowserParent::OnMessageReceived ipc/ipdl/PBrowserParent.cpp:2813
7 xul.dll mozilla::dom::PContentParent::OnMessageReceived ipc/ipdl/PContentParent.cpp:6561
8 xul.dll mozilla::ipc::MessageChannel::DispatchMessage ipc/glue/MessageChannel.cpp:1968
9 xul.dll mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal xpcom/threads/TaskController.cpp:771
Comment 1•3 years ago
|
||
A bunch of the crashes have clear UAF patterns
Updated•3 years ago
|
Updated•3 years ago
|
| Assignee | ||
Comment 2•3 years ago
|
||
WinDBG shows that we crash when we try to Release the pointer previously held in RemoteAccessible::mCOMProxy:
00 (Inline Function) --------`-------- xul!mozilla::RefPtrTraits<IAccessible>::Release(struct IAccessible * aPtr = 0xe5e5e5e5`e5e5e5e5) [/builds/worker/workspace/obj-build/dist/include/mozilla/RefPtr.h @ 50]
01 (Inline Function) --------`-------- xul!RefPtr<IAccessible>::ConstRemovingRefPtrTraits<IAccessible>::Release(struct IAccessible * aPtr = 0xe5e5e5e5`e5e5e5e5) [/builds/worker/workspace/obj-build/dist/include/mozilla/RefPtr.h @ 381]
02 (Inline Function) --------`-------- xul!RefPtr<IAccessible>::assign_assuming_AddRef(struct IAccessible * aNewPtr = 0x00000281`b731e478)+0xd [/builds/worker/workspace/obj-build/dist/include/mozilla/RefPtr.h @ 69]
03 (Inline Function) --------`-------- xul!RefPtr<IAccessible>::operator=(struct already_AddRefed<IAccessible> * aRhs = <Value unavailable error>)+0x17 [/builds/worker/workspace/obj-build/dist/include/mozilla/RefPtr.h @ 206]
04 0000009e`6c7fca60 00007ff8`0d7aa354 xul!mozilla::a11y::RemoteAccessible::GetCOMInterface(void ** aOutAccessible = 0x0000009e`6c7fcb08)+0xc5 [/builds/worker/checkouts/gecko/accessible/ipc/win/RemoteAccessible.cpp @ 51]
This suggests that the RemoteAccessible has been freed! What I can't work out is how. The only time a RemoteAccessible is freed is when its ProxyEntry is destroyed, which only happens in DocAccessibleParent::RemoveAccessible. And that method also removes the RemoteAccessible from the mAccessibles hash table, which means that AddChildDoc shouldn't have been able to retrieve it.
I do find bp-90e0064e-2a67-4338-98fc-e12430220317 intriguing. In this crash, we see that AddChildDoc (triggered by a received IPDL message) is re-entering a client COM call. That is never supposed to happen, but apparently, we spin the event loop if the window is closed (via nsThreadManager::SpinEventLoopUntilInternal). I still can't fathom how that could result in this situation; the RemoteAccessible still shouldn't be in the map if it's destroyed. However, even though we can't see it in these stacks, it makes me wonder what would happen if we re-entered IPDL during a COM call to get a proxy triggered by SetParentCOMProxy. For example, could we destroy the OuterDoc we're trying to set a COM proxy for? I guess that would result in this situation.
I don't really know how we can avoid this except maybe for a re-entry guard or similar, but that would still cause obscure failures. And we'd probably need to guard a lot of places.
If this is indeed caused by re-entry, it will go away once we ship Cache the World (bug 1694563), since we don't do cross-process COM with that new architecture and thus can't be re-entered that way.
| Assignee | ||
Comment 3•3 years ago
|
||
As noted in comment 2, I can't reproduce this and I don't specifically know what's causing it. I only have a broad suspicion about what might be occurring, but not specific enough to realistically be actionable. Thus, I'm marking this as stalled for now, but I'll update if I learn anything new.
Comment 4•3 years ago
|
||
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 20 desktop browser crashes on beta
For more information, please visit auto_nag documentation.
| Reporter | ||
Updated•3 years ago
|
Comment 5•3 years ago
|
||
This is showing up in the early 105 topcrash list.
Comment 6•3 years ago
•
|
||
I just filed bug 1794338 which is a new signature for some of the crashes which appear here but it's not an UAF. In particular it seems like there are two problems under this signature:
- The oldest one is an UAF and it presents itself with the following stack:
0 xul.dll mozilla::a11y::RemoteAccessible::GetCOMInterface(void**) const accessible/ipc/win/RemoteAccessible.cpp:52 1 xul.dll mozilla::a11y::MsaaAccessible::NativeAccessible(mozilla::a11y::Accessible*) accessible/windows/msaa/MsaaAccessible.cpp:772 2 xul.dll mozilla::a11y::DocAccessibleParent::SendParentCOMProxy(mozilla::a11y::Accessible*) - The second one is a NULL-pointer access and the top frames of the stack are the following:
0 xul.dll mozilla::a11y::RemoteAccessible::GetCOMInterface(void**) const accessible/ipc/win/RemoteAccessible.cpp:50 1 xul.dll mozilla::a11y::MsaaAccessible::NativeAccessible(mozilla::a11y::Accessible*) accessible/windows/msaa/MsaaAccessible.cpp:772 2 xul.dll mozilla::a11y::MsaaAccessible::get_accFocus(tagVARIANT*) accessible/windows/msaa/MsaaAccessible.cpp:1277
Bug 1794338 is a signature change of the second problem caused by adding support for inlined frames. Over time the non-UAF crashes here should migrate under bug 1794338 and leave only the first problem under this bug. Note that the spike that's been seen here should be entirely made up of NULL-pointer crashes.
Comment 7•3 years ago
|
||
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit auto_nag documentation.
Updated•2 years ago
|
Comment 8•2 years ago
|
||
Sorry for removing the keyword earlier but there is a recent change in the ranking, so the bug is again linked to a topcrash signature, which matches the following criterion:
- Top 20 desktop browser crashes on release
For more information, please visit auto_nag documentation.
| Assignee | ||
Comment 9•2 years ago
|
||
This should be fixed by bug 1751945. Also, anyone who has the new accessibility cache (which we're rolling out to release at present) shouldn't hit this.
| Assignee | ||
Updated•2 years ago
|
| Assignee | ||
Updated•2 years ago
|
Updated•2 years ago
|
Comment 10•2 years ago
|
||
We're still seeing these crashes in 112.0b7.
https://crash-stats.mozilla.org/report/index/095634aa-362b-404f-85fc-e723e0230327
https://crash-stats.mozilla.org/report/index/f20a7a06-dc55-4a11-be37-95d980230328
| Assignee | ||
Comment 11•2 years ago
|
||
Urgh. Thanks for flagging.
I'm less concerned about the GetExistingID crashes (like this one) because they don't appear to be a UAF. However, a null check is probably needed somewhere.
The GetCOMInterface crashes (like this one are more worrying, as they're definitely UAF. That suggests the WeakPtr isn't working for some reason.
Updated•2 years ago
|
| Assignee | ||
Comment 12•2 years ago
|
||
Comment 13•2 years ago
|
||
This appears to be a new signature for this crash.
Comment 14•2 years ago
|
||
We're still seeing these crashes in 112.0b7.
It's not quite the same though. The first hump of crashes that goes away at the end of November is the mozilla::a11y::MsaaAccessible::GetExistingID signature. There have been a small handful per day since so it's not entirely gone, but they're untroubling null derefs.
The mozilla::a11y::MsaaAccessible::NativeAccessible was all UAF, but happened at a small rate. It definitely "spiked" in Fx 111 with 52% of the crashes with that signature reported in the last 6 months -- but that's only 27 out of 52. That signature is completely GONE in 112 so we definitely fixed a UAF here.
The original mozilla::a11y::RemoteAccessible::GetCOMInterface signature is responsible for the huge hump on the right from the fx 111 release, but it's also the "floor level" prior to 111. In 108, 109, and 110 there were ~500 crashes per major version; in 111 there were 5300 -- 56% of the crashes with this signature in the past 6 months. There ARE a small number of these still happening in Fx 112 but the crashing point is not the same, as Gabriele noted in comment 13. The old builds crash inside the if (!mCOMProxy && mSafeToRecurse) { block, while 112 and 102.10 are crashing after it. older builds have GetCOMInterface called by NativeAccessible while newer builds have GetCOMInterface called directly by AddChildDoc.
I think it would be less confusing to re-close this bug and open a new bug with just the GetCOMInterface crash for version 112 and later.
| Assignee | ||
Comment 15•2 years ago
|
||
Probably less confusing, although there's now a patch for this other crash on this bug, so we'd need to move that. I'm not sure it's worth the churn, but happy either way.
| Assignee | ||
Comment 16•2 years ago
|
||
Comment on attachment 9325599 [details]
Bug 1751943: Fix doc tree mutation during AddChildDoc.
Security Approval Request
- How easily could an exploit be constructed based on the patch?: The addition of strong references and shutdown checks unfortunately hints very strongly at a UAF. However, I think this would probably be difficult to exploit given its rarity and the multi-process race conditions involved.
- Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
- Which older supported branches are affected by this flaw?: all
- If not all supported branches, which bug introduced the flaw?: None
- Do you have backports for the affected branches?: No
- If not, how different, hard to create, and risky will they be?: I think the patch should apply cleanly. If not, it should be easy to construct a backport.
- How likely is this patch to cause regressions; how much testing does it need?: Unlikely; just adds additional guards.
- Is Android affected?: No
Comment 17•2 years ago
|
||
Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.
For more information, please visit auto_nag documentation.
Comment 18•2 years ago
|
||
Comment on attachment 9325599 [details]
Bug 1751943: Fix doc tree mutation during AddChildDoc.
Approved to land and request uplift
Comment 19•2 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/84a753fc07c2
I don't think we're going to see much in the way of stability data from this until we uplift, so go ahead and request approval whenever you're comfortable doing so.
| Assignee | ||
Comment 20•2 years ago
|
||
Comment on attachment 9325599 [details]
Bug 1751943: Fix doc tree mutation during AddChildDoc.
Beta/Release Uplift Approval Request
- User impact if declined: UAF, crash.
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Just adds object lifecycle guards.
- String changes made/needed:
- Is Android affected?: No
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration:
- User impact if declined: UAF, crash.
- Fix Landed on Version: 114
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Just adds object lifecycle guards.
Comment 21•2 years ago
|
||
Comment on attachment 9325599 [details]
Bug 1751943: Fix doc tree mutation during AddChildDoc.
Approved for 113.0b7 and 102.11esr.
Comment 22•2 years ago
|
||
| uplift | ||
https://hg.mozilla.org/releases/mozilla-beta/rev/7e7b52caee78
https://hg.mozilla.org/releases/mozilla-esr102/rev/bdbecc33eb06
Updated•2 years ago
|
Updated•2 years ago
|
| Assignee | ||
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Description
•