Closed Bug 1749274 Opened 2 years ago Closed 2 years ago

Crash in [@ je_malloc | moz_xmalloc | xul.dll | <unknown in ntdll.dll> | RtlpCallVectoredHandlers | mozilla::a11y::MsaaAccessible::get_accRole]

Categories

(Core :: Disability Access APIs, defect)

Unspecified
Windows 10
defect

Tracking

()

RESOLVED FIXED
98 Branch
Tracking Status
firefox-esr91 --- disabled
firefox96 --- wontfix
firefox97 + fixed
firefox98 + fixed

People

(Reporter: gsvelto, Assigned: Jamie)

References

Details

(Keywords: crash, sec-moderate, Whiteboard: [post-critsmash-triage][adv-main97+r])

Crash Data

Attachments

(1 file)

Crash report: https://crash-stats.mozilla.org/report/index/1436dae1-8ef8-4d39-8a6f-4f2800220110

Reason: EXCEPTION_STACK_BUFFER_OVERRUN / FAST_FAIL_GUARD_ICALL_CHECK_FAILURE

Top 10 frames of crashing thread:

0 ntdll.dll LdrpICallHandler 
1 ntdll.dll RtlpExecuteHandlerForException 
2 ntdll.dll RtlDispatchException 
3 None @0x000001220675b38f 
4 mozglue.dll je_malloc memory/build/malloc_decls.h:51
5 mozglue.dll moz_xmalloc memory/mozalloc/mozalloc.cpp:58
6 None @0x0000ae59f2cadc46 
7 xul.dll xul.dll@0x00000000014757cf 
8 ntdll.dll <unknown in ntdll.dll> 
9 ntdll.dll RtlpCallVectoredHandlers 

I'm restricting this bug because all the crashes under these signatures appear to be caused by some sort of stack corruption. There are several crash signatures and the stacks appear all over the place most likely because of the nature of the corruption. The stack walker can only infer so much in the first few frames before converging on mozilla::a11y::MsaaAccessible::get_accRole().

Group: core-security → dom-core-security

Curious about when the crash volume increased. This shows up in Release builds, at low-volume mid-cycle in November, and then jumps to higher volume mid-cycle December. Did some other software (like, accessibility tools?) ship updates in those timeframes?

Flags: needinfo?(jteh)

There were updates shipped for both leading Windows screen readers in December. That said, there's no way of knowing whether this is related. Because this is a content process crash, we don't have any data about the accessibility instantiator.

The fact that it's a content process crash is itself really odd. If this were due to some a11y client injecting buggy code, they can only inject into the parent process, so how are we crashing in content? Curious.

Flags: needinfo?(jteh)

WinDBG was able to unwind this stack better.

0:000> kp
 # Child-SP          RetAddr           Call Site
00 00000001`00bfcef8 00007ff8`664520cf ntdll!LdrpICallHandler+0xf
01 00000001`00bfcf00 00007ff8`66401454 ntdll!RtlpExecuteHandlerForException+0xf
02 00000001`00bfcf30 00007ff8`66450bfe ntdll!RtlDispatchException+0x244
03 00000001`00bfd640 00007ff8`6643c62e ntdll!KiUserExceptionDispatch+0x2e
*** WARNING: Unable to verify checksum for xul.dll
04 00000001`00bfddc8 00007fff`fd2a6070 ntdll!LdrpDispatchUserCallTarget+0xe
05 00000001`00bfddd0 00007ff8`65ae4dbe xul!mozilla::a11y::MsaaAccessible::get_accRole(struct tagVARIANT * varChild = <Memory access error>, struct tagVARIANT * pvarRole = 0x00000001`03ffe3e0 Empty)+0xb0 [/builds/worker/checkouts/gecko/accessible/windows/msaa/MsaaAccessible.cpp @ 1060] 
06 00000001`00bfdfb0 00007ff8`65ab9eea ole32!Invoke(void)+0x6e [com\ole32\com\txf\callframe\amd64\stubless.asm @ 290] 
07 00000001`00bfe010 00007fff`fb731f51 ole32!CallFrame::Invoke(void * pvReceiver = 0x00000122`0e92ae70)+0x8a [com\ole32\com\txf\callframe\callframe.cpp @ 891] 
08 00000001`00bfe090 00007fff`fb735ca4 xul!`anonymous namespace'::HandoffRunnable::Run(void)+0x51 [/builds/worker/checkouts/gecko/ipc/mscom/MainThreadHandoff.cpp @ 131] 
09 (Inline Function) --------`-------- xul!`anonymous namespace'::SyncRunnable::APCRun+0x28 [/builds/worker/checkouts/gecko/ipc/mscom/MainThreadInvoker.cpp @ 60] 
0a 00000001`00bfe0c0 00007ff8`6642f178 xul!mozilla::mscom::MainThreadInvoker::MainThreadAPC(unsigned int64 aParam = 0x00000122`0e6fde40)+0x74 [/builds/worker/checkouts/gecko/ipc/mscom/MainThreadInvoker.cpp @ 172] 
0b 00000001`00bfe150 00007ff8`66450a9e ntdll!RtlDispatchAPC+0x68
0c 00000001`00bfe1d0 00007ff8`66450564 ntdll!KiUserApcDispatch+0x2e
0d 00000001`00bfe708 00007fff`fb73a0a7 ntdll!NtTestAlert+0x14
0e 00000001`00bfe710 00007fff`f9972fde xul!`anonymous namespace'::SyncRunnable::Run(void)+0x17 [/builds/worker/checkouts/gecko/ipc/mscom/MainThreadInvoker.cpp @ 53] 
0f 00000001`00bfe740 00007fff`faaf642a xul!mozilla::SchedulerGroup::Runnable::Run(void)+0x3e [/builds/worker/checkouts/gecko/xpcom/threads/SchedulerGroup.cpp @ 144] 
10 (Inline Function) --------`-------- xul!mozilla::RunnableTask::Run+0x654 [/builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp @ 468] 
11 00000001`00bfe780 00007fff`faaf4f43 xul!mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(class mozilla::detail::BaseAutoLock<mozilla::Mutex &> * aProofOfLock = <Value unavailable error>)+0xeca [/builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp @ 771] 
12 (Inline Function) --------`-------- xul!mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal+0xb [/builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp @ 607] 
13 00000001`00bfeb60 00007fff`fa7b135b xul!mozilla::TaskController::ProcessPendingMTTask(bool aMayWait = <Value unavailable error>)+0x63 [/builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp @ 391] 
14 (Inline Function) --------`-------- xul!mozilla::TaskController::InitializeInternal::<lambda_2>::operator()+0xc [/builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp @ 127] 
15 (Inline Function) --------`-------- xul!mozilla::detail::RunnableFunction<`lambda at /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:127:7'>::Run+0xc [/builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.h @ 531] 
16 00000001`00bfec30 00007fff`fab1e583 xul!nsThread::ProcessNextEvent(bool aMayWait = <Value unavailable error>, bool * aResult = 0x00000001`00bff0b8)+0x211b [/builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp @ 1175] 
17 (Inline Function) --------`-------- xul!NS_ProcessNextEvent+0x32 [/builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp @ 467] 
18 00000001`00bfefc0 00007fff`f9ac548f xul!mozilla::ipc::MessagePump::Run(class base::MessagePump::Delegate * aDelegate = 0x00000001`00bff560)+0x2d3 [/builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp @ 107] 
19 (Inline Function) --------`-------- xul!MessageLoop::RunInternal+0x16 [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 331] 
1a 00000001`00bff1c0 00007fff`f91a55ce xul!MessageLoop::RunHandler(void)+0x2f [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 325] 
1b 00000001`00bff210 00007fff`f9315598 xul!MessageLoop::Run(void)+0x4e [/builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc @ 307] 
1c 00000001`00bff270 00007fff`f931449f xul!nsBaseAppShell::Run(void)+0x28 [/builds/worker/checkouts/gecko/widget/nsBaseAppShell.cpp @ 139] 

Unfortunately, the value of pVarChild isn't available in the dump, which is really unfortunate because it might help figure out what's going on here.

Ah. I think I've figured out what's going on here. This is not a new bug - I was able to reproduce it in Firefox 89 - but it is Fission specific, which might explain the recent spike as Fission got shipped to more users on release.

STR (with the NVDA screen reader):

  1. Ensure Fission is enabled.
  2. Open this URL (all we need is an OOP iframe):
    data:text/html,<iframe src="https://www.google.com/nf">
  3. Ensure the top level document has focus.
  4. Open the NVDA python console with NVDA+control+z.
  5. Enter this command:
    focus.getChild(0).IAccessibleObject.accRole(1)

The issue is that for OOP iframes, due to the sandbox, we don't have an actual COM proxy for the embedded document, but rather, a fake COM passthru proxy. Methods can't actually be called on this proxy, but when COM unmarshals it for the client, the client ends up with a valid proxy. In this particular case, we try to call methods on that proxy and much pain ensues.

I can fix the crash, but only by returning failure to the client. Due to the sandbox, there's no way I can return what the client really wants here. Still, that's far better than a crash.

This problem is solved properly by our multi-process a11y re-architecture (bug 1694563), but that's months away from shipping.

See Also: → 1737193
Assignee: nobody → jteh
Status: NEW → ASSIGNED

Enter this command [into the NVDA console]:
focus.getChild(0).IAccessibleObject.accRole(1)

What does that do? Are you talking to the accessibility device itself here? Given the crashes clearly users can do things that replicate that step, but could the page content itself manipulate focus or other events to cause Firefox or the accessibility device to replicate this on purpose?

Mistaking one object for another is a more fundamentally exploitable vulnerability than a UAF -- a UAF is a stepping stone to getting memory reallocated so that the program mistakes one object for another :-)

Depends on: a11y-ctw
Flags: needinfo?(jteh)
Keywords: sec-moderate

(In reply to Daniel Veditz [:dveditz] from comment #7)

focus.getChild(0).IAccessibleObject.accRole(1)

What does that do? Are you talking to the accessibility device itself here? Given the crashes clearly users can do things that replicate that step, but could the page content itself manipulate focus or other events to cause Firefox or the accessibility device to replicate this on purpose?

It simulates the client behaviour that triggers this. Clearly, some clients call IAccessible methods on child ids without first trying accChild to get the underlying child object. While a little strange, doing that is acceptable according to the spec.

Mistaking one object for another is a more fundamentally exploitable vulnerability than a UAF

Fair. However, this isn't really about mistaking one object for another. We did actually allocate the object and we're still holding it. The issue is that the object is a stub. It has null pointers for all its vtable entries. Calling methods on it will crash.

That said, I don't really understand why we get the exception we do. I would have expected an access violation since we're trying to execute a null pointer.

Flags: needinfo?(jteh)

Comment on attachment 9259051 [details]
Bug 1749274: Don't return an OOP iframe COM proxy in MsaaAccessible::ResolveChild.

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: Triggering the code path this patch changes is fairly straightforward. However, this is not about calling an incorrect object, but rather, we were calling a function in a vtable containing null pointers.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?: Anything with Fission enabled. I guess that means 96 and 97. I'm not sure if users can flip the fission pref in eSR 91.
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?: This patch should apply without modification to 96 and 97. ESR 91 will require a different, uglier patch, but there shouldn't be any additional risk.
  • How likely is this patch to cause regressions; how much testing does it need?: Very unlikely. The crash is easy to reproduce and the patch is targeted specifically at that code path.
Attachment #9259051 - Flags: sec-approval?

Comment on attachment 9259051 [details]
Bug 1749274: Don't return an OOP iframe COM proxy in MsaaAccessible::ResolveChild.

Approved to land and request uplift

Attachment #9259051 - Flags: sec-approval? → sec-approval+
Group: dom-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 98 Branch

Fission isn't supported on ESR91, so I think we can just worry about uplifting this to Beta. Please request approval when you get a chance. It grafts cleanly as-landed.

Flags: needinfo?(jteh)

Comment on attachment 9259051 [details]
Bug 1749274: Don't return an OOP iframe COM proxy in MsaaAccessible::ResolveChild.

Beta/Release Uplift Approval Request

  • User impact if declined: Crashes, possible security issue.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Very specifically targeted at the failure code path. Just returns an error to the caller to prevent it.
  • String changes made/needed:
Flags: needinfo?(jteh)
Attachment #9259051 - Flags: approval-mozilla-beta?

Comment on attachment 9259051 [details]
Bug 1749274: Don't return an OOP iframe COM proxy in MsaaAccessible::ResolveChild.

Approved for 97.0b9.

Attachment #9259051 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Whiteboard: [post-critsmash-triage]
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main97+r]
No longer depends on: a11y-ctw
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: