Closed Bug 1523526 Opened 7 years ago Closed 6 years ago

COM interceptor causes content process to crash on AARCH64

Categories

(Core :: IPC: MSCOM, defect, P1)

ARM64
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla69
Tracking Status
firefox69 --- fixed

People

(Reporter: Jamie, Assigned: away)

References

Details

(Keywords: access)

Attachments

(1 file)

STR (with NVDA or Narrator):

  0. Build Firefox locally with this in your mozconfig: ac_add_options --enable-accessibility
  1. Start Firefox.
  2. Load a content document; e.g. https://mozilla.org/

Result: Content process crash!

Accessibility works just fine in the parent process, both the chrome and documents like about:support. (Note that NVDA browse mode won't work because NVDA doesn't have in-process dlls for ARM64 yet.) However, as soon as a content process tries to handle an a11y request, it crashes.

Stack:

(215c.26d0): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
ntdll!LdrpValidateUserCallTarget+0x10:
00007ffe`80d14f24 38716a11 ldrb        wip1,[xip0,xip1]
2:158> kp
 # Child-SP          RetAddr           Call Site
00 0000007a`e527ccd0 00007ffe`80b48684 ntdll!LdrpValidateUserCallTarget+0x10
01 0000007a`e527ccd0 00007ffe`80b486d4 ole32!COMInterceptorCheckICall(void)+0x2c [com\ole32\com\txf\callframe\arm64\stubless.asm @ 1112] 
02 0000007a`e527cd20 00007ffe`800a3de4 ole32!COMInterceptor(void)+0x38 [com\ole32\com\txf\callframe\arm64\stubless.asm @ 1188] 
03 0000007a`e527cd90 00007ffe`800d1160 RPCRT4!Invoke+0x64
04 0000007a`e527cdb0 00007ffe`800d16b0 RPCRT4!InvokeHelper+0x130
05 0000007a`e527cee0 00007ffe`800a856c RPCRT4!NdrStubCall2+0x3b0
06 0000007a`e527d550 00007ffe`7fda7848 RPCRT4!NdrStubCall3+0xfc
07 0000007a`e527d5a0 00007ffe`800ade1c combase!CStdStubBuffer_Invoke(struct IRpcStubBuffer * This = 0x000001bd`1b41dcc0, struct tagRPCOLEMESSAGE * prpcmsg = 0x000001bd`1b4b3e48, struct IRpcChannelBuffer * pRpcChannelBuffer = 0x000001bd`1b41dfb0)+0xa8 [onecore\com\combase\ndr\ndrole\stub.cxx @ 1446] 
08 0000007a`e527d5f0 00007ffe`7fd5bfa0 RPCRT4!CStdStubBuffer_Invoke+0x4c
09 (Inline Function) --------`-------- combase!InvokeStubWithExceptionPolicyAndTracing::__l6::<lambda_76d9e92c799d246a4afbe64a2bf5673d>::operator()+0x2c [onecore\com\combase\dcomrem\channelb.cxx @ 1907] 
0a 0000007a`e527d620 00007ffe`7fd5bbd8 combase!ObjectMethodExceptionHandlingAction<<lambda_76d9e92c799d246a4afbe64a2bf5673d> >(class InvokeStubWithExceptionPolicyAndTracing::__l6::<lambda_76d9e92c799d246a4afbe64a2bf5673d> action = class InvokeStubWithExceptionPolicyAndTracing::__l6::<lambda_76d9e92c799d246a4afbe64a2bf5673d>, struct ObjectMethodExceptionHandlingInfo * pExceptionHandlingInfo = 0x0000007a`e527d6e0, struct ExceptionHandlingResult * pExceptionHandlingResult = 0x0000007a`e527d688)+0x58 [onecore\com\combase\dcomrem\excepn.hxx @ 91] 
0b (Inline Function) --------`-------- combase!InvokeStubWithExceptionPolicyAndTracing+0xc0 [onecore\com\combase\dcomrem\channelb.cxx @ 1905] 
0c 0000007a`e527d680 00007ffe`7fd59f4c combase!DefaultStubInvoke(bool bIsAsyncBeginMethod = false, struct IServerCall * pServerCall = 0x000001bd`1ed11338, struct IRpcChannelBuffer * pChannel = 0x000001bd`1b41dfb0, struct IRpcStubBuffer * pStub = 0x000001bd`1b41dcc0, unsigned long * pdwFault = 0x0000007a`e527dbf8)+0x290 [onecore\com\combase\dcomrem\channelb.cxx @ 1974] 
0d (Inline Function) --------`-------- combase!SyncStubCall::Invoke+0x1c [onecore\com\combase\dcomrem\channelb.cxx @ 2031] 
0e (Inline Function) --------`-------- combase!SyncServerCall::StubInvoke+0x1c [onecore\com\combase\dcomrem\servercall.hpp @ 807] 
0f (Inline Function) --------`-------- combase!StubInvoke+0x20c [onecore\com\combase\dcomrem\channelb.cxx @ 2257] 
10 0000007a`e527d7e0 00007ffe`7fd585d4 combase!ServerCall::ContextInvoke(struct tagRPCOLEMESSAGE * pMessage = 0x000001bd`1b4b3e48, struct IRpcStubBuffer * pStub = 0x000001bd`1b41dcc0, class CServerChannel * pChannel = 0x000001bd`1b41dfb0, struct tagIPIDEntry * pIPIDEntry = 0x00007ffe`80d31824, unsigned long * pdwFault = 0x0000007a`e527dbf8)+0x38c [onecore\com\combase\dcomrem\ctxchnl.cxx @ 1542] 
11 (Inline Function) --------`-------- combase!CServerChannel::ContextInvoke+0x88 [onecore\com\combase\dcomrem\ctxchnl.cxx @ 1438] 
12 (Inline Function) --------`-------- combase!DefaultInvokeInApartment+0x94 [onecore\com\combase\dcomrem\callctrl.cxx @ 3549] 
13 0000007a`e527dbd0 00007ffe`7fd57ad4 combase!AppInvoke(class ServerCall * pServerCall = 0x000001bd`1ed11310, class CServerChannel * pChannel = 0x000001bd`1b41dfb0, struct IRpcStubBuffer * pStub = 0x000001bd`1b41dcc0, void * pv = 0x000001bd`1b463208, void * pStubBuffer = 0x000001bd`1b54f358, struct tagIPIDEntry * pIPIDEntry = 0x000001bd`1b45ccb0, union WireLocalThis * pLocalb = 0x00000000`00000020)+0x324 [onecore\com\combase\dcomrem\channelb.cxx @ 1700] 
14 0000007a`e527dd80 00007ffe`7fd55ad4 combase!ComInvokeWithLockAndIPID(class ServerCall * pServerCall = 0x000001bd`1ed11310, struct tagIPIDEntry * pIPIDEntry = 0x000001bd`1b45ccb0, bool * pbCallerResponsibleForRequestMessageCleanup = 0x0000007a`e527e2e8)+0xdb4 [onecore\com\combase\dcomrem\channelb.cxx @ 2809] 
15 0000007a`e527e280 00007ffe`800ce918 combase!ThreadInvoke(struct _RPC_MESSAGE * pMessage = 0x000001bd`1ec0b6e0)+0x27f4 [onecore\com\combase\dcomrem\channelb.cxx @ 7348] 
16 0000007a`e527f1c0 00007ffe`800cd7cc RPCRT4!DispatchToStubInCNoAvrf+0x38
17 0000007a`e527f210 00007ffe`800ce380 RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0x17c
18 0000007a`e527f3a0 00007ffe`800f18c8 RPCRT4!RPC_INTERFACE::DispatchToStubWithObject+0x170
19 0000007a`e527f420 00007ffe`800f1c38 RPCRT4!LRPC_SCALL::DispatchRequest+0x2b0
1a 0000007a`e527f4d0 00007ffe`800efaa8 RPCRT4!LRPC_SCALL::HandleRequest+0x278
1b 0000007a`e527f5f0 00007ffe`800ed40c RPCRT4!LRPC_SASSOCIATION::HandleRequest+0x230
1c 0000007a`e527f6a0 00007ffe`800ee1ec RPCRT4!LRPC_ADDRESS::HandleRequest+0x144
1d 0000007a`e527f740 00007ffe`800c6524 RPCRT4!LRPC_ADDRESS::ProcessIO+0x254
1e 0000007a`e527f880 00007ffe`80d3ed04 RPCRT4!LrpcIoComplete+0xb4
1f 0000007a`e527f910 00007ffe`80d3da98 ntdll!TppAlpcpExecuteCallback+0x1b4
20 0000007a`e527f970 00007ffe`80a55ba4 ntdll!TppWorkerThread+0x3c8
21 0000007a`e527fc00 00007ffe`80d824a4 KERNEL32!BaseThreadInitThunk+0x34
22 0000007a`e527fc40 00000000`00000000 ntdll!RtlUserThreadStart+0x44

I guess interceptor creation succeeds but then there's a crash when something tries to call it? Concerningly, this doesn't seem to even reach our code. I would have expected it to at least reach our OnCall implementation. We register the sink using COM - there's no API hooking or anything like that here - so I don't see how it could have an invalid target for the interceptor.

Any ideas, Aaron? I can provide a minidump if that helps.

Flags: needinfo?(aklotz)

Interesting; this exception is raised by Control Flow Guard (CFG), which is validating that the target address of an executable pointer is considered "valid."

I probably need to get somebody to finally send me a test machine so that I can live debug this in any case. I will look into that.

As to whose pointer is failing validation, that probably needs a good old fashioned minidump. Jamie, please send one my way! Thanks!

Flags: needinfo?(aklotz) → needinfo?(jteh)

Emailed a minidump.

Flags: needinfo?(jteh)

Um... I don't know what changed (some other fix in central? A Windows update?), but I just built against latest central with accessibility enabled and everything works as expected, no crash. The handler works, too (I registered it manually; haven't tested the installer). My NVDA changes mean that NVDA even renders content documents just as expected.

Closing as worksforme, though the reason this now works (or didn't work before) is a fascinating mystery.

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME

I... don't know what to say, other than I'm happy that this is now working!

I believe what happened is that Jamie's original repro was with an MSVC build, which had CFG enabled. In bug 1512822 on January 24th, we switched aarch64 builds to clang-cl, which did not enable CFG, so the crash went away. I'm now attempting to re-enable CFG on aarch64 builds, and I hit this exact crash.

For extra complexity: this also seems to have been fixed in Windows version 1809. I've asked several people to test my try build today, and the 1803 machines do repro the crash, while the 1809 machines don't. In particular, there was one machine that was upgraded today, which crashed just before the update and now doesn't.

Blocks: 1526443
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---

With help from Microsoft we narrowed this down to a bug in ole32.dll in arm64 Windows versions before 1809. I haven't heard whether there will be a fix issued for those older versions, but even if there is, we can't assume that everyone will have the update. As a workaround we can check the OS version and use process mitigation flags to disable CFG on child processes where needed.
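The decision described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the patch: `Arch`, `kWin10Build1809`, and `ShouldDisableCfgForChild` are hypothetical names, and the sketch covers only the version check, not how the mitigation flag is applied when the child process is launched. It relies on Windows 10 version 1809 reporting build number 17763 (1803 reports 17134).

```cpp
#include <cstdint>

// Hypothetical names throughout; this is not the landed patch.
enum class Arch { X86, X64, Arm64 };

// Windows 10 version 1809 reports build number 17763.
constexpr std::uint32_t kWin10Build1809 = 17763;

// Returns true when CFG should be turned off for a child process:
// only on arm64, and only on Windows builds older than 1809, where
// the ole32.dll bug discussed in this thread is present.
constexpr bool ShouldDisableCfgForChild(Arch arch, std::uint32_t osBuild) {
  return arch == Arch::Arm64 && osBuild < kWin10Build1809;
}

// Compile-time checks of the cases reported in this bug.
static_assert(ShouldDisableCfgForChild(Arch::Arm64, 17134),
              "arm64 on 1803 needs the workaround");
static_assert(!ShouldDisableCfgForChild(Arch::Arm64, 17763),
              "arm64 on 1809 keeps CFG enabled");
static_assert(!ShouldDisableCfgForChild(Arch::X64, 17134),
              "other architectures are unaffected");
```

Keeping the check to a pure function of architecture and build number means the affected range can be tested without an arm64 machine; the caller would then translate a `true` result into whatever process mitigation flag the launcher uses.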

Jamie, (assuming you haven't yet updated to 1809) would you mind testing out this build? https://treeherder.mozilla.org/#/jobs?repo=try&revision=e2430f0c6a12

Flags: needinfo?(jteh)

I'm afraid I'm already on 1809. At least I now know why this suddenly started working for me (see comment 4).

Flags: needinfo?(jteh)

Ah, that's too bad, I was hoping that the reason it started working was that you picked up a clang-cl build at the time.

(In reply to David Major [:dmajor] from comment #9)
> Ah, that's too bad, I was hoping that the reason it started working was that you picked up a clang-cl build at the time.

Ugh. I just re-read comment 6; that's entirely possible. However, I've definitely updated to 1809 now; I don't remember precisely when. :(

Assignee: nobody → dmajor
Priority: -- → P1
Pushed by dmajor@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e8ac4b512f9d Don't allow CFG on old releases of Windows for arm64 r=bobowen,aklotz

Backed out 2 changesets (bug 1523526, bug 1526443) for Be bustage on Windows AArch

Backout: https://hg.mozilla.org/integration/autoland/rev/8fe96c8d21b7a9fdebdf3300de2367f2cc9232c3

Failure push: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception%2Cretry%2Cusercancel%2Crunnable&group_state=expanded&revision=98013639d60026eb3a0344fa499a321aec176d2b

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=247379721&repo=autoland&lineNumber=930

16:52:09 INFO - MOZILLA_OFFICIAL=1
16:52:09 INFO - CFLAGS= -fcrash-diagnostics-dir=z:/task_1558369429/public/build
16:52:09 INFO - WIN32_REDIST_DIR=z:/task_1558369429/build/src/vs2017_15.9.6/VC/redist/arm64/Microsoft.VC141.CRT
16:52:09 INFO - VSPATH=/z/task_1558369429/build/src/vs2017_15.9.6
16:52:09 INFO - TOOLTOOL_DIR=z:/task_1558369429/build/src
16:52:09 INFO - MOZ_REQUIRE_SIGNING=0
16:52:09 INFO - DIAGNOSTICS_DIR=/z/task_1558369429/public/build
16:52:09 INFO - VSWINPATH=z:/task_1558369429/build/src/vs2017_15.9.6
16:52:09 INFO - NO_CACHE=1
16:52:09 INFO - MOZ_AUTOMATION_ARTIFACT_BUILDS=1
16:52:09 INFO - checking for vcs source checkout... hg
16:52:09 INFO - checking for a shell... C:/mozilla-build/msys/bin/sh.exe
16:52:09 INFO - checking for host system type... x86_64-pc-mingw32
16:52:09 INFO - checking for target system type... aarch64-windows-mingw32
16:52:09 INFO - checking whether cross compiling... yes
16:52:09 ERROR - Traceback (most recent call last):
16:52:09 INFO - File "z:/task_1558369429/build/src/configure.py", line 132, in <module>
16:52:09 INFO - sys.exit(main(sys.argv))
16:52:09 INFO - File "z:/task_1558369429/build/src/configure.py", line 38, in main
16:52:09 INFO - sandbox.run(os.path.join(os.path.dirname(__file__), 'moz.configure'))
16:52:09 INFO - File "z:\task_1558369429\build\src\python\mozbuild\mozbuild\configure\__init__.py", line 441, in run
16:52:09 INFO - self._value_for(option)
16:52:09 INFO - File "z:\task_1558369429\build\src\python\mozbuild\mozbuild\configure\__init__.py", line 528, in _value_for
16:52:09 INFO - return self._value_for_option(obj)
16:52:09 INFO - File "z:\task_1558369429\build\src\python\mozbuild\mozbuild\util.py", line 947, in method_call
16:52:09 INFO - cache[args] = self.func(instance, *args)
16:52:09 INFO - File "z:\task_1558369429\build\src\python\mozbuild\mozbuild\configure\__init__.py", line 591, in _value_for_option
16:52:09 INFO - % option_string.split('=', 1)[0])
16:52:09 INFO - mozbuild.configure.options.InvalidOptionError: --enable-hardening is not available in this configuration
16:52:09 INFO - *** Fix above errors and then restart with
16:52:09 INFO - "./mach build"
16:52:09 INFO - client.mk:111: recipe for target 'configure' failed
16:52:09 INFO - mozmake.EXE: *** [configure] Error 1
16:52:10 ERROR - Return code: 2
16:52:10 WARNING - setting return code to 2
16:52:10 FATAL - 'mach build -v' did not run successfully. Please check log for errors.
16:52:10 FATAL - Running post_fatal callback...
16:52:10 FATAL - Exiting -1
16:52:10 INFO - [mozharness: 2019-05-20 16:52:10.017000Z] Finished build step (failed)
16:52:10 INFO - Running post-run listener: _parse_build_tests_ccov
16:52:10 INFO - Running post-run listener: _shutdown_sccache
16:52:10 INFO - Running post-run listener: _summarize
16:52:10 ERROR - # TBPL FAILURE #
16:52:10 INFO - [mozharness: 2019-05-20 16:52:10.018000Z] FxDesktopBuild summary:
16:52:10 ERROR - # TBPL FAILURE #

Flags: needinfo?(dmajor)
Flags: needinfo?(dmajor)
Pushed by dmajor@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/65f1f6d80936 Don't allow CFG on old releases of Windows for arm64 r=bobowen,aklotz
Status: REOPENED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla69