Crash in [@ mozilla::interceptor::FuncHook<T>::operator()] - In-process interception race condition
Categories
(Core :: DLL Services, defect, P3)
Tracking
()
People
(Reporter: gsvelto, Assigned: yannis)
References
Details
(Keywords: crash, Whiteboard: [win:stability])
Crash Data
Attachments
(2 files)
|
9.66 KB,
text/plain
|
Details | |
|
48 bytes,
text/x-phabricator-request
|
diannaS
:
approval-mozilla-beta+
RyanVM
:
approval-mozilla-esr115+
|
Details | Review |
Crash report: https://crash-stats.mozilla.org/report/index/ece269a7-d742-4085-843b-eae8f0230613
Reason: EXCEPTION_STACK_BUFFER_OVERRUN / FAST_FAIL_GUARD_ICALL_CHECK_FAILURE
Top 8 frames of crashing thread:
0 ntdll.dll LdrpICallHandler
1 ntdll.dll RtlpExecuteHandlerForException
2 ntdll.dll RtlDispatchException
3 ntdll.dll KiUserExceptionDispatch
4 ntdll.dll LdrpDispatchUserCallTarget
5 mozglue.dll mozilla::interceptor::FuncHook<mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared>, void const toolkit/xre/dllservices/mozglue/nsWindowsDllInterceptor.h:150
5 mozglue.dll patched_BaseThreadInitThunk toolkit/xre/dllservices/mozglue/WindowsDllBlocklist.cpp:620
6 ntdll.dll RtlUserThreadStart
This is a low-volume crash but with consistent stacks and reasons. It's intercepted by WER since it's triggered by a __fastfail() call. It's unclear what's going on in the crashing thread but in several crashes there's always another thread doing a call to KERNEL.DLL which is often FlushInstructionCache(). Could this be some kind of race in the interceptor?
Updated•2 years ago
|
| Assignee | ||
Comment 1•2 years ago
•
|
||
This is caused by a race condition in our interception code indeed. The trampoline address mOrigFunc is set after we alter the code of the hooked function (see attachment). So if between the two writes, the hooked function gets called on another thread and we reach the part of our patched function that calls the trampoline, we will crash because mOrigFunc still has its nullptr initial value. This seems to be most commonly happening for BaseThreadInitThunk, for which a thread starting between the two writes would result in crash. The crash stack is a bit obscure, but that is because we crash as part of a CFG check failure (which are guarded to force crashing without any possible handling of the exception), rather than a regular null pointer dereference.
| Comment hidden (obsolete) |
| Assignee | ||
Updated•2 years ago
|
| Assignee | ||
Comment 3•2 years ago
|
||
Moving the race condition from comment 2 to separate bug 1862920 because these issues can be fixed independently.
| Assignee | ||
Comment 4•2 years ago
|
||
If a thread starts running a detoured function right after we
succesfully commited our 13-bytes patch, there is a short delay where it
can reach the patched_XXX function and try to call stub_XXX while
stub_XXX.mOrigFunc is still a null pointer.
We fix this specific race condition, which materializes mostly as
crashes in patched_BaseThreadInitThunk when trying to call
stub_BaseThreadInitThunk in the current code base.
Updated•2 years ago
|
Updated•2 years ago
|
Comment 6•2 years ago
|
||
| bugherder | ||
| Assignee | ||
Updated•2 years ago
|
| Assignee | ||
Updated•2 years ago
|
| Assignee | ||
Comment 7•2 years ago
|
||
Comment on attachment 9361792 [details]
Bug 1838286 - Fix a race condition with in-process DLL interception. r=handyman,#win-reviewers
Beta/Release Uplift Approval Request
- User impact if declined: Crashes, coming most likely from users of third-party products (in particular, antivirus) that create threads early in the Firefox process.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Very simple stability patch that performs a memory write earlier than before. Does not alter behavior, except in the race condition scenario that we intend to fix.
- String changes made/needed:
- Is Android affected?: No
Comment 8•2 years ago
|
||
Comment on attachment 9361792 [details]
Bug 1838286 - Fix a race condition with in-process DLL interception. r=handyman,#win-reviewers
Approved for 120.0b8
Updated•2 years ago
|
Comment 10•2 years ago
|
||
Please nominate this for ESR115 approval when you get a chance.
| Assignee | ||
Comment 11•2 years ago
|
||
Comment on attachment 9361792 [details]
Bug 1838286 - Fix a race condition with in-process DLL interception. r=handyman,#win-reviewers
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: 115 branch is affected and the fix is simple with very low risk.
- User impact if declined: Crashes, coming most likely from users of third-party products (in particular, antivirus) that create threads early in the Firefox process.
- Fix Landed on Version: 120
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Very simple stability patch that performs a memory write earlier than before. Does not alter behavior, except in the race condition scenario that we intend to fix. Verified in 120.
Comment 12•2 years ago
|
||
Comment on attachment 9361792 [details]
Bug 1838286 - Fix a race condition with in-process DLL interception. r=handyman,#win-reviewers
Approved for 115.6esr.
Updated•2 years ago
|
Comment 13•2 years ago
|
||
| uplift | ||
Description
•