Closed Bug 1838286 Opened 2 years ago Closed 2 years ago

Crash in [@ mozilla::interceptor::FuncHook<T>::operator()] - In-process interception race condition

Categories

(Core :: DLL Services, defect, P3)

Unspecified
Windows
defect

Tracking

()

RESOLVED FIXED
121 Branch
Tracking Status
firefox-esr115 --- fixed
firefox119 --- wontfix
firefox120 --- fixed
firefox121 --- fixed

People

(Reporter: gsvelto, Assigned: yannis)

References

Details

(Keywords: crash, Whiteboard: [win:stability])

Crash Data

Attachments

(2 files)

Crash report: https://crash-stats.mozilla.org/report/index/ece269a7-d742-4085-843b-eae8f0230613

Reason: EXCEPTION_STACK_BUFFER_OVERRUN / FAST_FAIL_GUARD_ICALL_CHECK_FAILURE

Top 8 frames of crashing thread:

0  ntdll.dll  LdrpICallHandler  
1  ntdll.dll  RtlpExecuteHandlerForException  
2  ntdll.dll  RtlDispatchException  
3  ntdll.dll  KiUserExceptionDispatch  
4  ntdll.dll  LdrpDispatchUserCallTarget  
5  mozglue.dll  mozilla::interceptor::FuncHook<mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared>, void  const  toolkit/xre/dllservices/mozglue/nsWindowsDllInterceptor.h:150
5  mozglue.dll  patched_BaseThreadInitThunk  toolkit/xre/dllservices/mozglue/WindowsDllBlocklist.cpp:620
6  ntdll.dll  RtlUserThreadStart  

This is a low-volume crash but with consistent stacks and reasons. It's intercepted by WER since it's triggered by a __fastfail() call. It's unclear what's going on in the crashing thread but in several crashes there's always another thread doing a call to KERNEL.DLL which is often FlushInstructionCache(). Could this be some kind of race in the interceptor?

Severity: -- → S3
Priority: -- → P3
Attached file write-order.txt

This is caused by a race condition in our interception code indeed. The trampoline address mOrigFunc is set after we alter the code of the hooked function (see attachment). So if between the two writes, the hooked function gets called on another thread and we reach the part of our patched function that calls the trampoline, we will crash because mOrigFunc still has its nullptr initial value. This seems to be most commonly happening for BaseThreadInitThunk, for which a thread starting between the two writes would result in crash. The crash stack is a bit obscure, but that is because we crash as part of a CFG check failure (which are guarded to force crashing without any possible handling of the exception), rather than a regular null pointer dereference.

Whiteboard: [win:stability]
See Also: → 1862920

Moving the race condition from comment 2 to separate bug 1862920 because these issues can be fixed independently.

Crash Signature: [@ mozilla::interceptor::FuncHook<T>::operator()] [@ BaseThreadInitThunk] → [@ mozilla::interceptor::FuncHook<T>::operator()]
Summary: Crash in [@ mozilla::interceptor::FuncHook<T>::operator()] and [@ BaseThreadInitThunk] - In-process interception race conditions → Crash in [@ mozilla::interceptor::FuncHook<T>::operator()] - In-process interception race condition

If a thread starts running a detoured function right after we
succesfully commited our 13-bytes patch, there is a short delay where it
can reach the patched_XXX function and try to call stub_XXX while
stub_XXX.mOrigFunc is still a null pointer.

We fix this specific race condition, which materializes mostly as
crashes in patched_BaseThreadInitThunk when trying to call
stub_BaseThreadInitThunk in the current code base.

Assignee: nobody → yjuglaret
Status: NEW → ASSIGNED
Attachment #9361792 - Attachment description: Bug 1838286 - Fix a race condition in-process DLL interception. r=handyman,#win-reviewers → Bug 1838286 - Fix a race condition with in-process DLL interception. r=handyman,#win-reviewers
Pushed by yjuglaret@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/23d10245957f Fix a race condition with in-process DLL interception. r=win-reviewers,gstoll
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 121 Branch

Comment on attachment 9361792 [details]
Bug 1838286 - Fix a race condition with in-process DLL interception. r=handyman,#win-reviewers

Beta/Release Uplift Approval Request

  • User impact if declined: Crashes, coming most likely from users of third-party products (in particular, antivirus) that create threads early in the Firefox process.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Very simple stability patch that performs a memory write earlier than before. Does not alter behavior, except in the race condition scenario that we intend to fix.
  • String changes made/needed:
  • Is Android affected?: No
Attachment #9361792 - Flags: approval-mozilla-beta?

Comment on attachment 9361792 [details]
Bug 1838286 - Fix a race condition with in-process DLL interception. r=handyman,#win-reviewers

Approved for 120.0b8

Attachment #9361792 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Please nominate this for ESR115 approval when you get a chance.

Flags: needinfo?(yjuglaret)

Comment on attachment 9361792 [details]
Bug 1838286 - Fix a race condition with in-process DLL interception. r=handyman,#win-reviewers

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: 115 branch is affected and the fix is simple with very low risk.
  • User impact if declined: Crashes, coming most likely from users of third-party products (in particular, antivirus) that create threads early in the Firefox process.
  • Fix Landed on Version: 120
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Very simple stability patch that performs a memory write earlier than before. Does not alter behavior, except in the race condition scenario that we intend to fix. Verified in 120.
Flags: needinfo?(yjuglaret)
Attachment #9361792 - Flags: approval-mozilla-esr115?

Comment on attachment 9361792 [details]
Bug 1838286 - Fix a race condition with in-process DLL interception. r=handyman,#win-reviewers

Approved for 115.6esr.

Attachment #9361792 - Flags: approval-mozilla-esr115? → approval-mozilla-esr115+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: