Open Bug 1706031 Opened 4 years ago Updated 8 months ago

Crash in [@ RtlAcquireSRWLockExclusive | (anonymous namespace)::InterposedNtCreateFile] with Qihoo antivirus

Categories

(External Software Affecting Firefox :: Other, defect, P2)

Unspecified
Windows 10

Tracking

(firefox103 wontfix)

Tracking Status
firefox103 --- wontfix

People

(Reporter: gsvelto, Assigned: handyman)

References

Details

(Keywords: crash, Whiteboard: [win:stability])

Crash Data

Crash report: https://crash-stats.mozilla.org/report/index/f2e9bb3f-4bf5-4a07-a837-f547a0210419

Reason: EXCEPTION_ACCESS_VIOLATION_WRITE

Top 8 frames of crashing thread:

0 ntdll.dll RtlAcquireSRWLockExclusive 
1 ntdll.dll SbSelectProcedure 
2 kernelbase.dll GetFileType 
3 xul.dll `anonymous namespace'::InterposedNtCreateFile xpcom/build/PoisonIOInterposerWin.cpp:278
4 kernelbase.dll CreateFileInternal 
5 kernelbase.dll CreateFileW 
6 libzdtp64.dll libzdtp64.dll@0x56fa 
7 dbgcore.dll long DetermineOutputProvider 

A couple of crash signatures under this one. I couldn't find much information about this A/V save for its web page here: https://www.360.cn/

Found another signature

Crash Signature: [@ RtlAcquireSRWLockExclusive | SbSelectProcedure] [@ RtlAcquireSRWLockExclusive | (anonymous namespace)::InterposedNtCreateFile] → [@ RtlAcquireSRWLockExclusive | GetFileType] [@ RtlAcquireSRWLockExclusive | SbSelectProcedure] [@ RtlAcquireSRWLockExclusive | (anonymous namespace)::InterposedNtCreateFile]
See Also: → 1719212
See Also: → 1743265

Cleaning up the signatures that have not occurred recently. Not only Qihoo but also Malwarebytes causes a conflict with our IOInterposer. This is similar to bug 1679741. We may be able to implement a workaround.

Assignee: nobody → tkikuchi
Severity: -- → S3
Crash Signature: [@ RtlAcquireSRWLockExclusive | GetFileType] [@ RtlAcquireSRWLockExclusive | SbSelectProcedure] [@ RtlAcquireSRWLockExclusive | (anonymous namespace)::InterposedNtCreateFile] → [@ RtlAcquireSRWLockExclusive | (anonymous namespace)::InterposedNtCreateFile]
Priority: -- → P3
See Also: 1719212, 1743265 → 1646804, 1679741
Summary: Qihoo 360 A/V crashes Firefox in [@ RtlAcquireSRWLockExclusive | SbSelectProcedure] → Crash in [@ RtlAcquireSRWLockExclusive | (anonymous namespace)::InterposedNtCreateFile]

There is a spike of crashes on nightly with this signature since build 20220531065724, just after the 103 merge.
Changelog: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=f5c06f04952fc5d22644704c505a956834ed65c5&tochange=3faedf38e2a0dd88fff7c89d750cbf61eecc601b

These are with the Razer Cortex DLLs in the stack (k_fps32.dll and k_fps64.dll). Could this be related to the change in bug 1739114? These are startup crashes and the extensions list in the crash report is missing.

Crash Signature: [@ RtlAcquireSRWLockExclusive | (anonymous namespace)::InterposedNtCreateFile] → [@ RtlAcquireSRWLockExclusive | (anonymous namespace)::InterposedNtCreateFile] [@ mozilla::detail::MutexImpl::lock | (anonymous namespace)::InterposedNtCreateFile] [@ ntdll.dll | kernelbase.dll | (anonymous namespace)::InterposedNtCreateFile]
Flags: needinfo?(mixedpuppy)

Changing the priority to P2 as the bug is tracked by a release manager for the current nightly.
See Triage for Bugzilla for more information.
If you disagree, please discuss with a release manager.

Priority: P3 → P2

I found this crash, which looks to be the same one but carries more information; the stacks for the others contain a lot of garbage. From this minidump, it looks like these may actually be shutdown crashes:

 	ntdll.dll!_RtlAcquireSRWLockExclusive@4()	Unknown	Non-user code. Symbols loaded.
>	mozglue.dll!mozilla::detail::MutexImpl::lock() Line 23	C++	Symbols loaded.
 	[Inline Frame] xul.dll!mozilla::OffTheBooksMutex::Lock() Line 65	C++	Symbols loaded.
 	[Inline Frame] xul.dll!mozilla::detail::BaseAutoLock<mozilla::OffTheBooksMutex &>::BaseAutoLock(mozilla::OffTheBooksMutex & aLock) Line 236	C++	Symbols loaded.
 	[Inline Frame] xul.dll!mozilla::SmallArrayLRUCache<void *,nsTString<char16_t>,32>::Add(void * aKey, nsTString<char16_t> & aValue) Line 66	C++	Symbols loaded.
 	[Inline Frame] xul.dll!`anonymous namespace'::WinIOAutoObservation::SetHandle(void * aFileHandle) Line 169	C++	Symbols loaded.
 	xul.dll!`anonymous namespace'::InterposedNtCreateFile(void * * aFileHandle, unsigned long aDesiredAccess, _OBJECT_ATTRIBUTES * aObjectAttributes, _IO_STATUS_BLOCK * aIoStatusBlock, _LARGE_INTEGER * aAllocationSize, unsigned long aFileAttributes, unsigned long aShareAccess, unsigned long aCreateDisposition, unsigned long aCreateOptions, void * aEaBuffer, unsigned long aEaLength) Line 286	C++	Symbols loaded.
 	KERNELBASE.dll!_CreateFileInternal@24()	Unknown	Non-user code. Symbols loaded.
 	KERNELBASE.dll!_CreateFileW@28()	Unknown	Non-user code. Symbols loaded.
 	K_FPS32.dll!7b717bc7()	Unknown	Non-user code. No matching binary found.
 	K_FPS32.dll![Frames below may be incorrect and/or missing, no symbols loaded for K_FPS32.dll]	Unknown	No symbols loaded.
 	K_FPS32.dll!7b717f0e()	Unknown	Non-user code. No matching binary found.
 	K_FPS32.dll!7b717902()	Unknown	Non-user code. No matching binary found.
 	K_FPS32.dll!7b7181a9()	Unknown	Non-user code. No matching binary found.
 	K_FPS32.dll!7b711534()	Unknown	Non-user code. No matching binary found.
 	K_FPS32.dll!7b70babf()	Unknown	Non-user code. No matching binary found.
 	K_FPS32.dll!7b70bb3a()	Unknown	Non-user code. No matching binary found.
 	K_FPS32.dll!7b6f298d()	Unknown	Non-user code. No matching binary found.
 	K_FPS32.dll!7b6f2bdc()	Unknown	Non-user code. No matching binary found.
 	K_FPS32.dll!7b706e5b()	Unknown	Non-user code. No matching binary found.
 	K_FPS32.dll!7b706f2d()	Unknown	Non-user code. No matching binary found.
 	ntdll.dll!_LdrxCallInitRoutine@16()	Unknown	Non-user code. Symbols loaded.
 	ntdll.dll!LdrpCallInitRoutine()	Unknown	Non-user code. Symbols loaded.
 	ntdll.dll!_LdrShutdownProcess@0()	Unknown	Non-user code. Symbols loaded.
 	ntdll.dll!RtlExitUserProcess()	Unknown	Non-user code. Symbols loaded.
 	kernel32.dll!_ExitProcessImplementation@4()	Unknown	Non-user code. Symbols loaded.
 	ucrtbase.dll!exit_or_terminate_process()	Unknown	Non-user code. Symbols loaded.
 	ucrtbase.dll!common_exit()	Unknown	Non-user code. Symbols loaded.
 	ucrtbase.dll!_exit()	Unknown	Non-user code. Symbols loaded.
 	firefox.exe!__scrt_common_main_seh() Line 310	C++	Non-user code. Symbols loaded.
 	kernel32.dll!@BaseThreadInitThunk@12()	Unknown	Non-user code. Symbols loaded.
 	ntdll.dll!___RtlUserThreadStart@8()	Unknown	Non-user code. Symbols loaded.
 	ntdll.dll!__RtlUserThreadStart@8()	Unknown	Non-user code. Symbols loaded.

This suggests that the static variable sHandleToFilenameCache was destroyed atexit before K_FPS32.DLL was unloaded. The DLL creates a file while unloading, which causes the IO interposer to access the already-freed memory. This would not happen if ClearPoisonIOInterposer were used but, as the comment there says, there is a bug that causes us not to do that, and it is a bit worse now (bug 1769001). But I don't see another solution.
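To illustrate the ordering problem described above, here is a minimal, portable sketch. All names in it (HandleCache, gCreateFile, InstallInterposer, ClearInterposer) are hypothetical stand-ins, not the actual PoisonIOInterposerWin.cpp code, and it assumes a single thread for the hook swap. It only shows why unhooking before the cache is freed keeps late file creations away from freed state.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

namespace sketch {

// Hypothetical stand-in for a handle-to-filename cache guarded by a mutex.
class HandleCache {
 public:
  void Add(int aHandle, const std::string& aName) {
    std::lock_guard<std::mutex> lock(mMutex);
    mNames[aHandle] = aName;
  }
  std::size_t Size() {
    std::lock_guard<std::mutex> lock(mMutex);
    return mNames.size();
  }

 private:
  std::mutex mMutex;
  std::unordered_map<int, std::string> mNames;
};

using CreateFileFn = int (*)(const std::string&);

std::unique_ptr<HandleCache> gCache;  // freed by ClearInterposer()
int gNextHandle = 1;

// Stand-in for the real, un-hooked file creation routine.
int RealCreateFile(const std::string&) { return gNextHandle++; }

// Stand-in for the interposed routine: it touches the cache, so it is only
// safe to call while gCache is still alive.
int InterposedCreateFile(const std::string& aName) {
  int handle = RealCreateFile(aName);
  assert(gCache && "hook called after the cache was destroyed");
  gCache->Add(handle, aName);
  return handle;
}

// The function pointer that callers (including third-party DLLs) go through.
CreateFileFn gCreateFile = RealCreateFile;

void InstallInterposer() {
  gCache = std::make_unique<HandleCache>();
  gCreateFile = InterposedCreateFile;
}

// Unhook first, then free the state the hook depends on. If the order were
// reversed, or the hook were never removed, a late caller would lock a
// destroyed mutex.
void ClearInterposer() {
  gCreateFile = RealCreateFile;
  gCache.reset();
}

}  // namespace sketch
```

In this sketch, a file created after ClearInterposer() goes straight to RealCreateFile and never touches the cache. That is the property the real code would get from ClearPoisonIOInterposer if it could be used.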

Whiteboard: [win:stability]

The severity field for this bug is set to S3. However, the bug is marked as tracked for firefox103 (nightly).
:toshi, could you consider increasing the severity of this tracked bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(tokikuc)
Assignee: tokikuc → davidp99
Severity: S3 → S2
Flags: needinfo?(tokikuc)

Adding to the info in comment 6: that crash, like the others, is a startup crash -- it says it ran for 6 seconds (the longest I see is 40 seconds). If these are shutdown crashes then something else extreme is happening.

@gcp pointed out that these issues popped up when bug 1770721 landed. It moved crash reporter shutdown to after ProcessRuntime shutdown, so it's not directly related, but the crash signatures that it would repair resemble the one here. The theory is that that bug either made this one worse (this crash existed earlier in low volume) or just improved the reliability of its reporting. But none of this explains why we would have shutdown crashes during or right after startup.

I needed to look at the firefox.exe!__scrt_common_main_seh stack frame more closely: it's not the browser cleanly shutting down, it's running into an unhandled exception. The other hangs stop in similar code but are not in the unhandled-exception handler. The unhandled exception is the "real" problem, but the rest of what's happening is as in comment 6 and would require the solution discussed there.

I tried some minidump gymnastics and got 0xC0000005 (Access Violation) for the comment 6 crash exception, but that could just be the exception that's being thrown by the exception handler. It doesn't tell us much anyway, except perhaps that there are cases where access violations aren't dealt with by our exception handler. Probably more likely, given the profile of these crashes which all happen right after launch, is that the failure happens before we even set up our exception handler. The first crash should still be reported to WER but probably isn't because of this crash in the exception handler. So fixing this bug may be the best way to get the information needed for the "real" one.

All the crashes under these signatures have been intercepted by WER so indeed the exception we're seeing is the one that's being generated by failing to handle another one in the exception handler. The increase in volume is certainly caused by us moving the exception handler termination later. I wonder if there might be another signature somewhere that registered a decrease in volume as that exception should have been caught by WER instead. BTW note that whenever the Breakpad exception handler tries to write a minidump it disables the I/O interposer so the failure shouldn't happen at that point (but it can happen before).

(In reply to Gabriele Svelto [:gsvelto] from comment #10)

BTW note that whenever the Breakpad exception handler tries to write a minidump it disables the I/O interposer so the failure shouldn't happen at that point (but it can happen before).

This clarifies a lot of the code for me. For the most part, this should be the case, but it might not be in comment 6. The behavior you describe would have stopped the comment 6 crash here because IOInterposer::IsObservedOperation should be false after Breakpad disables the IO Interposer, but there are two problems with that. The first is that IOInterposer::IsObservedOperation is false because the disabling sets sSourceList to null... but this is after main, so we don't know whether sSourceList itself is still valid memory or has been destroyed. The second is more obvious: unlike the other methods in the class, WinIOAutoObservation::SetHandle is not conditioned on mShouldReport.

WinIOAutoObservation::SetHandle is not conditioned on mShouldReport.

mShouldReport is true in this crash, so that wouldn't have helped here.
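For readers unfamiliar with the pattern under discussion, here is a minimal sketch of an RAII observation whose SetHandle is gated on a should-report flag. The names Recorder and AutoObservation are hypothetical and this is not the actual WinIOAutoObservation code. As the reply above notes, mShouldReport was true in this crash, so a check like this would not have prevented it.

```cpp
#include <cassert>
#include <string>
#include <vector>

namespace sketch2 {

// Hypothetical sink for observation events; stands in for shared state
// (such as a cache) that must not be touched when reporting is off.
struct Recorder {
  std::vector<std::string> events;
};

class AutoObservation {
 public:
  AutoObservation(Recorder& aRecorder, bool aShouldReport)
      : mRecorder(aRecorder), mShouldReport(aShouldReport) {}

  AutoObservation(const AutoObservation&) = delete;
  AutoObservation& operator=(const AutoObservation&) = delete;

  // Gated like the other methods: does nothing when reporting is off.
  void SetHandle(int aHandle) {
    assert(aHandle >= 0 && "sketch assumes non-negative handles");
    if (!mShouldReport) {
      return;
    }
    mRecorder.events.push_back("handle " + std::to_string(aHandle));
  }

  // The observation is reported when the scope ends, if enabled.
  ~AutoObservation() {
    if (mShouldReport) {
      mRecorder.events.push_back("report");
    }
  }

 private:
  Recorder& mRecorder;
  bool mShouldReport;
};

}  // namespace sketch2
```

With the flag off, neither SetHandle nor the destructor touches the recorder. With it on, both do, which is the situation in the comment 6 crash.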

:handyman have there been any updates on the investigation?
The crash rate has dropped but we are still only in nightly at the moment for 103; the soft freeze is next week.

Flags: needinfo?(davidp99)

No, I'm not going to have time to do what this needs. It looks like this is still about ClearPoisonIOInterposer and that's going to take some effort.

Flags: needinfo?(davidp99)

Removing tracking+ and setting 103 to wontfix: the crash signature is not reported in 103 beta; see also comment 14.

This is a signature change: the remaining failures are all caused by Qihoo antivirus; the Malwarebytes ones migrated to bug 1683069.

Crash Signature: [@ RtlAcquireSRWLockExclusive | (anonymous namespace)::InterposedNtCreateFile] [@ mozilla::detail::MutexImpl::lock | (anonymous namespace)::InterposedNtCreateFile] [@ ntdll.dll | kernelbase.dll | (anonymous namespace)::InterposedNtCreateFile] → [@ RtlAcquireSRWLockExclusive | (anonymous namespace)::InterposedNtCreateFile] [@ mozilla::detail::MutexImpl::lock | (anonymous namespace)::InterposedNtCreateFile] [@ ntdll.dll | kernelbase.dll | (anonymous namespace)::InterposedNtCreateFile] [@ RtlAcq…
Summary: Crash in [@ RtlAcquireSRWLockExclusive | (anonymous namespace)::InterposedNtCreateFile] → Crash in [@ RtlAcquireSRWLockExclusive | (anonymous namespace)::InterposedNtCreateFile] with Qihoo antivirus

It looks like the only remaining attached signature with crashes [@ RtlAcquireSRWLockExclusive | mozilla::OffTheBooksMutex::Lock | mozilla::detail::BaseAutoLock<T>::BaseAutoLock ] is now not as tied to Qihoo, so the title has become potentially misleading?

Regarding that specific signature, I wonder if we are just seeing here the consequence of a variety of heap buffer overflows, and locks are just the kind of objects that are used so often that it makes it very likely to crash very soon after being corrupted compared to other objects? See for example this crash report where the pointer to the lock object is stored in a static variable.

(In reply to Yannis Juglaret from comment #17)

It looks like the only remaining attached signature with crashes [@ RtlAcquireSRWLockExclusive | mozilla::OffTheBooksMutex::Lock | mozilla::detail::BaseAutoLock<T>::BaseAutoLock ] is now not as tied to Qihoo, so the title has become potentially misleading?

Yes, the signature morphed into a bunch of other different crashes. I need to adjust it and then we're likely going to close this bug.

Regarding that specific signature, I wonder if we are just seeing here the consequence of a variety of heap buffer overflows, and locks are just the kind of objects that are used so often that it makes it very likely to crash very soon after being corrupted compared to other objects? See for example this crash report where the pointer to the lock object is stored in a static variable.

Looking through all the crashes, and in particular the one you pointed out, it seems like we're dealing with two main classes of issues: promises where the lock had been cleared (maybe because they were dead? Or they were launched too early?) and startup crashes. The one you linked in particular was taken by WER, as were all other reports like it, very early during startup. For some reason it seems that the static variable in question had not been initialized yet, which is odd but might be caused by our peculiar architecture (with xul.dll and all). Anyway, I'll file a Socorro bug to break this crash signature apart into different components so that we can analyze them one by one.

Adding the new signature for this bug which will be populated once bug 1816846 lands.

Crash Signature: RtlAcquireSRWLockExclusive | mozilla::OffTheBooksMutex::Lock | mozilla::detail::BaseAutoLock<T>::BaseAutoLock] → RtlAcquireSRWLockExclusive | mozilla::OffTheBooksMutex::Lock | mozilla::detail::BaseAutoLock<T>::BaseAutoLock] [@ mozilla::SmallArrayLRUCache<T>::Add]

Alright, all the crashes here will now match the title, and the other unrelated crashes have been split out to other signatures.

Since the crash volume is low (less than 15 per week), the severity is downgraded to S3. Feel free to change it back if you think the bug is still critical.

For more information, please visit auto_nag documentation.

Severity: S2 → S3
See Also: → 1843977

I'm declaring need-info bankruptcy and resetting old ni? on me. If input is still needed on this issue and it is engineering specific for the webextensions team, ni? lgreco, or ni? me again.

Flags: needinfo?(mixedpuppy)