Closed Bug 1705579 Opened 1 month ago Closed 12 days ago

Crash in [@ je_malloc | nsTSubstring<T>::Assign | (anonymous namespace)::WinIOAutoObservation::WinIOAutoObservation]

Categories

(Core :: Gecko Profiler, defect)

Unspecified
Windows 10
defect

RESOLVED FIXED
90 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox88 --- wontfix
firefox89 --- wontfix
firefox90 --- fixed

People

(Reporter: mccr8, Assigned: toshi)

Details

(Keywords: crash)

Crash report: https://crash-stats.mozilla.org/report/index/bdcfebf3-9282-4fb0-9c4d-bf1590210415

Reason: EXCEPTION_ACCESS_VIOLATION_READ

Top 10 frames of crashing thread:

0 mozglue.dll je_malloc memory/build/malloc_decls.h:51
1 xul.dll nsTSubstring<char16_t>::Assign xpcom/string/nsTSubstring.cpp:455
2 xul.dll `anonymous namespace'::WinIOAutoObservation::WinIOAutoObservation xpcom/build/PoisonIOInterposerWin.cpp:148
3 xul.dll `anonymous namespace'::InterposedNtCreateFile xpcom/build/PoisonIOInterposerWin.cpp:266
4 kernelbase.dll CreateFileInternal 
5 kernelbase.dll CreateFileW 
6 advapi32.dll SaferpIsV2PolicyPresent 
7 advapi32.dll SaferiIsDllAllowed 
8 ntdll.dll LdrpMapDllNtFileName 
9 ntdll.dll LdrpMapDllSearchPath 

Kind of an odd stack. Looking at a few reports, the crash is on a thread that is low numbered, and looks like it is purely internal Windows stuff, until it tries to create a file and hits our IO interposer, which tries to allocate a string, then hits a null deref and crashes. Maybe this is some thread where jemalloc isn't set up to run, assuming that it requires TLS or something else?

Looking at thread 0 in these crashes, they were all waiting on a condition variable in the script preloader. Maybe in the dtor for AutoBeginReading?

Maybe something is going horribly wrong with the script preloader thread?

I'll file this in jemalloc for now because that's where the crash is happening, though it seems unlikely that it is the ultimate cause.

Based on the sparse crash report, I think this is one of the new Windows crash reporting crashes.

All the crash reports have the last error value in the crashing thread set to 0x5. That's an ERROR_ACCESS_DENIED code in Windows. It's possible that a previous operation is failing and the crash is a consequence of this failure.

Just found another signature, same story, the last error value is always set to 0x5.

Crash Signature: [@ je_malloc | nsTSubstring<T>::Assign | (anonymous namespace)::WinIOAutoObservation::WinIOAutoObservation] → [@ je_malloc | nsTSubstring<T>::Assign | (anonymous namespace)::WinIOAutoObservation::WinIOAutoObservation] [@ Allocator<T>::malloc | replace_malloc]

I've cracked open a couple of minidumps to figure out what's going on and it's rather odd:

  • The immediate cause of the crash is that we're trying to access thread-local storage (in the memory allocator) at a point where thread-local storage might not yet be available (it seems like we're very early during startup).
  • That said, I don't think that's the real problem. Looking at the stack, we were deep in a Microsoft-only chunk of code which called CreateFileW() with the following path: C:\WINDOWS\System32\AppLocker\MDM and I believe that call failed, hence the ERROR_ACCESS_DENIED value returned by GetLastError() which we recorded in the minidump (note that this value is the same in all crash reports).
  • The I/O interposer tried to record that path, and that's what triggered the allocation.

Toshihito, I'm not familiar with how the I/O interposer is supposed to work, can you help me figure out what's going on here?

Flags: needinfo?(tkikuchi)

The crash happened because TLS had not been allocated on the thread when WinIOAutoObservation ran inside our hook of ntdll's functions. This was not early in the lifetime of the process or the thread. The crashing thread started from ntdll!RtlUserThreadStart, and normally TLS has already been allocated before ntdll!RtlUserThreadStart. This case is an exception because the crashing thread is the DLL loader's worker thread. As shown below, ntdll!LdrpInitializeThread skips TLS allocation in that case.

0:000> dt nt!_TEB LoaderWorker
ntdll!_TEB
   +0x17ee LoaderWorker : Pos 13, 1 Bit

ntdll!LdrpInitializeThread+0x57:
00007ffc`948c2d3f b800200000      mov     eax,2000h
00007ffc`948c2d44 66418582ee170000 test    word ptr [r10+17EEh],ax
00007ffc`948c2d4c 0f8590010000    jne     ntdll!LdrpInitializeThread+0x1fa (00007ffc`948c2ee2) <<<< Skip if LoaderWorker is on
00007ffc`948c2d52 e8cd300300      call    ntdll!LdrpAllocateTls (00007ffc`948f5e24)
00007ffc`948c2d57 8bd8            mov     ebx,eax
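
To make the disassembly above easier to follow, here is a minimal sketch of the same bit test in C++. It only restates the arithmetic: bit position 13 in a 16-bit word is the mask 0x2000 loaded into eax. The names kLoaderWorkerBitPos, kLoaderWorkerMask and WouldAllocateTls are hypothetical and are not taken from ntdll or Gecko.

```cpp
#include <cassert>
#include <cstdint>

// "dt nt!_TEB LoaderWorker" reports: offset +0x17ee, Pos 13, 1 Bit.
constexpr unsigned kLoaderWorkerBitPos = 13;  // hypothetical name
constexpr std::uint16_t kLoaderWorkerMask =
    static_cast<std::uint16_t>(1u << kLoaderWorkerBitPos);

// Bit 13 is exactly the constant used by "mov eax,2000h".
static_assert(kLoaderWorkerMask == 0x2000, "bit 13 == 0x2000");

// Hypothetical helper mirroring the test/jne pair: flagsWord stands for the
// 16-bit word read from [r10+17EEh]. Returns true when execution would fall
// through to the call to LdrpAllocateTls, false when the jne skips it.
constexpr bool WouldAllocateTls(std::uint16_t flagsWord) {
  return (flagsWord & kLoaderWorkerMask) == 0;
}
```

For example, WouldAllocateTls(0x0000) is true, while WouldAllocateTls(0x2000) is false, which corresponds to the loader worker case where TLS allocation is skipped.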

Currently this situation is not common because no module loaded in the DLL loader's worker thread triggers file operations. If a DLL rule is defined in AppLocker, however, the loader itself issues file operations, resulting in a crash.

The simplest solution would be to skip WinIOAutoObservation if TLS is not available, sacrificing I/O performance data we collect. I came up with an idea to simulate this situation in a test.
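
The shape of that idea can be sketched roughly as follows. This is a portable simulation, not the code in PoisonIOInterposerWin.cpp: FakeThread, ObservationLog, AutoObservation and InterposedCreateFile are all hypothetical stand-ins, and the TLS check is modelled as a plain null-pointer test on an assumed per-thread field rather than any real Windows or Gecko API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for per-thread state. A null tlsPointer models the
// loader worker thread, where TLS was never allocated.
struct FakeThread {
  void* tlsPointer;
};

// Hypothetical sink for recorded I/O paths.
struct ObservationLog {
  std::vector<std::u16string> paths;
};

// Sketch of an RAII observation that does nothing when TLS is unavailable.
class AutoObservation {
 public:
  AutoObservation(const FakeThread& aThread, ObservationLog& aLog,
                  const char16_t* aPath)
      : mLog(aLog), mShouldReport(aThread.tlsPointer != nullptr) {
    if (!mShouldReport) {
      return;  // No TLS: avoid copying the path or touching thread-local state.
    }
    mPath.assign(aPath);  // The path is copied only on the safe path.
  }

  ~AutoObservation() {
    if (mShouldReport) {
      mLog.paths.push_back(mPath);
    }
  }

  bool ShouldReport() const { return mShouldReport; }

 private:
  ObservationLog& mLog;
  bool mShouldReport;
  std::u16string mPath;
};

// Hypothetical hooked entry point: returns whether the call was observed.
bool InterposedCreateFile(const FakeThread& aThread, ObservationLog& aLog,
                          const char16_t* aPath) {
  AutoObservation observation(aThread, aLog, aPath);
  return observation.ShouldReport();
}
```

With a FakeThread whose tlsPointer is null, InterposedCreateFile returns false and the log is left unchanged, so the file operation goes through unobserved instead of crashing. The trade-off is the one described above: I/O on such threads is not recorded.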

Flags: needinfo?(tkikuchi)
Assignee: nobody → tkikuchi
Component: Memory Allocator → Gecko Profiler
See Also: → 1666310

We hook several file APIs to record I/O performance data. However, TLS is not
allocated in ntdll's loader worker thread, so if a file operation happens on
that thread we hit a read access violation, because WinIOAutoObservation uses
nsString and a thread-local variable.

Currently this crash is seen only when an AppLocker DLL rule is defined, but
in theory it can happen whenever a module loaded in a worker thread performs a
file operation in its entry point.

The proposed fix is to skip WinIOAutoObservation if TLS is not available.

Pushed by tkikuchi@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/92f69679fcb9
Skip WinIOAutoObservation if TLS is not available.  r=gerald,aklotz
Status: NEW → RESOLVED
Closed: 12 days ago
Resolution: --- → FIXED
Target Milestone: --- → 90 Branch