Closed Bug 1702086 Opened 6 months ago Closed 4 months ago

Crash in [@ SharedLibraryInfo::GetInfoForSelf]

Categories

(Core :: Gecko Profiler, defect, P1)

Unspecified
Windows
defect

Tracking

()

RESOLVED FIXED
90 Branch
Tracking Status
firefox-esr78 --- wontfix
firefox87 --- wontfix
firefox88 --- wontfix
firefox89 --- wontfix
firefox90 --- fixed

People

(Reporter: toshi, Assigned: toshi)

References

Details

(Keywords: crash)

Crash Data

Attachments

(4 files, 1 obsolete file)

Crash report: https://crash-stats.mozilla.org/report/index/87dd4fbb-2c1c-4687-836a-309020210330

Reason: EXCEPTION_ACCESS_VIOLATION_READ

Top frames of crashing thread:

0 xul.dll static SharedLibraryInfo::GetInfoForSelf tools/profiler/core/shared-libraries-win32.cc:209
1 xul.dll mozilla::Telemetry::BatchProcessedStackGenerator::BatchProcessedStackGenerator toolkit/components/telemetry/other/ProcessedStack.cpp:72
2 xul.dll mozilla::UntrustedModulesProcessor::CompleteProcessing toolkit/xre/UntrustedModulesProcessor.cpp:862
3 xul.dll mozilla::MozPromise<mozilla::Maybe<mozilla::UntrustedModulesProcessor::ModulesMapResultWithLoads>, nsresult, 1>::ThenValue<`lambda at /builds/worker/checkouts/gecko/toolkit/xre/UntrustedModulesProcessor.cpp:548:9', `lambda at /builds/worker/checkouts/gecko/toolkit/xre/UntrustedModulesProcessor.cpp:557:9'>::DoResolveOrRejectInternal xpcom/threads/MozPromise.h:846
4 xul.dll mozilla::MozPromise<RefPtr<mozilla::AllocPolicy::Token>, bool, 1>::ThenValueBase::ResolveOrRejectRunnable::Run xpcom/threads/MozPromise.h:487

The crash happened because the module was unloaded while running GetPdbInfo. It seems that calling LoadLibraryEx does not contribute to locking the module loaded.

0:010> .excr
rax=00007ffc35faa560 rbx=0000000000000000 rcx=00007ffc50e8d074
rdx=0000000000000000 rsi=00007ffc35e80000 rdi=000000f9a35be960
rip=00007ffc18f0fc68 rsp=000000f9a35be790 rbp=00007ffc35e80000
 r8=000000f9a35be748  r9=000000000000003f r10=0000000000000000
r11=0000000000000246 r12=ffffffffffffffff r13=0003001100000000
r14=000000f9a35bec40 r15=000000f9a35be870
iopl=0         nv up ei pl nz na po nc
cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b             efl=00010206
xul!GetPdbInfo+0x43 [inlined in xul!SharedLibraryInfo::GetInfoForSelf+0x398]:
00007ffc`18f0fc68 83780c02        cmp     dword ptr [rax+0Ch],2 ds:00007ffc`35faa56c=????????

Unloaded modules that overlapped the address in the past:
         Base Address       End Address              Size
        7ffc`35e80000     7ffc`35fcd000        0`0014d000 msvp9dec_store.dll

Marking as high severity because this is a topcrasher in RDD.

Severity: -- → S2
Priority: -- → P1

The crashing stack includes UntrustedModulesProcessor, but the code we need to fix is under the profiler.

Component: Launcher Process → Gecko Profiler
Product: Firefox → Core

Before SharedLibraryInfo::GetInfoForSelf calls GetPdbInfo to parse a module's
PE header, it calls LoadLibraryEx to prevent the module from being unloaded
during GetPdbInfo. If the module was already unloaded before LoadLibraryEx,
however, LoadLibraryEx maps the module onto an address different from the original
mapped address, so that module.lpBaseOfDll becomes invalid.

This patch is to call LoadLibraryEx before GetModuleInformation to make sure
LoadLibraryEx increments the module's refcount and does not map the module onto
a new address. With this, module.lpBaseOfDll is always valid, thus we don't have
to call VirtualQuery.

Pushed by tkikuchi@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6bd7baadb78c
Make sure LoadLibraryEx really locks the mapped address. r=mstange
Status: NEW → RESOLVED
Closed: 6 months ago
Resolution: --- → FIXED
Target Milestone: --- → 89 Branch

The patch landed in nightly and beta is affected.
:toshi, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(tkikuchi)
Crash Signature: [@ SharedLibraryInfo::GetInfoForSelf] → [@ SharedLibraryInfo::GetInfoForSelf] [@ strlen | SharedLibraryInfo::GetInfoForSelf] [@ strlen | NS_ConvertUTF8toUTF16::NS_ConvertUTF8toUTF16 | SharedLibraryInfo::GetInfoForSelf]

Comment on attachment 9212831 [details]
Bug 1702086 - Make sure LoadLibraryEx really locks the mapped address. r=mstange

Beta/Release Uplift Approval Request

  • User impact if declined: This was a top crasher of the rdd process, and theoretically the same crash can happen in the main or tab process depending on the timing of a module's unload. Without the fix, a user hits a crash when an injected module was unloaded while parsing its PE header to collect a telemetry ping. If it is a graphics driver, a video will be stopped.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The root cause was we assumed LoadLibraryEx would load a module as data onto the same address as before, but it was not correct. The fix makes sure we increment the module's refcount before parsing, and if unexpected unload happens, we skip parsing. The risk is low as we know the root cause and we have a targeted fix.
  • String changes made/needed: None
Flags: needinfo?(tkikuchi)
Attachment #9212831 - Flags: approval-mozilla-beta?

Comment on attachment 9212831 [details]
Bug 1702086 - Make sure LoadLibraryEx really locks the mapped address. r=mstange

Fix is looking good on Nightly so far, so let's see how it goes on Beta. Thanks for tracking this down! Approved for 88.0b8.

Attachment #9212831 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

The same crash is still happening with the fix, though the volume is small. Let's wait until this lands on release and see the impact of the patch.

See Also: → 1704684

I think my patch mistakenly reopened bug 1607574. This area is sensitive and I need to rethink a fix including bug 1704684.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 89 Branch → ---
See Also: → 1607574

Comment on attachment 9212831 [details]
Bug 1702086 - Make sure LoadLibraryEx really locks the mapped address. r=mstange

Clearing the Beta approval to get it off the needs-uplift radar.

Attachment #9212831 - Flags: approval-mozilla-beta+ → approval-mozilla-beta-

The backout will ship in 88.0.1 going out next week.

See Also: 1704684
Duplicate of this bug: 1704684
Attachment #9212831 - Attachment is obsolete: true

This patch introduces EnumerateProcessModules to enumerate all loaded modules
in the local process so that Gecko profiler and baseprofiler can use it.

This patch is the actual fix for Bug 1702086. The problem of Bug 1702086 is that
LoadLibraryExW loaded the module onto an address different from the original
mapped addresss because it was unloaded after we started enumeration. Calling
GetPdbInfo with the original address module.lpBaseOfDll caused a crash.

The proposed fix consists of three parts.

The first part is to get PDB information from handleLock, which is always valid
even if the original address was unloaded. With this, we don't need a check
of VirtualQuery.

The second part is to add LOAD_LIBRARY_AS_IMAGE_RESOURCE along with
LOAD_LIBRARY_AS_DATAFILE to the call to LoadLibraryEx. This is needed
to read information from the sections outside the PE headers because
RVA (= relative virtual address) is an address after relocation.
Without LOAD_LIBRARY_AS_IMAGE_RESOURCE, a module is mapped without relocation,
so GetPdbInfo() accesses wrong memory resulting in a crash.

The third part is to introduce PEHeaders::GetPdbInfo, replacing two versions
of GetPdbInfo in Gecko profiler and baseprofiler.

Depends on D115252

This patch replaces two versions of GetVersion in Gecko profiler and
baseprofiler with PEHeaders::GetVersionInfo.

Depends on D115253

SharedLibraryInfo::GetInfoForSelf() can use PEHeaders::GetBounds instead of
GetModuleInformation to get the start/end address of a module's mapped region
in the local process. It's roughly 100x faster because GetModuleInformation
invokes two system calls NtQueryInformationProcess and NtReadVirtualMemory
while nt::PEHeaders does not.

Depends on D115254

See Also: → 1723868
Regressions: 1724770
You need to log in before you can comment on or make changes to this bug.