Closed Bug 1607574 Opened 5 years ago Closed 5 years ago

Crash in [@ nvd3d9wrapx.dll | trunc | nvd3d9wrapx.dll | FindModule]

Categories

(External Software Affecting Firefox :: Telemetry, defect, P2)

x86_64
Windows 7
defect

Tracking

(firefox-esr68 unaffected, firefox72 unaffected, firefox73+ fixed, firefox74+ fixed)

RESOLVED FIXED
Tracking Status
firefox-esr68 --- unaffected
firefox72 --- unaffected
firefox73 + fixed
firefox74 + fixed

People

(Reporter: philipp, Assigned: toshi)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

This bug is for crash report bp-faf9a87b-f995-4749-8d35-b39350200107.

Top 10 frames of crashing thread:

0 nvd3d9wrapx.dll nvd3d9wrapx.dll@0x28bb 
1 xul.dll trunc 
2 nvd3d9wrapx.dll nvd3d9wrapx.dll@0x50d4 
3  @0x7fef5bdffff 
4 nvd3d9wrapx.dll nvd3d9wrapx.dll@0x4c37 
5 kernel32.dll FindModule 
6 kernel32.dll K32GetModuleInformation 
7 xul.dll static SharedLibraryInfo::GetInfoForSelf tools/profiler/core/shared-libraries-win32.cc:160
8 xul.dll mozilla::Telemetry::BatchProcessedStackGenerator::BatchProcessedStackGenerator toolkit/components/telemetry/other/ProcessedStack.cpp:73
9 xul.dll mozilla::UntrustedModulesProcessor::CompleteProcessing toolkit/xre/UntrustedModulesProcessor.cpp:821

these content crashes from 64bit windows 7 users are starting to ride the trains into firefox 73.0b (on nightly it has been present since 70.0a1).

Hey Toshi, this looks like it'll get pretty bad in release. Can you take look?

Flags: needinfo?(tkikuchi)
Priority: -- → P2

Interestingly this crash is the exact same as Bug 1376177, crashing at nvd3d9wrapx+0x28bb.

From a quick look at raw dumps, the crash happened when we called LoadLibraryEx here. Even though we loaded a module as data, somehow it invoked nvd3d9wrapx's code (LoadLibrary was hooked..?), and it touched the code of an unloaded module nvinitx.dll, resulting in the crash.

I think this is not a bug on our side, but let me see how we can bypass it..

Flags: needinfo?(tkikuchi)
Crash Signature: arena_t::MallocSmall] → arena_t::MallocSmall] [@ nvd3d9wrap.dll | SharedLibraryInfo::GetInfoForSelf]

Toshi, are you still able to investigate this crash? It is a high volume for nightly and beta.

Flags: needinfo?(tkikuchi)

I'm still not coming up with a good solution or workaround. Is this occurring in Nightly? I can see all crash reports are from Win7 + 73 beta.

One approach is to block the crashing module nvd3d9wrapx.dll, but I have some concern it may break graphics functionality.

The crash happens while collecting the stack trace as a part of the third-party-module ping in the content process. So it's not essential to graphics functionality and we may be able to add ad-hoc check to skip this particular case, but we want to avoid such a hacky approach as much as possible.

Flags: needinfo?(tkikuchi)

Do we have a contact to Nvidia to notify them? I think I got a good understanding on how this happened.

When nvd3d9wrapx.dll is loaded, its entrypoint detours KERNELBASE!LoadLibraryExW. What their hook function does is:

  1. whether the flag LOAD_LIBRARY_AS_DATAFILE is specified or not,
  2. if the target filename meets some condition,
  3. and if OS version is older than 6.2,
  4. they run a code checking a global variable of nvd3d9wrapx.dll.

In this crash case, we magically met all conditions: loading detoured.dll with LOAD_LIBRARY_AS_DATAFILE on Win7, and the global variable pointed to an address within an unmapped nvinitx.dll.

I think an ideal behavior of nvd3d9wrapx.dll is

  • If LOAD_LIBRARY_AS_DATAFILE is passed to LoadLibraryExW, they should just call to the original LoadLibraryExW because it will not invoke any code execution.
  • As long as nvd3d9wrapx.dll holds a pointer to something in nvinitx.dll, nvinitx.dll should not be freed.

Anything we can do about this on our end Toshi?

Flags: needinfo?(tkikuchi)

Apart from blocking Nvidia's modules, we can add an ugly hackaround to skip LoadLibraryEx when all the following conditions are met.

  • if we try to load a path ending with detoured.dll
  • if os version is older than 6.2
  • if nvd3d9wrapx.dll is loaded
  • if nvinitx.dll is not loaded

Our drawback is we lose detoured.dll's information from telemetry's callstack (Some frames will be shown as an unmapped address).

Gerald, we're having crash reports from profiler's code because of Nvidia's bug since we started to collect the third-paty-module ping from the tab process. I'm thinking of adding a hack in SharedLibraryInfo::GetInfoForSelf, but it's not elegant. Do you have any idea to workaround this?

Flags: needinfo?(tkikuchi) → needinfo?(gsquelart)

I don't know enough about the arcane Windows library stuff to be helpful here I'm afraid.

But if the problem happens in GetInfoForSelf and you know a way to avoid it there (I'm guessing you'd somehow detect this bad library, and then avoid the crashy call?), I don't mind it, especially if we can quickly prevent lots of crashes. NI back to you.

You/I/someone can always explore more elegant solutions later on... (We should meet in brrrlin!)

Flags: needinfo?(gsquelart) → needinfo?(tkikuchi)

FYI, the volume of this crash currently puts it at around the #5 overall top crash for Beta73, so IMO we definitely need to come up with a workaround on our end for this.

Ok, let's implement a workaround then. It's almost ready. I'll start a review by EOD today.

I wrote up the details about this crash and how I found the conditions.
https://docs.google.com/document/d/1hhS53OdN2GU3WajjRpoaGblJJvFr9SWIrvVn_qlklw8/edit?usp=sharing

Assignee: nobody → tkikuchi
Flags: needinfo?(tkikuchi)

We have a number of reports crashing in nvd3d9wrapx.dll when we call LoadLibraryEx with
LOAD_LIBRARY_AS_DATAFILE to collect PDB information. It seems that nvd3d9wrapx.dll
detours LoadLibraryEx and its detour function has a bug to access an address of nvinitx.dll
even after it's unloaded.

After we landed Bug 1522830, we started to collect the stack information in the content
process, which introduced a chance to meet conditions to hit this crash.

Given that the high volume of crash reports, this patch adds an ad-hoc workaround to add
library info with dummy PDB info instead of calling LoadLibraryEx.

Pushed by ncsoregi@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b4af0279ec1a
Skip LoadLibraryEx if we're likely to hit the crash in nvd3d9wrapx.dll.  r=gerald
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
See Also: → 1376177

Comment on attachment 9121439 [details]
Bug 1607574 - Skip LoadLibraryEx if we're likely to hit the crash in nvd3d9wrapx.dll. r=gerald

Beta/Release Uplift Approval Request

  • User impact if declined: This is one of the top crashes for Beta 73. We don't have a repro, but it seems Firefox crashes when Nvidia's shim driver was injected into the tab process. This is not a regression, but we started to get this crash reports from 73 because Bug 1522830 started to collect the ping from the tab process.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The patch is to detect a limited situation to be likely to hit the crash (i.e. trying to load detoured.dll, nvd3d9wrapx.dll is loaded, nvinitx.dll is not loaded, and OS is Win7), and to skip the crashing function call which is telemetry-related.
  • String changes made/needed: None
Attachment #9121439 - Flags: approval-mozilla-beta?

Comment on attachment 9121439 [details]
Bug 1607574 - Skip LoadLibraryEx if we're likely to hit the crash in nvd3d9wrapx.dll. r=gerald

Hopefully work around buggy nVidia drivers to avoid a topcrash on 73. Approved for 73.0b7.

Attachment #9121439 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
See Also: → 1617188
See Also: → 1635823
See Also: → 1635824
See Also: 1635824
See Also: → 1643359

This reappeared in April probably because of bug 1702086. The regressing patch was already backed out.

See Also: → 1702086
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: