Crash in [@ nvd3d9wrapx.dll | trunc | nvd3d9wrapx.dll | FindModule]
Categories
(External Software Affecting Firefox :: Telemetry, defect, P2)
Tracking
(firefox-esr68 unaffected, firefox72 unaffected, firefox73+ fixed, firefox74+ fixed)
Tracking | Status | |
---|---|---|
firefox-esr68 | --- | unaffected |
firefox72 | --- | unaffected |
firefox73 | + | fixed |
firefox74 | + | fixed |
People
(Reporter: philipp, Assigned: toshi)
References
Details
(Keywords: crash, regression)
Crash Data
Attachments
(1 file)
47 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta+
|
Details | Review |
This bug is for crash report bp-faf9a87b-f995-4749-8d35-b39350200107.
Top 10 frames of crashing thread:
0 nvd3d9wrapx.dll nvd3d9wrapx.dll@0x28bb
1 xul.dll trunc
2 nvd3d9wrapx.dll nvd3d9wrapx.dll@0x50d4
3 @0x7fef5bdffff
4 nvd3d9wrapx.dll nvd3d9wrapx.dll@0x4c37
5 kernel32.dll FindModule
6 kernel32.dll K32GetModuleInformation
7 xul.dll static SharedLibraryInfo::GetInfoForSelf tools/profiler/core/shared-libraries-win32.cc:160
8 xul.dll mozilla::Telemetry::BatchProcessedStackGenerator::BatchProcessedStackGenerator toolkit/components/telemetry/other/ProcessedStack.cpp:73
9 xul.dll mozilla::UntrustedModulesProcessor::CompleteProcessing toolkit/xre/UntrustedModulesProcessor.cpp:821
these content crashes from 64bit windows 7 users are starting to ride the trains into firefox 73.0b (on nightly it has been present since 70.0a1).
![]() |
||
Comment 1•5 years ago
|
||
Hey Toshi, this looks like it'll get pretty bad in release. Can you take look?
Assignee | ||
Comment 2•5 years ago
|
||
Interestingly this crash is the exact same as Bug 1376177, crashing at nvd3d9wrapx+0x28bb.
From a quick look at raw dumps, the crash happened when we called LoadLibraryEx
here. Even though we loaded a module as data, somehow it invoked nvd3d9wrapx's code (LoadLibrary
was hooked..?), and it touched the code of an unloaded module nvinitx.dll, resulting in the crash.
I think this is not a bug on our side, but let me see how we can bypass it..
Reporter | ||
Updated•5 years ago
|
Updated•5 years ago
|
Comment 3•5 years ago
|
||
Toshi, are you still able to investigate this crash? It is a high volume for nightly and beta.
Assignee | ||
Comment 4•5 years ago
|
||
I'm still not coming up with a good solution or workaround. Is this occurring in Nightly? I can see all crash reports are from Win7 + 73 beta.
One approach is to block the crashing module nvd3d9wrapx.dll, but I have some concern it may break graphics functionality.
The crash happens while collecting the stack trace as a part of the third-party-module ping in the content process. So it's not essential to graphics functionality and we may be able to add ad-hoc check to skip this particular case, but we want to avoid such a hacky approach as much as possible.
Assignee | ||
Comment 5•5 years ago
|
||
Do we have a contact to Nvidia to notify them? I think I got a good understanding on how this happened.
When nvd3d9wrapx.dll is loaded, its entrypoint detours KERNELBASE!LoadLibraryExW
. What their hook function does is:
- whether the flag
LOAD_LIBRARY_AS_DATAFILE
is specified or not, - if the target filename meets some condition,
- and if OS version is older than 6.2,
- they run a code checking a global variable of nvd3d9wrapx.dll.
In this crash case, we magically met all conditions: loading detoured.dll with LOAD_LIBRARY_AS_DATAFILE
on Win7, and the global variable pointed to an address within an unmapped nvinitx.dll.
I think an ideal behavior of nvd3d9wrapx.dll is
- If
LOAD_LIBRARY_AS_DATAFILE
is passed toLoadLibraryExW
, they should just call to the originalLoadLibraryExW
because it will not invoke any code execution. - As long as nvd3d9wrapx.dll holds a pointer to something in nvinitx.dll, nvinitx.dll should not be freed.
Assignee | ||
Comment 7•5 years ago
|
||
Apart from blocking Nvidia's modules, we can add an ugly hackaround to skip LoadLibraryEx
when all the following conditions are met.
- if we try to load a path ending with detoured.dll
- if os version is older than 6.2
- if nvd3d9wrapx.dll is loaded
- if nvinitx.dll is not loaded
Our drawback is we lose detoured.dll's information from telemetry's callstack (Some frames will be shown as an unmapped address).
Gerald, we're having crash reports from profiler's code because of Nvidia's bug since we started to collect the third-paty-module ping from the tab process. I'm thinking of adding a hack in SharedLibraryInfo::GetInfoForSelf
, but it's not elegant. Do you have any idea to workaround this?
I don't know enough about the arcane Windows library stuff to be helpful here I'm afraid.
But if the problem happens in GetInfoForSelf
and you know a way to avoid it there (I'm guessing you'd somehow detect this bad library, and then avoid the crashy call?), I don't mind it, especially if we can quickly prevent lots of crashes. NI back to you.
You/I/someone can always explore more elegant solutions later on... (We should meet in brrrlin!)
Comment 9•5 years ago
|
||
FYI, the volume of this crash currently puts it at around the #5 overall top crash for Beta73, so IMO we definitely need to come up with a workaround on our end for this.
Assignee | ||
Comment 10•5 years ago
|
||
Ok, let's implement a workaround then. It's almost ready. I'll start a review by EOD today.
I wrote up the details about this crash and how I found the conditions.
https://docs.google.com/document/d/1hhS53OdN2GU3WajjRpoaGblJJvFr9SWIrvVn_qlklw8/edit?usp=sharing
Assignee | ||
Comment 11•5 years ago
|
||
We have a number of reports crashing in nvd3d9wrapx.dll when we call LoadLibraryEx
with
LOAD_LIBRARY_AS_DATAFILE
to collect PDB information. It seems that nvd3d9wrapx.dll
detours LoadLibraryEx and its detour function has a bug to access an address of nvinitx.dll
even after it's unloaded.
After we landed Bug 1522830, we started to collect the stack information in the content
process, which introduced a chance to meet conditions to hit this crash.
Given that the high volume of crash reports, this patch adds an ad-hoc workaround to add
library info with dummy PDB info instead of calling LoadLibraryEx
.
Comment 12•5 years ago
|
||
Pushed by ncsoregi@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b4af0279ec1a Skip LoadLibraryEx if we're likely to hit the crash in nvd3d9wrapx.dll. r=gerald
Comment 13•5 years ago
|
||
bugherder |
Assignee | ||
Comment 14•5 years ago
|
||
Comment on attachment 9121439 [details]
Bug 1607574 - Skip LoadLibraryEx if we're likely to hit the crash in nvd3d9wrapx.dll. r=gerald
Beta/Release Uplift Approval Request
- User impact if declined: This is one of the top crashes for Beta 73. We don't have a repro, but it seems Firefox crashes when Nvidia's shim driver was injected into the tab process. This is not a regression, but we started to get this crash reports from 73 because Bug 1522830 started to collect the ping from the tab process.
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): The patch is to detect a limited situation to be likely to hit the crash (i.e. trying to load detoured.dll, nvd3d9wrapx.dll is loaded, nvinitx.dll is not loaded, and OS is Win7), and to skip the crashing function call which is telemetry-related.
- String changes made/needed: None
Comment 15•5 years ago
|
||
Comment on attachment 9121439 [details]
Bug 1607574 - Skip LoadLibraryEx if we're likely to hit the crash in nvd3d9wrapx.dll. r=gerald
Hopefully work around buggy nVidia drivers to avoid a topcrash on 73. Approved for 73.0b7.
Comment 16•5 years ago
|
||
bugherder uplift |
Assignee | ||
Comment 17•3 years ago
|
||
This reappeared in April probably because of bug 1702086. The regressing patch was already backed out.
Description
•