Closed Bug 1233237 Opened 8 years ago Closed 8 years ago

crash with Nvidia's Network Access Manager (nvappfilter.dll/nvlsp.dll) in Firefox 43

Categories

(Core :: General, defect)

43 Branch
x86
Windows NT
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Tracking Status
firefox43 + wontfix
firefox44 + fixed
firefox45 --- fixed
firefox46 --- fixed
relnote-firefox --- 43+

People

(Reporter: philipp, Assigned: dragana)

References

Details

(Keywords: crash)

Crash Data

Attachments

(2 files)

[Tracking Requested - why for this release]:

This bug was filed from the Socorro interface and is 
report bp-cdb9688e-7cda-4ada-8f1a-7ed192151215.
=============================================================
Crashing Thread (7)
Frame 	Module 	Signature 	Source
Ø 0 	nvappfilter.dll 	nvappfilter.dll@0xa387 	
Ø 1 	nvappfilter.dll 	nvappfilter.dll@0x618d 	
2 	nss3.dll 	_MD_CloseSocket 	nsprpub/pr/src/md/windows/w95sock.c
3 	nss3.dll 	SocketClose 	nsprpub/pr/src/io/prsocket.c
4 	xul.dll 	mozilla::net::ClosingService::ThreadFunc() 	netwerk/base/ClosingService.cpp
5 	nss3.dll 	_PR_NativeRunThread 	nsprpub/pr/src/threads/combined/pruthr.c
6 	nss3.dll 	pr_root 	nsprpub/pr/src/md/windows/w95thred.c
7 	msvcr120.dll 	_callthreadstartex 	f:\dd\vctools\crt\crtw32\startup\threadex.c:376
8 	msvcr120.dll 	msvcr120.dll@0x2c000 	
9 	kernel32.dll 	BaseThreadStart

after firefox 43 was released, there seems to be a spike in signatures containing dll files related to Nvidia's Network Access Manager.
at the moment the combination of these signatures make up around 3% of crash data in early incoming firefox 43 and predominantly windows xp, 7 and vista are affected.
Attached patch bug1233237.patchSplinter Review
presumptive blocklisting patch for the involved .dlls

researching the issue, it appears that nvidia has released this type of software a great while ago already (and that's why predominantly older windows versions are affected) and may no longer maintain it. 
the overall correlation data also suggests, that those dlls are not used very widely, so the damage of blocklisting them might be minimal compared to the benefits...
Assignee: nobody → madperson
Attachment #8700318 - Flags: review?(aklotz)
I remember this old application filter from Nvidia, it was in nForce boards, and causing P2P clients like µT crash and internet connection issues.
https://www.google.com/search?q=utorrent+nvidia+application+filter
In bug 1233934, there is a French user affected by this startup crash on Win XP, as previous versions work, I guess it could be a regression in FF too.
signatures related to this crash are currently taking #1,#2 & #4 on the top crash score board for 43.0.1 due to being a recurring startup crash:
https://crash-analysis.mozilla.com/rkaiser/crash-report-tools/score/?version=43.0.1&limit=30
I think we should take this on release for 43.0.2 (contingent on review).
Attachment #8700318 - Flags: review?(aklotz) → review+
Comment on attachment 8700318 [details] [diff] [review]
bug1233237.patch

Approval Request Comment
[Feature/regressing bug #]: Windows DLL Blocklist
[User impact if declined]: Crashes with Nvidia Network Access Manager
[Describe test coverage new/current, TreeHerder]:
[Risks and why]: Low, DLL blocklist change
[String/UUID change made/needed]: None
Attachment #8700318 - Flags: approval-mozilla-release?
Attachment #8700318 - Flags: approval-mozilla-beta?
Attachment #8700318 - Flags: approval-mozilla-aurora?
Comment on attachment 8700318 [details] [diff] [review]
bug1233237.patch

Approved for uplift to m-r. We want this in the 43.0.2 build I'm about to start today.
Attachment #8700318 - Flags: approval-mozilla-release? → approval-mozilla-release+
Comment on attachment 8700318 [details] [diff] [review]
bug1233237.patch

Dll blocklist to prevent crash. Beta44+, Aurora45+
Attachment #8700318 - Flags: approval-mozilla-beta?
Attachment #8700318 - Flags: approval-mozilla-beta+
Attachment #8700318 - Flags: approval-mozilla-aurora?
Attachment #8700318 - Flags: approval-mozilla-aurora+
https://hg.mozilla.org/mozilla-central/rev/a882a8c92a28
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
feedback from affected users tells that the blocklist entry does indeed solve the crashes, but firefox is unable to load any pages afterwards. not sure if this is a desired outcome - for diagnosing the issue for the purposes of support it may be easier if the browser just crashes (as the crash id will give clues about this problem).

any thoughts on this - should the blocklist entry be reverted?
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Aaron, what do you think about that?
Flags: needinfo?(aklotz)
*sigh* For the short term, I think we'll have to revert the blocklist. The problem is probably that some of these DLLs are Winsock LSPs. Preventing their loading is insufficient -- we also need to prevent winsock2 from seeing the DLL as an integral part of the protocol stack.

I have a few ideas for this, but in the meantime, let's revert.
Flags: needinfo?(aklotz)
philipp, what percentage of these crashes belong to Windows XP users?
Flags: needinfo?(madperson)
from the query in comment #1, 45% of the crashes were from xp users.

(i'm also unassigned from the bug since anything different from a simple blocklisting patch is beyond my capabilities)
Assignee: madperson → nobody
Flags: needinfo?(madperson)
Thanks philipp! I'm sorry that your patch wasn't able to fix this 100%!
Jason, do you or your team have any idea why we might be suddenly seeing this crash spike in 43? For some reason this nvidia LSP doesn't like us.
Flags: needinfo?(jduell.mcbugs)
Also ni? mcmanus
Flags: needinfo?(mcmanus)
(In reply to Aaron Klotz [:aklotz] (please use needinfo) from comment #24)
> Also ni? mcmanus

Sorry, I should clarify why I ni'd you: We're wondering if you have any ideas as to why this would suddenly have spiked in 43.
Sylvestre, can we have your approval for backing the blocklist changes out?
Flags: needinfo?(sledru)
no idea.. probly need qa to try and repro
Flags: needinfo?(mcmanus)
Please go ahead asap, please land the backout asap as I will start the build soon.

Next time, please request the backout in a different bug. It makes the life of everybody easier.
Flags: needinfo?(sledru)
https://hg.mozilla.org/releases/mozilla-release/rev/362cb79ce600

Unsure if this needs backed out of anything else, but those are less pressing than release.
Added to the 43.0.3 release notes with "Fix network issue when using Nvidia's Network Access Manager (1233237)" as wording.

By the way, I hope this bug doesn't prevent the updater to work...
https://hg.mozilla.org/mozilla-central/rev/a19eb63ac763
Status: REOPENED → RESOLVED
Closed: 8 years ago8 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
guijoselito, could you please help me understand why was this bug reopened? I was under the impression that we took a fix for this a few days back in all the relevant branches (Beta/Aurora/Release).
Flags: needinfo?(guijoselito)
This bug is about the crash spike.
A fix for this landed on comments 12, 13 and 14.
On comment 17 it's said that the fix is causing connections problems.
Comments 29, 30, 31 and 33 are the backouts from the patches landed earlier.

So, now, there is no connections problems, but the crashes are still happening.
Flags: needinfo?(guijoselito)
I'm on one of those NVidia nForce PC. the crash mentioned are fatal for my system. if Firefox crashes, it crashes the whole system by freezing it. I have to power cycle or reset my system and firefox will show that it's unable to restore previous sessions. my home page is set to restore tabs in previous session.

43.0.2 was unable to load any page until I did the following to reset network in elevated prompt on Win 7. firefox's tech support page mentioning DNS, IPv6 and a bunch other things didn't help. 
Ironically, I had to use Chrome for 2 days to look for help. I thought I could use Firefox 42 32-bit while test/try things on Firefox 43.0.2 64-bit, but that didn't work

netsh winsock reset catalog
ipconfig /flushdns
ipconfig /registerdns
(In reply to Guilherme Lima from comment #38)
> This bug is about the crash spike.
> A fix for this landed on comments 12, 13 and 14.
> On comment 17 it's said that the fix is causing connections problems.
> Comments 29, 30, 31 and 33 are the backouts from the patches landed earlier.
> 
> So, now, there is no connections problems, but the crashes are still
> happening.

Got it. Reversting the status flags in that case.
It seems this was backed out before the 43.0.3 release, but not removed from the release notes: https://www.mozilla.org/en-US/firefox/43.0.3/releasenotes/
Flags: needinfo?(sledru)
The release notes are correct.
Just that we should have create a new bug.

1) we had a crash
2) we decided to blacklist the ddl
3) we released 43.0 with these dll blacklisted
4) it caused some unexpected changes in behavior (no network). Comment #17
5) we reverted the blacklist and published 43.0.3 with this change
Flags: needinfo?(sledru)
I'm on one of those NVidia nForce PC. The crash is not fatal for my system (windows XP 32bits).
On Firefox 43.0.3.5835 connection work but Firefox crash a few seconds after starting.
On 43.0.2 (or 43.0.1, I'm not sure) connection not work but Firefox not crash.
I'm using an other lan card (without nforce chipset) and it is the same.

For info, I don't have nvlsp.dll

{ "nvappfilter.dll", MAKE_VERSION(2, 2, 0, 6531) } mean just the version 2.2.0 of nvappfilter.dll is concerne by the patch ?

Thanks a lot
hey davidfr, this blocklisting entry meant that version 2.2.0.6531 of nvappfilter.dll and below were blocked (which has been reverted in 43.0.3 again due to causing different issues).

if you can reproduce the crashes reliably on startup, it would be very helpful if you could run the mozregression tool, which could narrow down the timeframe in which firefox builds this crash first occurred (that might give developers a clearer picture and maybe an possibility to fix it as well): https://mozilla.github.io/mozregression/

in case you just want to quickly fix the issue, uninstalling nvidia's firewall was generally reported to solve the crashes: https://support.mozilla.org/en-US/kb/connectivity-crashing-firefox-43-update
Flags: needinfo?(davidfr)
Hey philipp, thanks for you message. (I have a version < 2.2.0.6531 of nvappfilter.dll)
I will see for the mozregression tool

I don't want to unistall Nvidia's firewall, I like the concept of hardware firewall, and honestly I'm using it for 10 years. I have use it this a lot of software including µT and other P2P software without any issus. (I have put in in advanced mod)

My solution for the moment is to keep Firefox 42 : https://ftp.mozilla.org/pub/firefox/releases/42.0/win32/
Flags: needinfo?(davidfr)
Last working 42.0
First crashing 43.0b1 - Internet work but crash after few seconds

So with mozregression tool, when Firefox crash I have : WARNING : Process exited with code 1

The result are :
2015-09-14--mozilla-central--firefox-43.0a1.en-US.win32 : work
2015-09-15--mozilla-central--firefox-43.0a1.en-US.win32.zip : crash

fba4b0cd3823--mozilla-central--firefox-43.0a1.en-US.win32.zip : work
3eb37530c627--mozilla-central--firefox-43.0a1.en-US.win32.zip : crash
ce211c03a51c--mozilla-central--firefox-43.0a1.en-US.win32.zip : work
a20b47a3d747--mozilla-inbound--firefox-43.0a1.en-US.win32.zip : crash
862295d7d82a--mozilla-inbound--firefox-43.0a1.en-US.win32.zip : crash
827a12863a6c--mozilla-inbound--firefox-43.0a1.en-US.win32.zip : crash
fb1f0c400ec0--mozilla-inbound--firefox-43.0a1.en-US.win32.zip : work
c93cd72f8d40--mozilla-inbound--firefox-43.0a1.en-US.win32.zip : work
9a5fd09d1589--mozilla-inbound--firefox-43.0a1.en-US.win32.zip : crash

The end of the log file is : 
INFO : Narrowed inbound regression window from [c93cd72f, 827a1286] (3 revisions) to [c93cd72f, 9a5fd09d] (2 revisions) (~1 steps left)
DEBUG : Starting merge handling...
DEBUG : Using url: https://hg.mozilla.org/integration/mozilla-inbound/json-pushes?changeset=9a5fd09d1589e8d37c1db651896f5a2e480d9e5d&full=1
DEBUG : Found commit message:
Bug 1176941 - Moving console warning to the current window, r=jib

How can I send you the log file ?
so the most interesting thing in that range (from comment 47) is

Bug 1152046 - Make separate thread only for PRClose

I have a untested theory.. the LSP isn't threadsafe and it hooks both socket() (or perhaps connect()) and then later close() and something bad happens inside the LSP because now close is being handled on a different thread.

Dragana, Honza - thoughts on that idea?

Dragana, do you think we can give DavidFR a test build that just does synchronous closes (backing out that patch would be hard I know)?
Flags: needinfo?(honzab.moz)
Flags: needinfo?(dd.mozilla)
Thanks DavidFR!
the pushlog of the regression-range provided by david in comment #47 is https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=c93cd72f&tochange=9a5fd09d for easy access btw
Target Milestone: mozilla46 → ---
If we can give out a try build with reserialized close calls and get back results, then let's do it.  I think it should be fairly easy to disable it.
Flags: needinfo?(honzab.moz)
I will disable doing close on different thread.

The build is coming soon.
Hi Dragana,

I have try it, it works (don't crash and Internet work) :-) good job !!!!


But it have other problems (I'm think it's not caused by nvidia) :
- The recovery tabs don't work (it open the gool number of tab, but blank and without url), with a profil use on firefox 42.
- When I'm using the proposition on the adresse bar, website don't load. But I enter a new url it work well.

Whith a profil made by this nithly it is the same : 
- For the existing url (like https://www.mozilla.org/en-US/contribute/). I begin to enter mozilla in te adress bar, adress bar propose https://www.mozilla.org/en-US/contribute/ and if i clic on it, the website is never open ! And the url go off, the tab title is " Connecting..." with the circle.
- The recovery only recovery new tabs

It's just for information, I don't think it is caused by nvidia.

Thanks for you work and reactivity :-)
Flags: needinfo?(davidfr)
(In reply to DavidFR from comment #55)
> Hi Dragana,
> 
> I have try it, it works (don't crash and Internet work) :-) good job !!!!
> 

Thanks for testing it.

> 
> But it have other problems (I'm think it's not caused by nvidia) :
> - The recovery tabs don't work (it open the gool number of tab, but blank
> and without url), with a profil use on firefox 42.
> - When I'm using the proposition on the adresse bar, website don't load. But
> I enter a new url it work well.
> 
> Whith a profil made by this nithly it is the same : 
> - For the existing url (like https://www.mozilla.org/en-US/contribute/). I
> begin to enter mozilla in te adress bar, adress bar propose
> https://www.mozilla.org/en-US/contribute/ and if i clic on it, the website
> is never open ! And the url go off, the tab title is " Connecting..." with
> the circle.
> - The recovery only recovery new tabs
> 
> It's just for information, I don't think it is caused by nvidia.
> 

This is another problem, but probably not a problem, because the build I have sent you is not a release, it is even not an official nightly build 

> Thanks for you work and reactivity :-)
Should we backed out  this change for all users or only for with certain dlls (if it is possible)?

I could leave ClosingService for UDP sockets similar as it used to be before bug 1152046?
Flags: needinfo?(mcmanus)
Blocks: 1152046
(In reply to Patrick McManus [:mcmanus] from comment #48)
> so the most interesting thing in that range (from comment 47) is
> 
> Bug 1152046 - Make separate thread only for PRClose

I should have posted in this bug report earlier, the reporter of the dupe in bug 1233934 provided the regression pushlog since 2015-12-27:
https://bugzilla.mozilla.org/show_bug.cgi?id=1233934#c21
Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=c93cd72f8d4057f536fd87cda8088828bd1bd5ef&tochange=827a12863a6cde252e0fc47199562b78c0eed55e
I'm also working on a special LSP blocklist but that's not something that we'd want to uplift.
as discussed in the necko meeting today - we're concerned in general about the threadsafeness of lsp's and we're going to revert back all of the close behavior to be synchronous on the socket thread.

That does bring back some intermittent error reports where close() blocked for a bizarrely long time, especially during shutdown that will need to be further battled - but this data on nvidia indicates something pretty fundamental that might not be limited to just that LSP.

dragana and bagder are going to do the change work to make this synchronous again. I think its a candidate for 45 and 44 myself.
Flags: needinfo?(mcmanus)
Assignee: nobody → dd.mozilla
Status: REOPENED → ASSIGNED
Attachment #8705588 - Flags: review?(mcmanus)
Comment on attachment 8705588 [details] [diff] [review]
bug_1233237.patch

Review of attachment 8705588 [details] [diff] [review]:
-----------------------------------------------------------------

* this absolutely needs to go in its own bug, too confusing to lump it in here with all the changes to the blocklist. you can leave this bug open depending on the new bug until it is merged. There is no good answer to that mess.

* I'm going to r+ this for mozilla-central, but since we want to backport this asap I hope there is a simpler patch for the port.

* would ClosingService::AttachIOLayer() doing an immediate return have the same impact? It looks that way to me.

::: netwerk/base/nsIncrementalDownload.cpp
@@ +5,5 @@
>   * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
>  
>  #include "mozilla/Attributes.h"
>  #include "mozilla/UniquePtrExtensions.h"
> +#include "mozilla/UniquePtr.h"

heh.. I was going to say make this its own bug.. but if we're going to backport its probly better to keep it all together
Attachment #8705588 - Flags: review?(mcmanus) → review+
Depends on: 1238010
Depends on: 1238017
I'm closing this bug since both bug 1238010 and bug 1238017 have landed. The fix for this bug is in these 2 bugs.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago8 years ago
Resolution: --- → FIXED
Crash Signature: nvlsp.dll@0xe0a7] [@ nvlsp.dll@0xbe17] [@ nvlsp64.dll@0x4f607] [@ nvlsp64.dll@0x33307] [@ nvlsp.dll@0xc227] [@ nvlsp64.dll@0x4f677] → nvlsp.dll@0xe0a7] [@ nvlsp.dll@0xbe17] [@ nvlsp64.dll@0x4f607] [@ nvlsp64.dll@0x33307] [@ nvlsp.dll@0xc227] [@ nvlsp64.dll@0x4f677] [@ nvappfilter.dll@0xa5d7 ] [@ nvappfilter.dll@0xb687 ]
You need to log in before you can comment on or make changes to this bug.