Closed Bug 1242802 Opened 8 years ago Closed 8 years ago

NSS crash in _PR_CleanupThread on shutdown

Categories

(Firefox Build System :: General, defect)

46 Branch
x86_64
Windows 10
defect
Not set
critical

Tracking

(firefox45 unaffected, firefox46 unaffected, firefox47+ verified)

VERIFIED FIXED
Tracking Status
firefox45 --- unaffected
firefox46 --- unaffected
firefox47 + verified

People

(Reporter: streetwolf52, Assigned: ted)

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:46.0) Gecko/20100101 Firefox/46.0
Build ID: 20160125073110

Steps to reproduce:

Go to some websites, then exit Fx46.


Actual results:

Fx46 crashed


Expected results:

Fx46 should have crashed.
Unfortunately, my crash dumps have no debugging info because I'm running inbound builds. I do have a regression range:

Bad -  https://hg.mozilla.org/integration/mozilla-inbound/rev/1b7625c90538a95413f7ca1910f4ec791eff82b5

Good - https://hg.mozilla.org/integration/mozilla-inbound/rev/5edf01b87f78b580d01dcd2aa1756d46dcede672

**** ER (expected results) should be "shouldn't have crashed."
OS: Unspecified → Windows 10
Hardware: Unspecified → x86_64
Keywords: crash
Product: Firefox → Core
Forgot to mention that the crash happens on a new profile.
Summary: Fx46 using inbound crashes when I exit. → Fx46 from inbound crashes when I exit.
Reproduced the crash on Windows 7 (mozilla-inbound tinderbox build) as well:
https://hg.mozilla.org/integration/mozilla-inbound/rev/99bdd3287bcf9ecf974c6f68ba3ba15e6fc17937
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:46.0) Gecko/20100101 Firefox/46.0 ID:20160125083827


Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=fd1b82f7fbeeb92f6dddcccdb378126973b06a38&tochange=99bdd3287bcf9ecf974c6f68ba3ba15e6fc17937
Severity: normal → critical
Flags: needinfo?(ted)
Flags: needinfo?(mh+mozilla)
Flags: needinfo?(gps)
Keywords: reproducible
Wait. This landed before the merge to Aurora? Huh, I'd rather have had that bake on Nightly for a while.
Flags: needinfo?(mh+mozilla)
It didn't, I was confused by the version number.
And we're lacking symbols on nss3.dll :-/

This is definitely something for ted.
Flags: needinfo?(gps)
(In reply to Mike Hommey [:glandium] from comment #7)
> And we're lacking symbols on nss3.dll :-/
> 
> This is definitely something for ted.

These are inbound builds; we don't upload symbols for them.
(In reply to Gary [:streetwolf] from comment #2)
> Here's my crash dump:
> https://crash-stats.mozilla.com/report/index/2fe4c54f-0035-4635-acaf-fd7952160126
> 
> Here's the build where the problem started:
> http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-inbound-win64-pgo/1453746694/

I can't reproduce a crash on shutdown running this build on my local Win10 machine. I'm using a fresh testing profile. Are you doing anything specific that causes a crash?
Flags: needinfo?(ted)
Attached file stack
Here's what the stack for this crash looks like:
https://crash-stats.mozilla.com/report/index/2fe4c54f-0035-4635-acaf-fd7952160126
It's crashing here:
https://dxr.mozilla.org/mozilla-central/rev/aa90f482e16db77cdb7dea84564ea1cbd8f7f6b3/nsprpub/pr/src/threads/prtpd.c#237

It's trying to call a thread-private-data destructor; it looks like there's a bad entry in that list?
Thanks so much for finding the regression window on this! If one of you could narrow down some reliable STR that would be super helpful.
I seem to crash very often by going to this site: http://www.softexia.com/ and then exiting Fx. You might have to click on a few links to get it to crash. If you still can't crash, try creating some bookmarks and going to the site by clicking on them. I mostly use the bookmarks toolbar but have also crashed using other methods.

I also crash in safe-mode.
Keep in mind comment 4. The crash was under Windows 7 x86.
Might be Windows x64
Ted... I'll test a try build if you come up with one.
Thanks! I was able to reproduce it by loading that site.
Keywords: steps-wanted
I reproduced it in a local debug build. Still looking.
If it helps, this site doesn't produce a crash on exiting: http://www.onlinetextmessage.com/verizon-text-message.php
(In reply to Gary [:streetwolf] from comment #19)
> If it helps this site doesn't produce a crash on exiting:
> http://www.onlinetextmessage.com/verizon-text-message.php

It appears that when I use the bookmark on my bookmarks toolbar to get to this site, I don't crash. However, when I click on the link I gave you above, it does.
I have enough info to reproduce the crash in a debugger now, I just need to figure out what's going on. Thanks again for all your help!
I just thought things through: the reason I crash with the link above is that I had to get to this site first, before I clicked on the link above.
Okay, I don't 100% know what's happening, but it's something like:
* Some NSS code in nssckbi.dll calls PR_NewThreadPrivateIndex and registers a destructor
* During shutdown we unload nssckbi.dll
* We shut down a thread, which winds up calling the destructor from a DLL we already unloaded
This is the destructor registration:
https://dxr.mozilla.org/mozilla-central/rev/aa90f482e16db77cdb7dea84564ea1cbd8f7f6b3/security/nss/lib/base/error.c#67

I'm guessing the bug here is either that we're unloading NSS earlier than we should, or that PR_Free is getting inlined in error.c, so it's trying to call a destructor from a library we already unloaded.
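To make the failure mode in the last few comments concrete, here is a minimal Python toy model. It is not NSPR or NSS code: the `Module`, `Destructor`, `Thread`, and `shutdown` names are hypothetical, and modeling the library that exports PR_Free as "nss3.dll" is an assumption. A thread-private destructor is represented by the DLL that contains its code, and thread cleanup reports a crash if it calls a destructor whose DLL has already been unloaded. Under these assumptions the crash needs both conditions named in the comment above: the destructor must point into nssckbi.dll (the inlined PR_Free), and nssckbi.dll must be unloaded before the thread's cleanup runs.

```python
from dataclasses import dataclass, field

# Toy model only (hypothetical names, not NSPR/NSS code). A thread-private
# destructor is represented by the module that contains its machine code.


@dataclass
class Module:
    name: str
    loaded: bool = True


@dataclass
class Destructor:
    owner: Module  # DLL whose code the destructor pointer targets

    def __call__(self, value: object) -> str:
        if not self.owner.loaded:
            # In a real process this would be a jump into unmapped memory.
            raise RuntimeError(f"destructor code in {self.owner.name} was unloaded")
        return f"freed {value!r} via {self.owner.name}"


@dataclass
class Thread:
    # index -> (destructor, per-thread value), loosely mirroring thread-private data
    private: dict = field(default_factory=dict)

    def cleanup(self) -> list:
        """Call the destructor for every non-None slot, as thread exit would."""
        results = [dtor(value) for _, (dtor, value) in sorted(self.private.items())
                   if value is not None]
        self.private.clear()
        return results


def shutdown(pr_free_inlined_into_ckbi: bool, unload_ckbi_first: bool) -> str:
    core = Module("nss3.dll")        # assumption: the library exporting PR_Free
    nssckbi = Module("nssckbi.dll")  # the module that registers the destructor
    owner = nssckbi if pr_free_inlined_into_ckbi else core
    thread = Thread()
    thread.private[0] = (Destructor(owner), "hypothetical per-thread value")
    if unload_ckbi_first:
        nssckbi.loaded = False       # DLL unloaded before threads are torn down
    try:
        thread.cleanup()
        outcome = "clean exit"
    except RuntimeError as exc:
        outcome = f"crash: {exc}"
    nssckbi.loaded = False           # in every scenario it is unloaded eventually
    return outcome


for inlined in (False, True):
    for early in (False, True):
        print(f"inlined={inlined!s:5} unload_first={early!s:5} -> {shutdown(inlined, early)}")
```

In this model, removing either condition (keeping the destructor in the library that exports PR_Free, or tearing down threads before unloading nssckbi.dll) turns the crash into a clean exit, which lines up with the two possibilities suggested in the comment above.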
bp-03a7f24d-2cb4-4ae2-8cfe-8b31b2160126
Crash Signature: [@ _PR_CleanupThread | _PR_NativeRunThread | pr_root]
Same crash as Alice's report:

https://crash-stats.mozilla.com/report/index/b83af0f0-0dba-4195-9b48-b14372160126 

Nightly win32 m-c build. Crashing on close of the browser.

Setting to NEW
Status: UNCONFIRMED → NEW
Ever confirmed: true
Nightly 47 is also crashing for me with a slightly different signature (it happens when closing Firefox using the Close button).

[@ PR_DestroyThreadPrivate | PR_CleanupThread | PR_NativeRunThread | pr_root ]

Here are some more reports:
https://crash-stats.mozilla.com/report/index/31f167bf-bcc7-4b4e-a10e-c00082160126
https://crash-stats.mozilla.com/report/index/7222143b-1998-4752-b0d3-fb0c12160126
I'm 99% confident this is fallout from bug 1237863. That patch changed the linkage on some NSPR functions, and so nssckbi.dll winds up inlining PR_Free, but we unload that DLL during shutdown before we terminate all our threads and so the destructor it registers gets called after it has been unloaded.

In local testing backing this patch out seems to fix the crash, I'll push to try just for sanity and then likely land the backout soon.

Try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=6380ab360cfa
Blocks: 1237863
Assignee: nobody → ted
Summary: Fx46 from inbound crashes when I exit. → NSS crash in _PR_CleanupThread on shutdown
The crash is somehow related to the "cache2" folder. Removing write permission on the "cache2/entries" folder, or creating that folder, "fixes" the issue.

Win7x64, Fx47x64 Nightly.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #29)
> I'm 99% confident this is fallout from bug 1237863. That patch changed the
> linkage on some NSPR functions, and so nssckbi.dll winds up inlining
> PR_Free, but we unload that DLL during shutdown before we terminate all our
> threads and so the destructor it registers gets called after it has been
> unloaded.
> 
> In local testing backing this patch out seems to fix the crash, I'll push to
> try just for sanity and then likely land the backout soon.
> 
> Try push:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=6380ab360cfa



Just tested this, and the shutdown crash is gone. Win7 x64.
(In reply to Aris from comment #30)
> The crash is somehow related to the "cache2". Removing permissions to write
> into "cache2/entries" folder or creating that folder "fixes" the issue.
> 
> Win7x64, Fx47x64 Nightly.

While deleting cache2 works the first time I exit, any subsequent exits produce the crash.
I also confirm that https://treeherder.mozilla.org/#/jobs?repo=try&revision=6380ab360cfa fixes the problem.
Thanks for testing, I'll back that patch out ASAP.
Crash Signature: [@ _PR_CleanupThread | _PR_NativeRunThread | pr_root] → [@ _PR_CleanupThread | _PR_NativeRunThread | pr_root] [@ PR_DestroyThreadPrivate | PR_CleanupThread | PR_NativeRunThread | pr_root ]
Just downloaded latest nightly and not seeing any crashes yet.
I've also confirmed that the latest Nightly does *not* crash on its shutdown.
Nightly(2016-01-27) on x86_32 Windows 7
This got merged to central:
https://hg.mozilla.org/mozilla-central/rev/e265e7992928c9ca7bacfd8bfca1929e974b2467

Thanks for all the testing and verification, folks! Sorry for the inconvenience.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Nightly 47.0a1 updated, and it crashed with signature _PR_CleanupThread | _PR_NativeRunThread | pr_root
https://crash-stats.mozilla.com/report/index/abc72302-b05a-4b01-9fc6-d00492160127
Looks like this affects 47 but maybe not 46. Tracking for 47 since this is a regression.
So this is fixed for 47, right? If yes, can we please set the right status flags? Ted?
Flags: needinfo?(ted)
Yes Nightly 47.0a1
Thanks.
Flags: needinfo?(ted)
I don't believe bug 1237863 made it to 46 (the target milestone indicates it didn't), so this shouldn't be an issue there.
I’m still seeing this issue on Firefox 47.0a1. Shouldn’t it be fixed and verified?
I’ve encountered this crash on Windows 10 64-bit on a Dell XPS 12.

This is my crash report: bp-3453c143-8011-4861-90d9-482652160203

Please also take a look at the report list: https://crash-stats.mozilla.com/report/list?product=Firefox&signature=_PR_CleanupThread+|+_PR_NativeRunThread+|+pr_root#tab-reports
Flags: needinfo?(bernesb)
Sorry, just noticed that I had a build from 2016-01-26. I confirm that this bug is fixed on latest Firefox 47.0a1 (2016-02-03).
Flags: needinfo?(bernesb)
Product: Core → Firefox Build System