Closed Bug 1448592 Opened 6 years ago Closed 5 years ago

Hit MOZ_CRASH(NSS_Shutdown failed) at xpcom/build/XPCOMInit.cpp:1015

Categories

(Core :: Security: PSM, defect, P1)

61 Branch

Tracking

RESOLVED FIXED
mozilla67
Tracking Status
firefox-esr60 --- unaffected
firefox60 --- unaffected
firefox61 --- wontfix
firefox65 --- wontfix
firefox66 --- wontfix
firefox67 --- fixed

People

(Reporter: bc, Assigned: keeler)

Details

(Keywords: regression, Whiteboard: [psm-assigned])

Attachments

(3 files)

Attached file moz_crash.txt
1. Load the URL via bughunter's Marionette-based runner, for example https://blog.caranddriver.com/nine-things-you-must-know-about-the-new-mercedes-inline-six/ or any of 1,200 others.

This reproduces only with Nightly 61 debug ASan builds, and only with the Marionette runner: https://hg.mozilla.org/automation/sisyphus/file/tip/python/sisyphus/automation/runner.py

At a minimum you'll need to install python packages requests, mozprofile, marionette_driver.

python runner.py --binary nightly-asan/mozilla/firefox-debug/dist/bin/firefox --profile /tmp/firefox-nightly-asan/ --url 'https://blog.caranddriver.com/nine-things-you-must-know-about-the-new-mercedes-inline-six/'

It first appeared around 3/21.

2. WARNING: YOU ARE LEAKING THE WORLD (at least one JSRuntime and everything alive inside it, that is) AT JS_ShutDown TIME.  FIX THIS!
Hit MOZ_CRASH(NSS_Shutdown failed) at /builds/worker/workspace/build/src/xpcom/build/XPCOMInit.cpp:1015

==4564==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7fae62a4e30d bp 0x7ffd84c92950 sp 0x7ffd84c92700 T0)

==4564==The signal is caused by a WRITE memory access.
==4564==Hint: address points to the zero page.
    #0 0x7fae62a4e30c in mozilla::ShutdownXPCOM(nsIServiceManager*) /builds/worker/workspace/build/src/xpcom/build/XPCOMInit.cpp:950:12

From the comments around https://searchfox.org/mozilla-central/source/xpcom/build/XPCOMInit.cpp#1015, I'm putting this in Security:PSM rather than XPCOM.
Needs triage.
Flags: needinfo?(dkeeler)
I'm having trouble reproducing this. Maybe some third-party content changed and it no longer reproduces on that URL. Is there another I can try? (Also, feel free to needinfo me sooner next time. For some reason I didn't get bugmail when this was filed.)
Flags: needinfo?(dkeeler) → needinfo?(bob)
Attached file bug-1448592.txt
This reproduced for me twice out of three tries using today's *nightly debug asan* build on Linux x86_64 (Fedora 27):

url="https://blog.caranddriver.com/nine-things-you-must-know-about-the-new-mercedes-inline-six/" && python $TEST_DIR/python/sisyphus/automation/runner.py --binary /mozilla/builds/nightly-asan/mozilla/firefox-debug/dist/bin/firefox --profile /home/bclary/mozilla/profiles/firefox/lithium/ --url "$url"  > /tmp/bug-1448592.txt 2>&1

Did you use both a nightly debug ASan build and runner.py?
Flags: needinfo?(bob)
Ah - I managed to reproduce it. I just needed to try more times.
Triage level?
Flags: needinfo?(dkeeler)
What I think is going on here is that something is leaking (or just isn't getting cleaned up properly at shutdown), which is keeping a JSRuntime alive, which is holding on to some NSS resources. This isn't a regression per se: the "complain about leaking a JSRuntime and then do nothing about it" code has been in the tree for a long time. What's new is that bug 1437128 added code to attempt to enforce that no NSS resources are leaked in debug (and non-Android) builds. It would be nice to figure out what's leaking that's holding everything alive here, but I've made little progress there (in part because this still reproduces so infrequently for me). Short of that, one way we could address this is to assume that if a JSRuntime leaks then it's likely that NSS resources will leak as well. Another option would be some sort of environment or build variable that controls whether or not this is a fatal assertion. I'd like to keep this fatal in the in-tree tests because that's the only way we'll catch new leaks.

Bob - what do you think? Would it work for you to add a way for bughunter to turn off the fatalness of this assertion? Or is that even necessary? (I don't have a sense of how much this is impacting bughunter since, again, this isn't happening for me very often.) Also, are there other URLs this happens on more frequently?
Flags: needinfo?(dkeeler) → needinfo?(bob)
Priority: -- → P3
Whiteboard: [psm-backlog]
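For illustration, the environment-variable option proposed above could be modeled roughly as follows. This is a Python sketch of the decision only, not the in-tree C++ code. The function name is hypothetical, the variable name is the one from the patch that later landed in this bug, and treating "set to any value" as the trigger is an assumption.

```python
# Sketch only: models the proposed decision; the real check would live in C++.
# Variable name taken from the patch that later landed in this bug.
ESCAPE_HATCH_VAR = "MOZ_IGNORE_NSS_SHUTDOWN_LEAKS"


def nss_shutdown_failure_is_fatal(is_debug, is_android, env):
    """Return True if a failed NSS_Shutdown should crash the process."""
    # Per the comment above, the leak check is enforced only in
    # debug, non-Android builds.
    if not is_debug or is_android:
        return False
    # Assumption: the variable being present (with any value) turns the
    # fatal assertion into a non-fatal one.
    return ESCAPE_HATCH_VAR not in env
```

In-tree tests would leave the variable unset, so new leaks would still be fatal there.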
A pref or some other means to convert it to a non-fatal assertion would be great. I've looked and, again, I have plenty of examples but none that are easily reproduced.
Flags: needinfo?(bob)

I hit this several times this morning on https://terraria.gamepedia.com/Dye_Vat with both Beta 65 and Nightly 66 Debug on Fedora 29. I then tried opt builds, which hit MergeState::ProcessItemFromNewList (bug 1468021). I just tried a couple more times and hit the NSS_Shutdown failure, then the MergeState crash.

You might have luck reproducing with this URL and a fresh profile.

Flags: needinfo?(dkeeler)

I seem to only be encountering other assertion failures/crashes. In any case, I'll add an environment variable that can be used as an escape hatch.

Assignee: nobody → dkeeler
Flags: needinfo?(dkeeler)
Priority: P3 → P1
Whiteboard: [psm-backlog] → [psm-assigned]
Pushed by dkeeler@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3255d0ab9b94
add ability to ignore NSS shutdown leaks with MOZ_IGNORE_NSS_SHUTDOWN_LEAKS r=froydnj
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67
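A harness such as bughunter could opt in to the escape hatch by setting the variable in the browser's environment before launching. The sketch below is a minimal illustration, not part of the landed patch. The helper names are hypothetical, the value "1" is an assumption (the commit message only names the variable), and the `--binary`, `--profile`, and `--url` flags are the ones used with runner.py earlier in this bug.

```python
import os
import subprocess


def build_runner_invocation(binary, profile, url, base_env=None):
    """Return (cmd, env) for running runner.py with the NSS leak check relaxed."""
    # Copy the environment so the caller's environment is not modified.
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: any value works; "1" is chosen only for illustration.
    env["MOZ_IGNORE_NSS_SHUTDOWN_LEAKS"] = "1"
    cmd = ["python", "runner.py",
           "--binary", binary,
           "--profile", profile,
           "--url", url]
    return cmd, env


def run_once(binary, profile, url):
    """Launch the runner once and return its exit code."""
    cmd, env = build_runner_invocation(binary, profile, url)
    return subprocess.run(cmd, env=env).returncode
```

Leaving the variable unset keeps the assertion fatal, which is what the in-tree tests rely on to catch new leaks.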