nsScriptLoader still not thread-safe

RESOLVED FIXED in Firefox 38

Status

()

defect
RESOLVED FIXED
5 years ago
4 months ago

People

(Reporter: mccr8, Assigned: smaug)

Tracking

({sec-moderate})

32 Branch
mozilla38
Points:
---

Firefox Tracking Flags

(firefox36 wontfix, firefox37 wontfix, firefox38 fixed, firefox-esr31 wontfix, b2g-v1.4 wontfix, b2g-v2.0 wontfix, b2g-v2.0M wontfix, b2g-v2.1 wontfix, b2g-v2.1S wontfix, b2g-v2.2 wontfix, b2g-master fixed)

Details

(Whiteboard: [adv-main38+], crash signature)

Attachments

(2 attachments)

This is some kind of variant of bug 1111737.  I get this in a local debug build from today's trunk.

STR:
1. Go to http://beta.unity3d.com/jonas/DT2/
2. While the Unity engine loading bar thing is still loading, exit the browser.

result: thread safety assertion.

[68842] WARNING: '!mMainThread', file /Users/amccreight/mc/xpcom/threads/nsThreadManager.cpp, line 404
[68842] WARNING: 'NS_FAILED(rv)', file /Users/amccreight/mc/xpcom/glue/nsThreadUtils.cpp, line 174
Hit MOZ_CRASH(nsScriptLoader not thread-safe) at /Users/amccreight/mc/dom/base/nsScriptLoader.cpp:178
#01: nsRunnable::Release()[/Users/amccreight/mc/obj-dbg.noindex/dist/NightlyDebug.app/Contents/MacOS/XUL +0xfc33e]
#02: js::HelperThread::handleParseWorkload()[/Users/amccreight/mc/obj-dbg.noindex/dist/NightlyDebug.app/Contents/MacOS/XUL +0x3926523]
#03: js::HelperThread::threadLoop()[/Users/amccreight/mc/obj-dbg.noindex/dist/NightlyDebug.app/Contents/MacOS/XUL +0x3925201]
#04: _pt_root[/Users/amccreight/mc/obj-dbg.noindex/dist/NightlyDebug.app/Contents/MacOS/libnss3.dylib +0x33c199]
#05: _pthread_body[/usr/lib/system/libsystem_pthread.dylib +0x1899]
#06: _pthread_struct_init[/usr/lib/system/libsystem_pthread.dylib +0x172a]
Process 68842 stopped
* thread #16: tid = 0x61e4bb, 0x00000001024792d5 XUL`nsScriptLoader::Release(this=<unavailable>) + 213 at nsScriptLoader.cpp:178, name = 'Analysis Helper', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)


(lldb) bt
* thread #16: tid = 0x61e4bb, 0x00000001024792d5 XUL`nsScriptLoader::Release(this=<unavailable>) + 213 at nsScriptLoader.cpp:178, name = 'Analysis Helper', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x00000001024792d5 XUL`nsScriptLoader::Release(this=<unavailable>) + 213 at nsScriptLoader.cpp:178
    frame #1: 0x000000010249f1a1 XUL`(anonymous namespace)::NotifyOffThreadScriptLoadCompletedRunnable::~NotifyOffThreadScriptLoadCompletedRunnable() [inlined] nsRefPtr<nsScriptLoader>::~nsRefPtr() + 33 at nsRefPtr.h:59
    frame #2: 0x000000010249f193 XUL`(anonymous namespace)::NotifyOffThreadScriptLoadCompletedRunnable::~NotifyOffThreadScriptLoadCompletedRunnable() [inlined] nsRefPtr<nsScriptLoader>::~nsRefPtr() at nsRefPtr.h:57
    frame #3: 0x000000010249f193 XUL`(anonymous namespace)::NotifyOffThreadScriptLoadCompletedRunnable::~NotifyOffThreadScriptLoadCompletedRunnable() [inlined] (anonymous namespace)::NotifyOffThreadScriptLoadCompletedRunnable::~NotifyOffThreadScriptLoadCompletedRunnable(this=0x0000000114495d00) + 10 at nsScriptLoader.cpp:787
    frame #4: 0x000000010249f189 XUL`(anonymous namespace)::NotifyOffThreadScriptLoadCompletedRunnable::~NotifyOffThreadScriptLoadCompletedRunnable() [inlined] (anonymous namespace)::NotifyOffThreadScriptLoadCompletedRunnable::~NotifyOffThreadScriptLoadCompletedRunnable(this=0x0000000114495d00, ptr=<unavailable>) at nsScriptLoader.cpp:787
    frame #5: 0x000000010249f189 XUL`(anonymous namespace)::NotifyOffThreadScriptLoadCompletedRunnable::~NotifyOffThreadScriptLoadCompletedRunnable(this=0x0000000114495d00) + 9 at nsScriptLoader.cpp:787
    frame #6: 0x000000010161433e XUL`nsRunnable::Release(this=<unavailable>) + 94 at nsThreadUtils.cpp:32
    frame #7: 0x0000000104e3e523 XUL`js::HelperThread::handleParseWorkload(this=0x000000010bceed80) + 547 at HelperThreads.cpp:1190
    frame #8: 0x0000000104e3d201 XUL`js::HelperThread::threadLoop(this=0x000000010bceed80) + 721 at HelperThreads.cpp:1373
    frame #9: 0x000000010133c199 libnss3.dylib`_pt_root(arg=0x000000010032a370) + 281 at ptthread.c:212
    frame #10: 0x00007fff88f6d899 libsystem_pthread.dylib`_pthread_body + 138
    frame #11: 0x00007fff88f6d72a libsystem_pthread.dylib`_pthread_start + 137
    frame #12: 0x00007fff88f71fc9 libsystem_pthread.dylib`thread_start + 13
from bug 1111737:

(In reply to Boris Zbarsky [:bz] from comment #6)
> What if the runnable never runs?

Maybe this is what happens? ;)
The NS_FAILED from the above log is in NS_DispatchToMainThread, so maybe OffThreadScriptLoaderCallback is calling NS_DispatchToMainThread, but it is too late in shutdown?

The line prior to the one I posted is
  WARNING: YOU ARE LEAKING THE WORLD (at least one JSRuntime and everything alive inside it, that is) AT JS_ShutDown TIME.  FIX THIS!
so we're somewhere in JS_ShutDown when the failure happens, probably in the part where it shuts down the threads.
Yeah, that failure is indeed in OffThreadScriptLoaderCallback.  So we're shutting down the JS helper threads after the point where we can call NS_DispatchToMainThread.  Yeah, okay, it looks like the failure is happening in nsThreadManager::GetMainThread, which is what that prior warning is.  I'm not sure when that goes bad.
It looks like mMainThread gets cleared on this line in ShutdownXPCOM:
  nsThreadManager::get()->Shutdown()
...then a bunch of lines later we call JS_ShutDown, which I'm guessing shuts down the helper threads, and then kablooey.
jst said if nothing else we could just intentionally leak if NS_DispatchToMainThread fails.  Though we may have other lurking issues if we're shutting down JS threads so much after we're shutting down XPCOM thread utils.
fyi, bughunter has seen this crash for some time and it is in fact the top crash for bughunter. It isn't amenable to test case reduction however. I've been retesting all of the 43,000+ urls where bughunter reproduced a crash from October-December and have seen 3 sets of stacks similar to:

nsScriptLoader::Release nsRefPtr<nsScriptLoader>::~nsRefPtr nsRefPtr<nsScriptLoader>::~nsRefPtr 10743 times on osx 10.6, 10.8, 10.9 for beta,aurora,nightly for 612 urls. osx testing is only about 2/3 complete so the final number will end up being higher.

nsScriptLoader::Release nsRefPtr<nsScriptLoader>::~nsRefPtr<nsScriptLoader> `anonymous namespace'::NotifyOffThreadScriptLoadCompletedRunnable::~NotifyOffThreadScriptLoadCompletedRunnable `anonymous namespace'::NotifyOffThreadScriptLoadCompletedRunnable::`scalar deleting destructor' nsRunnable::Release 4707 times on windows 7 32bit and 64bit for beta,aurora,nightly for 867 urls. windows testing is complete.

nsScriptLoader::Release NotifyOffThreadScriptLoadCompletedRunnable::~NotifyOffThreadScriptLoadCompletedRunnable nsRunnable::Release OffThreadScriptLoaderCallback js::HelperThread::handleParseWorkload 179 times on rhel6 32bit and 64bit  for beta, aurora, nighty for 179 urls. linux testing is complete.

If you would like the crash reports from minidump_stackwalk, please let me know and I'll attach them. I'm not sure how useful the urls would be since I've tried to reproduce this in the past and failed.
Bob, are those crashes shutdown crashes, or happening during normal browsing too?
Flags: needinfo?(bob)
Looking at the logs, it appeared that there were both shutdown and other crashes. I could reproduce about 25% of the time with a url on Windows 7 using the Spider extension I use to load urls. The bughunter logs originally showed a crash before attempting to shutdown with this url but when I tried it, the crashes were all on shutdown. I couldn't reproduce it running the page manually without the Spider extension though. So maybe this is an artifact from using the extension.

STR:

1. Install http://bclary.com/2004/07/10/spider/spider.xpi into a profile, e.g. test-profile

2. for i in {1..20}; do echo $i; if firefox -P test-profile -spider -url "http://www.fashion-woman.com/stil-i-moda/tendencii/modnaya-obuv-osen-zima-2014-2015/" -depth 0 -timeout 60 -wait 0 -start -quit 2>&1 | tee /tmp/crash.log | grep MOZ_CRASH; then break; fi; done

reproduced the crash on beta, aurora, nightly on windows 7 and osx 10.9 for me.
Flags: needinfo?(bob)
Wait, which crash happens on beta? I thought we had the thread safety check only on nightly.
Posted patch wipSplinter Review
Assignee: nobody → bugs
Attachment #8566172 - Flags: review?(continuation)
Attachment #8566172 - Flags: review?(continuation) → review+
Backed out for Nexus 5-L bustage. Of particular note, it's the only Android L-based build we have in automation.
https://hg.mozilla.org/integration/mozilla-inbound/rev/2b2f1d308c3a

https://treeherder.mozilla.org/logviewer.html#?job_id=6747192&repo=mozilla-inbound
bah, sorry.
Posted patch +includeSplinter Review
(In reply to Olli Pettay [:smaug] (high review load) from comment #9)
> Wait, which crash happens on beta? I thought we had the thread safety check
> only on nightly.

I hit the same MOZ_CRASH on beta and aurora as well as nightly.
https://hg.mozilla.org/mozilla-central/rev/0aa5dec4bf2a
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
How far back does this go? Should this land on at least Beta?
status-b2g-v2.1: --- → ?
Flags: needinfo?(bugs)
I think this goes all the way to Bug 906371. Have we seen any real world non-shutdown crashes related to this?
Flags: needinfo?(bugs)
Mccr8 might answer to it :)
Flags: needinfo?(continuation)
Well, in comment 8, bc said that some of them showed up not at shutdown, but only with the addon, so maybe the addon was doing something weird.  I think we can just let this one go.
Flags: needinfo?(continuation)
I came across a few crash report for this on crash-stats (for Aurora, where this did not land):
  https://crash-stats.mozilla.com/report/index/294af7d6-85f3-43bd-8edd-0cf9e2150319
  https://crash-stats.mozilla.com/report/index/a0f270f7-0eeb-45cc-b457-6e3652150322
  https://crash-stats.mozilla.com/report/index/d5f0d2b2-400b-493e-9326-448612150317

They are, as expected, happening during XPCOM shutdown.
Crash Signature: [@ NS_CycleCollectorSuspect3 ]
Whiteboard: [adv-main38+]
Group: core-security → core-security-release
Group: core-security-release
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.