Closed
Bug 721025
Opened 13 years ago
Closed 8 years ago
Lots of crashes in GetJSContext from single-threaded runtime release assert on main thread, and about:jank addon
Categories
(Core :: XPConnect, defect)
Tracking
()
RESOLVED
INCOMPLETE
People
(Reporter: jrmuizel, Unassigned)
References
Details
(Keywords: crash)
Crash Data
I've been getting a fair number of crashes in GetJSContext recently:
http://crash-stats.mozilla.com/report/index/bp-bda695ae-5eee-4da5-8575-38eae2120125
http://crash-stats.mozilla.com/report/index/bp-07aec336-4a84-478e-83bd-a95b22120124
http://crash-stats.mozilla.com/report/index/bp-91911aa1-e630-47f7-9aa9-d2c7c2120123
http://crash-stats.mozilla.com/report/index/bp-89781e26-960b-458d-ad77-9b1462120120
Updated•13 years ago
Assignee: general → nobody
Component: JavaScript Engine → XPConnect
QA Contact: general → xpconnect
Comment 1•13 years ago
Reporter | ||
Comment 2•13 years ago
Fwiw, bug 715713 prevents these from properly showing up in crash stats.
Depends on: 715713
Updated•13 years ago
Comment 3•13 years ago
Oh, this is the single-threaded runtime release assert. If I read the buildids correctly, the crashes were before the landing of bug 675078 yesterday.
Another weird thing about these stacks is that the aborts are happening on the main thread, which suggests some funny business with threads. I wonder why this hasn't been hit until recently, since the assert has been in for several months. I also see that these crashes are all on the nightly-profiling branch; perhaps something is particular to that branch?
Comment 4•13 years ago
I just got this crash:
https://crash-stats.mozilla.com/report/index/bp-e1d430be-c981-41ab-99eb-111f42120127
Please let me know if I can provide any helpful information. Thanks!
Comment 5•13 years ago
I crashed as well and would be happy to help!
https://crash-stats.mozilla.com/report/index/bp-097fba0e-9442-4c36-be2c-573832120127
Comment 6•13 years ago
Reporter | ||
Comment 7•13 years ago
I think this crash is caused by about:jank/profiler. Has anyone who sees the crash not installed either of those addons?
Comment 8•13 years ago
Uh, yeah I installed that some minutes before the crash.
Comment 9•13 years ago
Same same.
Comment 10•13 years ago
Ah, clues! So does anyone know if the about:jank profiler does any tricks with threads or thread ids? Basically, JSRuntime stores PR_GetCurrentThread() when it is created and asserts that it is equal to PR_GetCurrentThread() anytime JS_AbortIfWrongThread is called (e.g. in GetJSContext). There is also a little stupid dance where we temporarily change threads (in a single-threaded manner) during cycle collection, in case that is relevant...
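The owner-thread check described in this comment can be sketched roughly as follows. This is a minimal illustration, not the actual SpiderMonkey code: `SketchRuntime` and its methods are hypothetical names, `std::thread::id` stands in for the `PRThread*` the real runtime stores, and a default-constructed id stands in for whatever "cleared" value the real code uses.

```cpp
#include <cassert>
#include <cstdlib>
#include <thread>

// Hypothetical stand-in for a single-threaded runtime. It remembers the
// thread that created it and can check whether the caller is that thread.
class SketchRuntime {
public:
    SketchRuntime() : owner_(std::this_thread::get_id()) {}

    // True only when called from the thread that currently owns the runtime.
    bool isOwnedByCurrentThread() const {
        return owner_ == std::this_thread::get_id();
    }

    // Analogue of a release assert: abort even in optimized builds.
    void abortIfWrongThread() const {
        if (!isOwnedByCurrentThread()) {
            std::abort();
        }
    }

    // The "dance": the current owner gives the runtime up...
    void clearOwnerThread() {
        abortIfWrongThread();
        owner_ = std::thread::id();  // default id represents "no thread"
    }

    // ...and a thread may claim it only while nobody owns it.
    void setOwnerThread() {
        assert(owner_ == std::thread::id());
        owner_ = std::this_thread::get_id();
    }

private:
    std::thread::id owner_;
};
```

In this sketch, any call to `abortIfWrongThread()` made between `clearOwnerThread()` and the matching `setOwnerThread()` aborts, even on the thread that created the runtime.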
Comment 11•13 years ago
(In reply to Tim Taubert [:ttaubert] from comment #8)
> Uh, yeah I installed that some minutes before the crash.
Same here.
Comment 12•13 years ago
Taras blogged about:jank.
Comment 13•13 years ago
(In reply to Axel Hecht from comment #12)
> Taras blogged about:jank.
Jeff wrote about:jank! ;-)
Comment 14•13 years ago
(In reply to Luke Wagner [:luke] from comment #10)
> Ah, clues! So does anyone know if the about:jank profiler does any tricks
> with threads or thread ids? Basically, JSRuntime stores
> PR_GetCurrentThread() when it is created and asserts that it is equal to
> PR_GetCurrentThread() anytime JS_AbortIfWrongThread is called (e.g. in
> GetJSContext). There is also a little stupid dance where we temporarily
> change threads (in a single-threaded manner) during cycle collection, in
> case that is relevant...
We're building with frame pointers. While the extension is active, we send a signal to the main thread every 10 ms; the main thread runs a signal handler, performs a backtrace, and resumes.
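For readers unfamiliar with this style of sampling, here is a minimal sketch of a periodic-signal sampler on POSIX. It is not the profiler's code: the function names are hypothetical, it uses `setitimer` with `SIGALRM` instead of signalling one specific thread, and the handler only counts ticks where a real profiler would walk the frame-pointer chain.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <signal.h>
#include <sys/time.h>

// Ticks delivered so far. A lock-free atomic (assumed here for int) is
// safe to modify from inside a signal handler.
static std::atomic<int> g_ticks{0};

// Hypothetical tick handler. It interrupts whatever the thread was doing,
// records a sample (here: just a count), and returns so the thread resumes.
extern "C" void onSampleTick(int) {
    g_ticks.fetch_add(1, std::memory_order_relaxed);
}

// Install the handler and arm a repeating 10 ms wall-clock timer.
bool startSampling() {
    struct sigaction sa {};
    sa.sa_handler = onSampleTick;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;  // restart interrupted system calls
    if (sigaction(SIGALRM, &sa, nullptr) != 0) {
        return false;
    }
    itimerval timer {};
    timer.it_interval.tv_usec = 10000;  // repeat every 10 ms
    timer.it_value.tv_usec = 10000;     // first tick after 10 ms
    return setitimer(ITIMER_REAL, &timer, nullptr) == 0;
}

// Disarm the timer; the handler stays installed but no longer fires.
void stopSampling() {
    itimerval off {};
    setitimer(ITIMER_REAL, &off, nullptr);
}

// Busy-wait for about `ms` milliseconds, then report the ticks seen so far.
int spinAndCountTicks(int ms) {
    assert(ms >= 0);
    auto deadline = std::chrono::steady_clock::now() +
                    std::chrono::milliseconds(ms);
    while (std::chrono::steady_clock::now() < deadline) {
    }
    return g_ticks.load();
}
```

The relevant property for this bug is that the handler runs on the interrupted thread at an arbitrary point, so anything it calls has to be async-signal-safe.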
Comment 15•13 years ago
Does the signal handler possibly touch JS or XPConnect?
Comment 16•13 years ago
That would be 'TableTicker::Tick'. The only thing we use from gecko is TimeStamps. There's no JS or XPConnect.
Reporter | ||
Comment 17•13 years ago
So my best guess here is that perhaps something with TLS is getting screwed up.
Reporter | ||
Comment 18•13 years ago
I was able to catch this in a debugger. mJSContext->runtime->ownerThread_ is equal to 0xc1ea12 when it should be equal to 0x100336220 (the value returned by PR_GetCurrentThread() and pthread_getspecific(261))
Comment 19•13 years ago
That would be caused by JSRuntime::clearOwnerThread: http://mxr.mozilla.org/mozilla-central/source/js/src/jsapi.cpp#891
Reporter | ||
Comment 20•13 years ago
Also for this runtime
suspendCount = 1,
requestDepth = 0,
I'm not sure if that's expected or not.
Reporter | ||
Comment 21•13 years ago
(In reply to Josh Matthews [:jdm] from comment #19)
> That would be caused by JSRuntime::clearOwnerThread:
> http://mxr.mozilla.org/mozilla-central/source/js/src/jsapi.cpp#891
It looks like the likely caller of that is nsXPConnect::NotifyLeaveMainThread(), which is called in one place by the cycle collector. Further, I don't see how we could avoid calling NotifyEnterMainThread(), which should reset it to the proper value.
Comment 22•13 years ago
Hmm, all hints seem to point toward something happening with cycle collection. One random idea is that perhaps a bug in NSPR is causing the cycle collector thread to get notified unexpectedly (so it would run a cycle collection concurrent with the main thread (bad!) and leave rt->ownerThread_ in the 'clear' state).
First, I'd try running a debug build (which may catch things a lot earlier). Second, I'd put a printf (with flush, obviously) in JS_ClearRuntimeThread, JS_SetRuntimeThread, nsCycleCollector::BeginCollection, and the signal handler and see if there are any weird interleavings.
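A minimal sketch of the kind of flushed trace logging suggested here, assuming hypothetical helper names (`traceEvent`, `traceFromSignalHandler`) that are not part of Gecko. The sequence number makes the interleaving between threads easy to read off the log. Because `printf` is not async-signal-safe, the sketch uses a raw `write(2)` for the signal-handler case.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <functional>
#include <thread>
#include <unistd.h>

// Global sequence counter so log lines from different threads can be ordered.
static std::atomic<unsigned long> g_traceSeq{0};

// Print one trace line tagged with a sequence number and a hash of the
// calling thread's id, then flush so the line survives a crash.
unsigned long traceEvent(const char* where) {
    assert(where != nullptr);
    unsigned long seq = g_traceSeq.fetch_add(1) + 1;
    std::size_t tid =
        std::hash<std::thread::id>{}(std::this_thread::get_id());
    std::fprintf(stderr, "[trace %lu] thread=%zx %s\n", seq, tid, where);
    std::fflush(stderr);
    return seq;
}

// Variant for use inside a signal handler: write(2) is async-signal-safe,
// whereas fprintf is not. The caller passes a preformatted message.
bool traceFromSignalHandler(const char* msg, std::size_t len) {
    return write(STDERR_FILENO, msg, len) == static_cast<ssize_t>(len);
}
```

A call such as `traceEvent("JS_ClearRuntimeThread")` at the top of each function of interest would then show whether a clear is ever logged without a matching set, or from an unexpected thread.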
Reporter | ||
Comment 23•13 years ago
(In reply to Luke Wagner [:luke] from comment #22)
> First, I'd try running a debug build (which may catch things a lot earlier).
> Second, I'd put a printf (with flush, obviously) in JS_ClearRuntimeThread,
> JS_SetRuntimeThread, nsCycleCollector::BeginCollection, and the signal
> handler and see if there are any weird interleavings.
That sounds like a good idea. If someone can come up with a good way of reproducing this, that would be great. I've only been able to see it after about a day of regular browser use.
Reporter | ||
Comment 24•13 years ago
I was able to get a core dump of this assertion failing:
JS_ASSERT(ownerThread_ == (void *)0xc1ea12);
during
nsXPConnect::NotifyEnterCycleCollectionThread ()
Here's some state when this happens.
cycle collector thread - 0x117801000
main thread - 0x7fff70f6fcc0
rt->ownerThread - 0x7fff70f6fcc0
condition variable lock owner - 0x117801000
The main thread is not waiting in nsCycleCollectorRunner::Collect
I don't yet have any theories as to what's going wrong.
Comment 25•13 years ago
(In reply to Jeff Muizelaar [:jrmuizel] from comment #24)
The main thread not waiting is a big red flag. This sounds like exactly what comment 22 para 1 is guessing. Perhaps there is some bug involving signal handlers (or the backtrace facility) and NSPR's condition variables?
Another question: I only see crash reports from OS X above. Does anyone know of anyone using about:jank on Windows and having/not-having problems?
Comment 26•13 years ago
One thing to note is that NSPR, and specifically PR_WaitCondVar (which is the underlying implementation of the CondVar class), does not protect against spurious wakeups or interrupted threads. CondVar::Wait actually returns an nsresult, so checking NS_FAILED along with PR_GetError would be useful to see whether NSPR is being fussy here.
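The usual defense against spurious wakeups is to recheck a guarded flag after every return from the wait. The following is a minimal sketch of that pattern using `std::condition_variable` as a stand-in for NSPR's condition variables; `CollectRequest` and its members are hypothetical names, not the cycle collector's actual types.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical request flag guarded by a mutex. The waiter trusts the
// flag, never the wakeup itself.
struct CollectRequest {
    std::mutex lock;
    std::condition_variable cv;
    bool requested = false;

    // Block until a request has been posted. The loop rechecks the flag
    // after every wakeup, so a spurious or signal-induced return from
    // wait() just goes back to waiting.
    void waitForRequest() {
        std::unique_lock<std::mutex> guard(lock);
        while (!requested) {
            cv.wait(guard);
        }
        assert(guard.owns_lock());
        requested = false;  // consume the request
    }

    // Post a request and wake one waiter.
    void post() {
        {
            std::lock_guard<std::mutex> guard(lock);
            requested = true;
        }
        cv.notify_one();
    }

    // Report whether an unconsumed request is pending.
    bool pending() {
        std::lock_guard<std::mutex> guard(lock);
        return requested;
    }
};
```

With this shape, a collector thread that wakes without a posted request never proceeds to collect, which is the failure mode being hypothesized in this bug.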
Reporter | ||
Comment 27•13 years ago
I added the following code:
nsresult result = mRequest.Wait();
if (result != NS_OK) {
    printf("%x\n", result);
    assert(result == NS_OK);
}
and I still get the crash without hitting the assert.
Comment 28•13 years ago
Perhaps the bug is in Wait()?
Comment 29•13 years ago
Wait is just a wrapper around PR_WaitCondVar, with some extra machinery for the deadlock detector in debug builds.
Comment 30•13 years ago
Ok, then perhaps PR_WaitCondVar (on OS X) has a bug.
Updated•13 years ago
Crash Signature: [@ CrashInJS] → [@ CrashInJS]
[@ CrashInJS | XPCCallContext::GetJSContext]
[@ CrashInJS | XPCCallContext::GetJSContext(JSContext**)]
Comment 31•13 years ago
This is topcrash #3 in 11.0b3 now: https://crash-stats.mozilla.com/topcrasher/byversion/Firefox/11.0b3
Do we have any clue how to fix this?
tracking-firefox11:
--- → ?
Comment 32•13 years ago
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #31)
> This is topcrash #3 in 11.0b3 now:
> https://crash-stats.mozilla.com/topcrasher/byversion/Firefox/11.0b3
>
> Do we have any clue how to fix this?
Sorry, it's actually bug 715757.
tracking-firefox11:
? → ---
Updated•13 years ago
Crash Signature: [@ CrashInJS]
[@ CrashInJS | XPCCallContext::GetJSContext]
[@ CrashInJS | XPCCallContext::GetJSContext(JSContext**)] → [@ CrashInJS]
[@ CrashInJS | XPCCallContext::GetJSContext ]
[@ CrashInJS | XPCCallContext::GetJSContext(JSContext**) ]
Updated•9 years ago
Summary: Lots of crashes in GetJSContext → Lots of crashes in GetJSContext from single-threaded runtime release assert on main thread, and about:jank addon
Comment 33•8 years ago
This signature doesn't exist anymore:
https://crash-stats.mozilla.com/signature/?signature=CrashInJS&date=%3E%3D2016-06-14T04%3A38%3A06.000Z&date=%3C2016-12-14T04%3A38%3A06.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_sort=-date&page=1#reports
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE