Closed Bug 1288579 Opened 3 years ago Closed 3 years ago

Hang in js::GCParallelTask::join()

Categories

(Core :: JavaScript: GC, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla51
Tracking Status
firefox49 --- unaffected
firefox50 --- fixed
firefox51 --- fixed

People

(Reporter: jrmuizel, Assigned: jonco)

References

Details

(Keywords: dogfood, regression)

Attachments

(2 files)

Here's the stack:

  * frame #0: 0x00007fff8dce8716 libsystem_kernel.dylib`__psynch_cvwait + 10
    frame #1: 0x00007fff8c256c3b libsystem_pthread.dylib`_pthread_cond_wait + 727
    frame #2: 0x0000000107b8cc45 XUL`js::ConditionVariable::wait_for(js::LockGuard<js::Mutex>&, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> const&) + 53
    frame #3: 0x0000000108058969 XUL`js::GCParallelTask::join() + 121
    frame #4: 0x0000000107f4b7aa XUL`js::gc::GCRuntime::incrementalCollectSlice(js::SliceBudget&, JS::gcreason::Reason, js::AutoLockForExclusiveAccess&) + 1546
    frame #5: 0x0000000107f4be31 XUL`js::gc::GCRuntime::gcCycle(bool, js::SliceBudget&, JS::gcreason::Reason) + 593
    frame #6: 0x0000000107f4c3cf XUL`js::gc::GCRuntime::collect(bool, js::SliceBudget, JS::gcreason::Reason) + 431
    frame #7: 0x0000000107f4eef2 XUL`JS::StartIncrementalGC(JSContext*, JSGCInvocationKind, JS::gcreason::Reason, long long) + 194
    frame #8: 0x000000010556c214 XUL`nsJSContext::GarbageCollectNow(JS::gcreason::Reason, nsJSContext::IsIncremental, nsJSContext::IsShrinking, long long) + 436
    frame #9: 0x000000010431b3bd XUL`nsTimerImpl::Fire() + 1101

It looks like all of the other js helper threads are in:
    frame #2: 0x0000000107b8cc45 XUL`js::ConditionVariable::wait_for(js::LockGuard<js::Mutex>&, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> const&) + 53
    frame #3: 0x000000010805990c XUL`js::HelperThread::threadLoop() + 876

except for one thread in:
    frame #1: 0x00007fff8c256c3b libsystem_pthread.dylib`_pthread_cond_wait + 727
    frame #2: 0x0000000107b8cc45 XUL`js::ConditionVariable::wait_for(js::LockGuard<js::Mutex>&, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator> const&) + 53
    frame #3: 0x0000000108055a2c XUL`js::GlobalHelperThreadState::wait(js::AutoLockHelperThreadState&, js::GlobalHelperThreadState::CondVar, mozilla::BaseTimeDuration<mozilla::TimeDurationValueCalculator>) + 60
    frame #4: 0x0000000108207fa3 XUL`js::gc::GCRuntime::refillFreeListOffMainThread(js::ExclusiveContext*, js::gc::AllocKind) + 115
    frame #5: 0x00000001082058b3 XUL`JSObject* js::Allocate<JSObject, (js::AllowGC)1>(js::ExclusiveContext*, js::gc::AllocKind, unsigned long, js::gc::InitialHeap, js::Class const*) + 99
    frame #6: 0x0000000107f60d94 XUL`JSObject::create(js::ExclusiveContext*, js::gc::AllocKind, js::gc::InitialHeap, JS::Handle<js::Shape*>, JS::Handle<js::ObjectGroup*>) + 132
    frame #7: 0x0000000107f6a4ed XUL`NewObject(js::ExclusiveContext*, JS::Handle<js::ObjectGroup*>, js::gc::AllocKind, js::NewObjectKind, unsigned int) + 349
    frame #8: 0x0000000107f6a92b XUL`js::NewObjectWithClassProtoCommon(js::ExclusiveContext*, js::Class const*, JS::Handle<JSObject*>, js::gc::AllocKind, js::NewObjectKind) + 747
    frame #9: 0x00000001081f38bf XUL`js::frontend::BytecodeEmitter::emitObject(js::frontend::ParseNode*) + 271
    frame #10: 0x00000001081d96ce XUL`js::frontend::BytecodeEmitter::emitTree(js::frontend::ParseNode*, js::frontend::BytecodeEmitter::EmitLineNumberNote) + 1022
    frame #11: 0x00000001081e96a0 XUL`js::frontend::BytecodeEmitter::emitAssignment(js::frontend::ParseNode*, JSOp, js::frontend::ParseNode*) + 2880
    frame #12: 0x00000001081d93ff XUL`js::frontend::BytecodeEmitter::emitTree(js::frontend::ParseNode*, js::frontend::BytecodeEmitter::EmitLineNumberNote) + 303
    frame #13: 0x00000001081f0a08 XUL`js::frontend::BytecodeEmitter::emitStatement(js::frontend::ParseNode*) + 648
    frame #14: 0x00000001081d94f7 XUL`js::frontend::BytecodeEmitter::emitTree(js::frontend::ParseNode*, js::frontend::BytecodeEmitter::EmitLineNumberNote) + 551
    frame #15: 0x00000001081d969f XUL`js::frontend::BytecodeEmitter::emitTree(js::frontend::ParseNode*, js::frontend::BytecodeEmitter::EmitLineNumberNote) + 975
    frame #16: 0x00000001081f429e XUL`js::frontend::BytecodeEmitter::emitArgsBody(js::frontend::ParseNode*) + 1198
    frame #17: 0x00000001081d9cec XUL`js::frontend::BytecodeEmitter::emitTree(js::frontend::ParseNode*, js::frontend::BytecodeEmitter::EmitLineNumberNote) + 2588
    frame #18: 0x00000001081dbf54 XUL`js::frontend::BytecodeEmitter::emitFunctionScript(js::frontend::ParseNode*) + 820
    frame #19: 0x00000001081ee5cb XUL`js::frontend::BytecodeEmitter::emitFunction(js::frontend::ParseNode*, bool) + 2059
    frame #20: 0x00000001081d9cd4 XUL`js::frontend::BytecodeEmitter::emitTree(js::frontend::ParseNode*, js::frontend::BytecodeEmitter::EmitLineNumberNote) + 2564
    frame #21: 0x00000001081f1611 XUL`js::frontend::BytecodeEmitter::emitCallOrNew(js::frontend::ParseNode*) + 577
    frame #22: 0x00000001081d944a XUL`js::frontend::BytecodeEmitter::emitTree(js::frontend::ParseNode*, js::frontend::BytecodeEmitter::EmitLineNumberNote) + 378
    frame #23: 0x00000001081f0a08 XUL`js::frontend::BytecodeEmitter::emitStatement(js::frontend::ParseNode*) + 648
    frame #24: 0x00000001081d94f7 XUL`js::frontend::BytecodeEmitter::emitTree(js::frontend::ParseNode*, js::frontend::BytecodeEmitter::EmitLineNumberNote) + 551
    frame #25: 0x00000001081d969f XUL`js::frontend::BytecodeEmitter::emitTree(js::frontend::ParseNode*, js::frontend::BytecodeEmitter::EmitLineNumberNote) + 975
    frame #26: 0x00000001081ea727 XUL`js::frontend::BytecodeEmitter::emitTry(js::frontend::ParseNode*) + 327
    frame #27: 0x00000001081d9a3b XUL`js::frontend::BytecodeEmitter::emitTree(js::frontend::ParseNode*, js::frontend::BytecodeEmitter::EmitLineNumberNote) + 1899
    frame #28: 0x00000001081d969f XUL`js::frontend::BytecodeEmitter::emitTree(js::frontend::ParseNode*, js::frontend::BytecodeEmitter::EmitLineNumberNote) + 975
    frame #29: 0x00000001081d9084 XUL`BytecodeCompiler::prepareAndEmitTree(js::frontend::ParseNode**) + 228
    frame #30: 0x00000001081db1e7 XUL`BytecodeCompiler::compileScript(JS::Handle<JSObject*>, JS::Handle<JSScript*>) + 935
    frame #31: 0x00000001081dc3c6 XUL`js::frontend::CompileScript(js::ExclusiveContext*, js::LifoAlloc*, JS::Handle<JSObject*>, JS::Handle<js::StaticScope*>, JS::Handle<JSScript*>, JS::ReadOnlyCompileOptions const&, JS::SourceBufferHolder&, JSString*, js::SourceCompressionTask*, js::ScriptSourceObject**) + 102
    frame #32: 0x0000000108056441 XUL`js::ScriptParseTask::parse() + 289
Component: JavaScript Engine → JavaScript: GC
This happened to me again.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #1)
> This happened to me again.

What platform are you running, and, assuming it's linux, what is the kernel version?
Flags: needinfo?(jmuizelaar)
OS X 10.9
Flags: needinfo?(jmuizelaar)
I seem to run into this every couple of days.
It also reproduces on OSX 10.10 on try every 20 or 30 runs. Unfortunately, I've so far not managed to get it in a debugger. When it was manifesting on Linux I gdb and rr both hid the problem, so I'm not confident that we'll even be able to.
Did you happen to try with rr's chaos mode to tweak the scheduler's behavior? (Just commenting in case you didn't know about it...)
Extensively.
I got this just now, and I think Bill has seen it too. I'm on OSX.
It looks like the |decommitTask.join()| in incrementalCollectSlice is hanging.
Blocks: 1119537
Attached file gc.hang
Hung Content process on linux64 mozilla-central 
changeset 97a52326b06a
I've hit this twice today on OS X 10.11. Maybe we can add some release asserts on Nightly?
Is it because GCRuntime::refillFreeListOffMainThread can eat the wake-up signal?

GCParallelTask::startWithLockHeld does HelperThreadState().notifyOne(GlobalHelperThreadState::PRODUCER) to kick off the task.

~AutoTraceSession does notifyAll(GlobalHelperThreadState::PRODUCER) when we return to the mutator - but that won't happen in a non-incremental GC.

Starting the background sweep task does HelperThreadState().notifyAll(GlobalHelperThreadState::PRODUCER) for some reason so wouldn't suffer from this problem.

I think refillFreeListOffMainThread should probably be using a different mechanism to wait for GC to finish.
Attachment #8784007 - Flags: review?(terrence)
Comment on attachment 8784007 [details] [diff] [review]
bug1288579-decommit-hang

Review of attachment 8784007 [details] [diff] [review]:
-----------------------------------------------------------------

You are my hero.
Attachment #8784007 - Flags: review?(terrence) → review+
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5b07a9829bd7
Fix hang when GC is decommitting r=terrence
https://hg.mozilla.org/mozilla-central/rev/5b07a9829bd7
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Depends on: 1298639
Terrence, can you please request Aurora approval on this since Jon is on PTO? Thanks!
Assignee: nobody → jcoppeard
Flags: needinfo?(terrence)
Keywords: dogfood, regression
Comment on attachment 8784007 [details] [diff] [review]
bug1288579-decommit-hang

Approval Request Comment
[Feature/regressing bug #]: bug 1119537
[User impact if declined]: intermittent hangs
[Describe test coverage new/current, TreeHerder]: TH for the last week. Solves many intermittent failures there.
[Risks and why]: Low. We understand what's going wrong now and this fixes it.
[String/UUID change made/needed]: None.
Flags: needinfo?(terrence)
Attachment #8784007 - Flags: approval-mozilla-aurora?
Comment on attachment 8784007 [details] [diff] [review]
bug1288579-decommit-hang

Fixes possible hangs and makes our test automation more robust, stabilized on Nightly for a week, Aurora50+
Attachment #8784007 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Backed out for bustage. It's running afoul of bug 1290156 changing the signature of GlobalHelperThreadState::notifyOne. I'm guessing it's an easy rebase if you know what you're doing, though :)
https://treeherder.mozilla.org/logviewer.html#?job_id=3449573&repo=mozilla-aurora

https://hg.mozilla.org/releases/mozilla-aurora/rev/99c0c53f5ea3
Flags: needinfo?(terrence)
Indeed! The lock parameter is new.

https://hg.mozilla.org/releases/mozilla-aurora/rev/0baff3de0d57
Flags: needinfo?(terrence)
No longer depends on: 1298639
Version: unspecified → Trunk
You need to log in before you can comment on or make changes to this bug.