Closed
Bug 922096
Opened 11 years ago
Closed 11 years ago
Odinmonkey: assertion failure group.state.asmJSWorklist.empty(), at js/src/jit/AsmJS.cpp:5203
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
VERIFIED
FIXED
mozilla27
People
(Reporter: dougc, Assigned: luke)
References
()
Details
Attachments
(1 file)
5.40 KB,
patch
|
sstangl
:
review+
lsblakk
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
Seeing crashes due to assertion failures testing asm.js code on a m-c debug build. The crashes are not systematic, but can be reproduced with some patience. Starting three tabs, each loading the Citadel demo, starting each about one second apart, gives a reasonable chance of reproducing here. The Citadel files are cached locally, which might make a difference. This example is on the x86, but the same or similar crash is also encountered on a x64 build. (gdb) back #0 0xf772c430 in __kernel_vsyscall () #1 0xf7484f96 in nanosleep () from /lib/libc.so.6 #2 0xf7484dc3 in sleep () from /lib/libc.so.6 #3 0xf2c4f92e in ah_crap_handler (signum=11) at toolkit/xre/nsSigHandlers.cpp:88 #4 0xf2c56fc1 in nsProfileLock::FatalSignalHandler (signo=11, info=0xfff5c64c, context=0xfff5c6cc) at profile/dirserviceprovider/src/nsProfileLock.cpp:190 #5 0xf4cadd02 in AsmJSFaultHandler (context=0xfff5c6cc, info=0xfff5c64c, signum=11) at js/src/jit/AsmJSSignalHandlers.cpp:988 #6 AsmJSFaultHandler (signum=11, info=0xfff5c64c, context=0xfff5c6cc) at js/src/jit/AsmJSSignalHandlers.cpp:970 #7 <signal handler called> #8 0xf4f7958c in CheckFunctionsParallelImpl (m=..., group=...) at js/src/jit/AsmJS.cpp:5203 #9 0xf4f79859 in CheckFunctionsParallel (m=...) at js/src/jit/AsmJS.cpp:5272 #10 CheckModule (cx=cx@entry=0xdc2c8540, parser=..., stmtList=stmtList@entry=0xdc9f22e8, module=module@entry=0xfff5d418, compilationTimeReport=compilationTimeReport@entry=0xfff5d410) at js/src/jit/AsmJS.cpp:6397 #11 0xf4f7b996 in js::CompileAsmJS (cx=0xdc2c8540, parser=..., stmtList=0xdc9f22e8, validated=0xfff5d48c) at js/src/jit/AsmJS.cpp:6484 #12 0xf4f1670d in js::frontend::Parser<js::frontend::FullParseHandler>::asmJS (this=0xfff5ebf4, list=0xdc9f22e8) at js/src/frontend/Parser.cpp:2534 #13 0xf4f29a21 in js::frontend::Parser<js::frontend::FullParseHandler>::maybeParseDirective (this=0xfff5ebf4, list=list@entry=0xdc9f22e8, pn=pn@entry=0xdc9f2010, cont=cont@entry=0xfff5d4e8) ... (gdb) up #8 0xf4f7958c in CheckFunctionsParallelImpl (m=..., group=...) at js/src/jit/AsmJS.cpp:5203 5203 JS_ASSERT(group.state.asmJSWorklist.empty()); (gdb) list 5198 if (!CheckAllFunctionsDefined(m)) 5199 return false; 5200 5201 JS_ASSERT(group.outstandingJobs == 0); 5202 JS_ASSERT(group.compiledJobs == m.numFunctions()); 5203 JS_ASSERT(group.state.asmJSWorklist.empty()); 5204 JS_ASSERT(group.state.asmJSFinishedList.empty()); 5205 JS_ASSERT(!group.state.asmJSWorkerFailed()); 5206 return true; 5207 } (gdb) print group.state.asmJSWorklist $3 = {<mozilla::VectorBase<js::AsmJSParallelTask*, 0u, js::SystemAllocPolicy, js::Vector<js::AsmJSParallelTask*, 0u, js::SystemAllocPolicy> >> = {<js::SystemAllocPolicy> = {<No data fields>}, static sElemIsPod = true, static sMaxInlineBytes = 1024, static sInlineCapacity = 0, static sInlineBytes = 1, mBegin = 0xd20dc8f0, mLength = 2, mCapacity = 4, mReserved = 4, storage = {u = {bytes = "", _ = 0}}, entered = false, static sMaxInlineStorage = <optimized out>}, <No data fields>}
Assignee | ||
Comment 1•11 years ago
|
||
Nice find! Douglas: when this assert hits, is the base of the stack the main thread or an off-main-thread parse task? Looking at the current state of all the threads, are there any other threads in asm.js compilation? IIRC, we only do one off-main-thread parse at a time, but this could still happen concurrent with the main thread. My guess is that off-main-thread parsing is causing some race condition on the shared parallel asmJS worker state that didn't exist when all parsing happened from the main thread.
Reporter | ||
Comment 2•11 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #1) > Nice find! Douglas: when this assert hits, is the base of the stack the > main thread or an off-main-thread parse task? Yes, the crash occurred in the main thread. > Looking at the current state > of all the threads, are there any other threads in asm.js compilation? The following threads appear relevant: 24 Thread 0xe0bfeb40 (LWP 3583) "Analysis Helper" js::jit::AllocationIntegrityState::checkIntegrity (this=this@entry=0xe0bfde3c, block=0xd06f44d0, ins=<optimized out>, vreg=121, alloc=..., populateSafepoints=false) at js/src/jit/RegisterAllocator.cpp:177 23 Thread 0xe03fdb40 (LWP 3584) "Analysis Helper" 0xf4e4d0b4 in bitForValue (value=418) at js/src/jit/BitSet.h:36 22 Thread 0xdfbfcb40 (LWP 3585) "Analysis Helper" 0xf4ebf2db in isMoveGroup (this=<optimized out>) at js/src/jit/LIR.h:691 21 Thread 0xdf3fbb40 (LWP 3586) "Analysis Helper" 0xf77d4430 in __kernel_vsyscall () (gdb) thread 21 [Switching to thread 21 (Thread 0xdf3fbb40 (LWP 3586))] #0 0xf77d4430 in __kernel_vsyscall () (gdb) back #0 0xf77d4430 in __kernel_vsyscall () #1 0xf778c18c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0 #2 0xf734ff73 in PR_WaitCondVar (cvar=0xe1448980, timeout=4294967295) at nsprpub/pr/src/pthreads/ptsynch.c:385 #3 0xf4d272cf in js::WorkerThreadState::wait (this=0xe1e5c530, which=js::WorkerThreadState::CONSUMER, millis=0) at js/src/jsworkers.cpp:435 #4 0xf50012b8 in GetFinishedCompilation (m=..., group=..., group=...) at js/src/jit/AsmJS.cpp:5117 #5 0xf500ea57 in GenerateCodeForFinishedJob (outTask=<synthetic pointer>, group=..., m=...) at js/src/jit/AsmJS.cpp:5127 #6 CheckFunctionsParallelImpl (m=..., group=...) at js/src/jit/AsmJS.cpp:5174 #7 0xf500f1d9 in CheckFunctionsParallel (m=...) at js/src/jit/AsmJS.cpp:5272 #8 CheckModule (cx=cx@entry=0xd15ae220, parser=..., stmtList=stmtList@entry=0xd153d2e8, module=module@entry=0xdf3f9488, compilationTimeReport=compilationTimeReport@entry=0xdf3f9480) at js/src/jit/AsmJS.cpp:6397 #9 0xf50112c6 in js::CompileAsmJS (cx=0xd15ae220, parser=..., stmtList=0xd153d2e8, validated=0xdf3f94fc) at js/src/jit/AsmJS.cpp:6484 (gdb) thread 22 [Switching to thread 22 (Thread 0xdfbfcb40 (LWP 3585))] #0 0xf4ebf2db in isMoveGroup (this=<optimized out>) at js/src/jit/LIR.h:691 691 LIR_OPCODE_LIST(LIROP) (gdb) back #0 0xf4ebf2db in isMoveGroup (this=<optimized out>) at js/src/jit/LIR.h:691 #1 js::jit::AllocationIntegrityState::checkIntegrity (this=this@entry=0xdfbfbe3c, block=0xd0f99820, ins=<optimized out>, vreg=393, alloc=..., populateSafepoints=false) at js/src/jit/RegisterAllocator.cpp:186 #2 0xf4ec0c62 in js::jit::AllocationIntegrityState::check (this=0xdfbfbe3c, populateSafepoints=false) at js/src/jit/RegisterAllocator.cpp:167 #3 0xf4d9e66a in js::jit::GenerateLIR (mir=0xd0992078) at js/src/jit/Ion.cpp:1395 #4 0xf4d2aec6 in js::WorkerThread::handleAsmJSWorkload (this=this@entry=0xe1403560, state=...) at js/src/jsworkers.cpp:648 #5 0xf4d2c3be in js::WorkerThread::threadLoop (this=0xe1403560) at js/src/jsworkers.cpp:918 (gdb) thread 23 [Switching to thread 23 (Thread 0xe03fdb40 (LWP 3584))] #0 0xf4e4d0b4 in bitForValue (value=418) at js/src/jit/BitSet.h:36 36 return 1l << (uint32_t)(value % (8 * sizeof(uint32_t))); (gdb) back #0 0xf4e4d0b4 in bitForValue (value=418) at js/src/jit/BitSet.h:36 #1 contains (value=418, this=0xcbc66928) at js/src/jit/BitSet.h:63 #2 js::jit::BitSet::Iterator::operator++ (this=0xe03fcad4, dummy=0) at js/src/jit/BitSet.h:161 #3 0xf4e55088 in js::jit::LiveRangeAllocator<js::jit::LinearScanVirtualRegister>::buildLivenessInfo (this=0xe03fcbac) at js/src/jit/LiveRangeAllocator.cpp:758 #4 0xf4e464aa in js::jit::LinearScanAllocator::go (this=0xe03fcbac) at js/src/jit/LinearScan.cpp:1203 #5 0xf4d9e4a9 in js::jit::GenerateLIR (mir=0xd095b078) at js/src/jit/Ion.cpp:1391 #6 0xf4d2aec6 in js::WorkerThread::handleAsmJSWorkload (this=this@entry=0xe14034b0, state=...) at js/src/jsworkers.cpp:648 #7 0xf4d2c3be in js::WorkerThread::threadLoop (this=0xe14034b0) at js/src/jsworkers.cpp:918 (gdb) thread 24 [Switching to thread 24 (Thread 0xe0bfeb40 (LWP 3583))] #0 js::jit::AllocationIntegrityState::checkIntegrity (this=this@entry=0xe0bfde3c, block=0xd06f44d0, ins=<optimized out>, vreg=121, alloc=..., populateSafepoints=false) at js/src/jit/RegisterAllocator.cpp:177 177 AllocationIntegrityState::checkIntegrity(LBlock *block, LInstruction *ins, (gdb) back #0 js::jit::AllocationIntegrityState::checkIntegrity (this=this@entry=0xe0bfde3c, block=0xd06f44d0, ins=<optimized out>, vreg=121, alloc=..., populateSafepoints=false) at js/src/jit/RegisterAllocator.cpp:177 #1 0xf4ec0c62 in js::jit::AllocationIntegrityState::check (this=0xe0bfde3c, populateSafepoints=false) at js/src/jit/RegisterAllocator.cpp:167 #2 0xf4d9e66a in js::jit::GenerateLIR (mir=0xd0980078) at js/src/jit/Ion.cpp:1395 #3 0xf4d2aec6 in js::WorkerThread::handleAsmJSWorkload (this=this@entry=0xe1403400, state=...) at js/src/jsworkers.cpp:648 #4 0xf4d2c3be in js::WorkerThread::threadLoop (this=0xe1403400) at js/src/jsworkers.cpp:918 It looks like one other thread is in js::CompileAsmJS, and three workers are compiling the MIR. Here's the associated main thread in this case, also in js::CompileAsmJS: #5 0xf4d42a52 in AsmJSFaultHandler (context=0xffd9020c, info=0xffd9018c, signum=11) at js/src/jit/AsmJSSignalHandlers.cpp:988 #6 AsmJSFaultHandler (signum=11, info=0xffd9018c, context=0xffd9020c) at js/src/jit/AsmJSSignalHandlers.cpp:970 #7 <signal handler called> #8 0xf500eee5 in CheckFunctionsParallelImpl (m=..., group=...) at js/src/jit/AsmJS.cpp:5204 #9 0xf500f1d9 in CheckFunctionsParallel (m=...) at js/src/jit/AsmJS.cpp:5272 #10 CheckModule (cx=cx@entry=0xdc2d81e0, parser=..., stmtList=stmtList@entry=0xd788d2e8, module=module@entry=0xffd90f68, compilationTimeReport=compilationTimeReport@entry=0xffd90f60) at js/src/jit/AsmJS.cpp:6397 #11 0xf50112c6 in js::CompileAsmJS (cx=0xdc2d81e0, parser=..., stmtList=0xd788d2e8, validated=0xffd90fdc) at js/src/jit/AsmJS.cpp:6484 #12 0xf4fabc4d in js::frontend::Parser<js::frontend::FullParseHandler>::asmJS (this=0xffd92744, list=0xd788d2e8) at js/src/frontend/Parser.cpp:2534 (gdb) print group.state.asmJSFinishedList $1 = {<mozilla::VectorBase<js::AsmJSParallelTask*, 0u, js::SystemAllocPolicy, js::Vector<js::AsmJSParallelTask*, 0u, js::SystemAllocPolicy> >> = {<js::SystemAllocPolicy> = {<No data fields>}, static sElemIsPod = true, static sMaxInlineBytes = 1024, static sInlineCapacity = 0, static sInlineBytes = 1, mBegin = 0xcfdfe260, mLength = 0, mCapacity = 8, mReserved = 5, storage = {u = {bytes = "", _ = 0}}, entered = false, static sMaxInlineStorage = <optimized out>}, <No data fields>}
Assignee | ||
Comment 3•11 years ago
|
||
Thanks Douglas, this really helps. I'll audit the parallel compilation path; there probably are some main-thread assumptions I missed.
Assignee | ||
Comment 4•11 years ago
|
||
Ok, well obviously the assertion group.state.asmJSWorklist.empty() is invalid since group.state is runtime-wide :)
Assignee | ||
Comment 5•11 years ago
|
||
Ok, yes, there are some rather flagrant race conditions here. I think the fix is just to move the asmJSFailedList and few other bits of asmJS state into the ParallelGroupState (one per CompileAsmJS invocation) out of the WorkerThreadState (one per runtime).
Assignee | ||
Comment 6•11 years ago
|
||
This fixes the bug for me in Citadel and in the included testcase. Perhaps you can try the patch Douglas? Later we can try to remove this "only one CompileAsmJS may compile in parallel at a time" restriction.
Comment 7•11 years ago
|
||
Comment on attachment 812864 [details] [diff] [review] fix-race Review of attachment 812864 [details] [diff] [review]: ----------------------------------------------------------------- Beats crashing. But removing the restriction shouldn't be difficult. I can whip something up tomorrow. ::: js/src/jit-test/tests/asm.js/testParallelCompile.js @@ +1,1 @@ > +load(libdir + "asm.js"); This testcase doesn't appear to fail on tip after a number of runs. Would bumping up some "i" increase the probability of an assertion triggering?
Attachment #812864 -
Flags: review?(sstangl) → review+
Assignee | ||
Comment 8•11 years ago
|
||
(In reply to Sean Stangl [:sstangl] from comment #7) > Beats crashing. But removing the restriction shouldn't be difficult. I can > whip something up tomorrow. Cool, thanks! While I was looking at the code, I noticed we now have 4 types of tasks now (ion, asm, parse, compress) and each has its own queue and separate path. I was wondering if this worker code couldn't be refactored to just have some WorkerTaskBase and then have the individual jobs do what they need to do from this. > This testcase doesn't appear to fail on tip after a number of runs. Would > bumping up some "i" increase the probability of an assertion triggering? Oh, you need to pass --args='--ion-parallel-compile=on' to jit_tests.py.
Assignee | ||
Comment 9•11 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/70fe2b31caf4
Reporter | ||
Comment 10•11 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #6) > This fixes the bug for me in Citadel and in the included testcase. Perhaps > you can try the patch Douglas? Later we can try to remove this "only one > CompileAsmJS may compile in parallel at a time" restriction. Have not seen any more crashes that could be attributed to this issue. Thank you for looking into it so quickly - it does help with the testing of other work to have this one eliminated.
Assignee | ||
Comment 11•11 years ago
|
||
Comment on attachment 812864 [details] [diff] [review] fix-race [Approval Request Comment] Bug caused by (feature/regressing bug #): 902506 User impact if declined: crash if load multiple asm.js at the same time Testing completed (on m-c, etc.): m-c Risk to taking this patch (and alternatives if risky): low
Attachment #812864 -
Flags: approval-mozilla-aurora?
Comment 12•11 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/70fe2b31caf4
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
Updated•11 years ago
|
Attachment #812864 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 13•11 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/0fefe5276534
status-firefox26:
--- → fixed
status-firefox27:
--- → fixed
Comment 14•11 years ago
|
||
verified per comment #10
You need to log in
before you can comment on or make changes to this bug.
Description
•