Closed Bug 922096 Opened 11 years ago Closed 11 years ago

Odinmonkey: assertion failure group.state.asmJSWorklist.empty(), at js/src/jit/AsmJS.cpp:5203

Categories

(Core :: JavaScript Engine, defect)

x86
Linux
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla27
Tracking Status
firefox26 --- verified
firefox27 --- verified

People

(Reporter: dougc, Assigned: luke)

References

()

Details

Attachments

(1 file)

Seeing crashes due to assertion failures testing asm.js code
on a m-c debug build.

The crashes are not systematic, but can be reproduced with some
patience.  Starting three tabs, each loading the Citadel demo,
starting each about one second apart, gives a reasonable chance
of reproducing here.  The Citadel files are cached locally,
which might make a difference.

This example is on the x86, but the same or similar crash is
also encountered on a x64 build.

(gdb) back
#0  0xf772c430 in __kernel_vsyscall ()
#1  0xf7484f96 in nanosleep () from /lib/libc.so.6
#2  0xf7484dc3 in sleep () from /lib/libc.so.6
#3  0xf2c4f92e in ah_crap_handler (signum=11) at toolkit/xre/nsSigHandlers.cpp:88
#4  0xf2c56fc1 in nsProfileLock::FatalSignalHandler (signo=11, info=0xfff5c64c, context=0xfff5c6cc) at profile/dirserviceprovider/src/nsProfileLock.cpp:190
#5  0xf4cadd02 in AsmJSFaultHandler (context=0xfff5c6cc, info=0xfff5c64c, signum=11) at js/src/jit/AsmJSSignalHandlers.cpp:988
#6  AsmJSFaultHandler (signum=11, info=0xfff5c64c, context=0xfff5c6cc) at js/src/jit/AsmJSSignalHandlers.cpp:970
#7  <signal handler called>
#8  0xf4f7958c in CheckFunctionsParallelImpl (m=..., group=...) at js/src/jit/AsmJS.cpp:5203
#9  0xf4f79859 in CheckFunctionsParallel (m=...) at js/src/jit/AsmJS.cpp:5272
#10 CheckModule (cx=cx@entry=0xdc2c8540, parser=..., stmtList=stmtList@entry=0xdc9f22e8, module=module@entry=0xfff5d418, compilationTimeReport=compilationTimeReport@entry=0xfff5d410)
    at js/src/jit/AsmJS.cpp:6397
#11 0xf4f7b996 in js::CompileAsmJS (cx=0xdc2c8540, parser=..., stmtList=0xdc9f22e8, validated=0xfff5d48c) at js/src/jit/AsmJS.cpp:6484
#12 0xf4f1670d in js::frontend::Parser<js::frontend::FullParseHandler>::asmJS (this=0xfff5ebf4, list=0xdc9f22e8) at js/src/frontend/Parser.cpp:2534
#13 0xf4f29a21 in js::frontend::Parser<js::frontend::FullParseHandler>::maybeParseDirective (this=0xfff5ebf4, list=list@entry=0xdc9f22e8, pn=pn@entry=0xdc9f2010, cont=cont@entry=0xfff5d4e8)
...

(gdb) up
#8  0xf4f7958c in CheckFunctionsParallelImpl (m=..., group=...) at js/src/jit/AsmJS.cpp:5203
5203	    JS_ASSERT(group.state.asmJSWorklist.empty());

(gdb) list
5198	    if (!CheckAllFunctionsDefined(m))
5199	        return false;
5200	
5201	    JS_ASSERT(group.outstandingJobs == 0);
5202	    JS_ASSERT(group.compiledJobs == m.numFunctions());
5203	    JS_ASSERT(group.state.asmJSWorklist.empty());
5204	    JS_ASSERT(group.state.asmJSFinishedList.empty());
5205	    JS_ASSERT(!group.state.asmJSWorkerFailed());
5206	    return true;
5207	}

(gdb) print group.state.asmJSWorklist
$3 = {<mozilla::VectorBase<js::AsmJSParallelTask*, 0u, js::SystemAllocPolicy, js::Vector<js::AsmJSParallelTask*, 0u, js::SystemAllocPolicy> >> = {<js::SystemAllocPolicy> = {<No data fields>}, 
    static sElemIsPod = true, static sMaxInlineBytes = 1024, static sInlineCapacity = 0, static sInlineBytes = 1, mBegin = 0xd20dc8f0, mLength = 2, mCapacity = 4, mReserved = 4, storage = {u = {bytes = "", _ = 
    0}}, entered = false, static sMaxInlineStorage = <optimized out>}, <No data fields>}
Nice find!  Douglas: when this assert hits, is the base of the stack the main thread or an off-main-thread parse task?  Looking at the current state of all the threads, are there any other threads in asm.js compilation?

IIRC, we only do one off-main-thread parse at a time, but this could still happen concurrent with the main thread.  My guess is that off-main-thread parsing is causing some race condition on the shared parallel asmJS worker state that didn't exist when all parsing happened from the main thread.
(In reply to Luke Wagner [:luke] from comment #1)
> Nice find!  Douglas: when this assert hits, is the base of the stack the
> main thread or an off-main-thread parse task?

Yes, the crash occurred in the main thread.

> Looking at the current state
> of all the threads, are there any other threads in asm.js compilation?

The following threads appear relevant:

  24   Thread 0xe0bfeb40 (LWP 3583) "Analysis Helper" js::jit::AllocationIntegrityState::checkIntegrity (this=this@entry=0xe0bfde3c, block=0xd06f44d0, ins=<optimized out>, vreg=121, alloc=..., 
    populateSafepoints=false) at js/src/jit/RegisterAllocator.cpp:177
  23   Thread 0xe03fdb40 (LWP 3584) "Analysis Helper" 0xf4e4d0b4 in bitForValue (value=418) at js/src/jit/BitSet.h:36
  22   Thread 0xdfbfcb40 (LWP 3585) "Analysis Helper" 0xf4ebf2db in isMoveGroup (this=<optimized out>) at js/src/jit/LIR.h:691
  21   Thread 0xdf3fbb40 (LWP 3586) "Analysis Helper" 0xf77d4430 in __kernel_vsyscall ()
 
(gdb) thread 21
[Switching to thread 21 (Thread 0xdf3fbb40 (LWP 3586))]
#0  0xf77d4430 in __kernel_vsyscall ()
(gdb) back
#0  0xf77d4430 in __kernel_vsyscall ()
#1  0xf778c18c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#2  0xf734ff73 in PR_WaitCondVar (cvar=0xe1448980, timeout=4294967295) at nsprpub/pr/src/pthreads/ptsynch.c:385
#3  0xf4d272cf in js::WorkerThreadState::wait (this=0xe1e5c530, which=js::WorkerThreadState::CONSUMER, millis=0) at js/src/jsworkers.cpp:435
#4  0xf50012b8 in GetFinishedCompilation (m=..., group=..., group=...) at js/src/jit/AsmJS.cpp:5117
#5  0xf500ea57 in GenerateCodeForFinishedJob (outTask=<synthetic pointer>, group=..., m=...) at js/src/jit/AsmJS.cpp:5127
#6  CheckFunctionsParallelImpl (m=..., group=...) at js/src/jit/AsmJS.cpp:5174
#7  0xf500f1d9 in CheckFunctionsParallel (m=...) at js/src/jit/AsmJS.cpp:5272
#8  CheckModule (cx=cx@entry=0xd15ae220, parser=..., stmtList=stmtList@entry=0xd153d2e8, module=module@entry=0xdf3f9488, compilationTimeReport=compilationTimeReport@entry=0xdf3f9480)
    at js/src/jit/AsmJS.cpp:6397
#9  0xf50112c6 in js::CompileAsmJS (cx=0xd15ae220, parser=..., stmtList=0xd153d2e8, validated=0xdf3f94fc) at js/src/jit/AsmJS.cpp:6484

(gdb) thread 22
[Switching to thread 22 (Thread 0xdfbfcb40 (LWP 3585))]
#0  0xf4ebf2db in isMoveGroup (this=<optimized out>) at js/src/jit/LIR.h:691
691	    LIR_OPCODE_LIST(LIROP)
(gdb) back
#0  0xf4ebf2db in isMoveGroup (this=<optimized out>) at js/src/jit/LIR.h:691
#1  js::jit::AllocationIntegrityState::checkIntegrity (this=this@entry=0xdfbfbe3c, block=0xd0f99820, ins=<optimized out>, vreg=393, alloc=..., populateSafepoints=false)
    at js/src/jit/RegisterAllocator.cpp:186
#2  0xf4ec0c62 in js::jit::AllocationIntegrityState::check (this=0xdfbfbe3c, populateSafepoints=false) at js/src/jit/RegisterAllocator.cpp:167
#3  0xf4d9e66a in js::jit::GenerateLIR (mir=0xd0992078) at js/src/jit/Ion.cpp:1395
#4  0xf4d2aec6 in js::WorkerThread::handleAsmJSWorkload (this=this@entry=0xe1403560, state=...) at js/src/jsworkers.cpp:648
#5  0xf4d2c3be in js::WorkerThread::threadLoop (this=0xe1403560) at js/src/jsworkers.cpp:918

(gdb) thread 23
[Switching to thread 23 (Thread 0xe03fdb40 (LWP 3584))]
#0  0xf4e4d0b4 in bitForValue (value=418) at js/src/jit/BitSet.h:36
36	        return 1l << (uint32_t)(value % (8 * sizeof(uint32_t)));
(gdb) back
#0  0xf4e4d0b4 in bitForValue (value=418) at js/src/jit/BitSet.h:36
#1  contains (value=418, this=0xcbc66928) at js/src/jit/BitSet.h:63
#2  js::jit::BitSet::Iterator::operator++ (this=0xe03fcad4, dummy=0) at js/src/jit/BitSet.h:161
#3  0xf4e55088 in js::jit::LiveRangeAllocator<js::jit::LinearScanVirtualRegister>::buildLivenessInfo (this=0xe03fcbac) at js/src/jit/LiveRangeAllocator.cpp:758
#4  0xf4e464aa in js::jit::LinearScanAllocator::go (this=0xe03fcbac) at js/src/jit/LinearScan.cpp:1203
#5  0xf4d9e4a9 in js::jit::GenerateLIR (mir=0xd095b078) at js/src/jit/Ion.cpp:1391
#6  0xf4d2aec6 in js::WorkerThread::handleAsmJSWorkload (this=this@entry=0xe14034b0, state=...) at js/src/jsworkers.cpp:648
#7  0xf4d2c3be in js::WorkerThread::threadLoop (this=0xe14034b0) at js/src/jsworkers.cpp:918

(gdb) thread 24
[Switching to thread 24 (Thread 0xe0bfeb40 (LWP 3583))]
#0  js::jit::AllocationIntegrityState::checkIntegrity (this=this@entry=0xe0bfde3c, block=0xd06f44d0, ins=<optimized out>, vreg=121, alloc=..., populateSafepoints=false)
    at js/src/jit/RegisterAllocator.cpp:177
177	AllocationIntegrityState::checkIntegrity(LBlock *block, LInstruction *ins,
(gdb) back
#0  js::jit::AllocationIntegrityState::checkIntegrity (this=this@entry=0xe0bfde3c, block=0xd06f44d0, ins=<optimized out>, vreg=121, alloc=..., populateSafepoints=false)
    at js/src/jit/RegisterAllocator.cpp:177
#1  0xf4ec0c62 in js::jit::AllocationIntegrityState::check (this=0xe0bfde3c, populateSafepoints=false) at js/src/jit/RegisterAllocator.cpp:167
#2  0xf4d9e66a in js::jit::GenerateLIR (mir=0xd0980078) at js/src/jit/Ion.cpp:1395
#3  0xf4d2aec6 in js::WorkerThread::handleAsmJSWorkload (this=this@entry=0xe1403400, state=...) at js/src/jsworkers.cpp:648
#4  0xf4d2c3be in js::WorkerThread::threadLoop (this=0xe1403400) at js/src/jsworkers.cpp:918


It looks like one other thread is in js::CompileAsmJS, and three workers
are compiling the MIR.  Here's the associated main thread in this case,
also in js::CompileAsmJS:

#5  0xf4d42a52 in AsmJSFaultHandler (context=0xffd9020c, info=0xffd9018c, signum=11) at js/src/jit/AsmJSSignalHandlers.cpp:988
#6  AsmJSFaultHandler (signum=11, info=0xffd9018c, context=0xffd9020c) at js/src/jit/AsmJSSignalHandlers.cpp:970
#7  <signal handler called>
#8  0xf500eee5 in CheckFunctionsParallelImpl (m=..., group=...) at js/src/jit/AsmJS.cpp:5204
#9  0xf500f1d9 in CheckFunctionsParallel (m=...) at js/src/jit/AsmJS.cpp:5272
#10 CheckModule (cx=cx@entry=0xdc2d81e0, parser=..., stmtList=stmtList@entry=0xd788d2e8, module=module@entry=0xffd90f68, compilationTimeReport=compilationTimeReport@entry=0xffd90f60)
    at js/src/jit/AsmJS.cpp:6397
#11 0xf50112c6 in js::CompileAsmJS (cx=0xdc2d81e0, parser=..., stmtList=0xd788d2e8, validated=0xffd90fdc) at js/src/jit/AsmJS.cpp:6484
#12 0xf4fabc4d in js::frontend::Parser<js::frontend::FullParseHandler>::asmJS (this=0xffd92744, list=0xd788d2e8) at js/src/frontend/Parser.cpp:2534

(gdb) print group.state.asmJSFinishedList
$1 = {<mozilla::VectorBase<js::AsmJSParallelTask*, 0u, js::SystemAllocPolicy, js::Vector<js::AsmJSParallelTask*, 0u, js::SystemAllocPolicy> >> = {<js::SystemAllocPolicy> = {<No data fields>}, 
    static sElemIsPod = true, static sMaxInlineBytes = 1024, static sInlineCapacity = 0, static sInlineBytes = 1, mBegin = 0xcfdfe260, mLength = 0, mCapacity = 8, mReserved = 5, storage = {u = {bytes = "", _ = 
    0}}, entered = false, static sMaxInlineStorage = <optimized out>}, <No data fields>}
Thanks Douglas, this really helps.  I'll audit the parallel compilation path; there probably are some main-thread assumptions I missed.
Ok, well obviously the assertion group.state.asmJSWorklist.empty() is invalid since group.state is runtime-wide :)
Ok, yes, there are some rather flagrant race conditions here.  I think the fix is just to move the asmJSFailedList and few other bits of asmJS state into the ParallelGroupState (one per CompileAsmJS invocation) out of the WorkerThreadState (one per runtime).
Attached patch fix-raceSplinter Review
This fixes the bug for me in Citadel and in the included testcase.  Perhaps you can try the patch Douglas?  Later we can try to remove this "only one CompileAsmJS may compile in parallel at a time" restriction.
Assignee: nobody → luke
Status: NEW → ASSIGNED
Attachment #812864 - Flags: review?(sstangl)
Comment on attachment 812864 [details] [diff] [review]
fix-race

Review of attachment 812864 [details] [diff] [review]:
-----------------------------------------------------------------

Beats crashing. But removing the restriction shouldn't be difficult. I can whip something up tomorrow.

::: js/src/jit-test/tests/asm.js/testParallelCompile.js
@@ +1,1 @@
> +load(libdir + "asm.js");

This testcase doesn't appear to fail on tip after a number of runs. Would bumping up some "i" increase the probability of an assertion triggering?
Attachment #812864 - Flags: review?(sstangl) → review+
(In reply to Sean Stangl [:sstangl] from comment #7)
> Beats crashing. But removing the restriction shouldn't be difficult. I can
> whip something up tomorrow.

Cool, thanks!  While I was looking at the code, I noticed we now have 4 types of tasks now (ion, asm, parse, compress) and each has its own queue and separate path.  I was wondering if this worker code couldn't be refactored to just have some WorkerTaskBase and then have the individual jobs do what they need to do from this.

> This testcase doesn't appear to fail on tip after a number of runs. Would
> bumping up some "i" increase the probability of an assertion triggering?

Oh, you need to pass --args='--ion-parallel-compile=on' to jit_tests.py.
(In reply to Luke Wagner [:luke] from comment #6)
> This fixes the bug for me in Citadel and in the included testcase.  Perhaps
> you can try the patch Douglas?  Later we can try to remove this "only one
> CompileAsmJS may compile in parallel at a time" restriction.

Have not seen any more crashes that could be attributed to this issue. Thank you for looking into it so quickly - it does help with the testing of other work to have this one eliminated.
Comment on attachment 812864 [details] [diff] [review]
fix-race

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 902506
User impact if declined: crash if load multiple asm.js at the same time
Testing completed (on m-c, etc.): m-c
Risk to taking this patch (and alternatives if risky): low
Attachment #812864 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/70fe2b31caf4
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
Attachment #812864 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
verified per comment #10
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: