crash in js::CompartmentChecker::check(JSObject*)

VERIFIED FIXED in Firefox 26

Status

Type: defect
Priority: --
Severity: critical
Status: VERIFIED FIXED
Opened: 6 years ago
Closed: 6 years ago

People

(Reporter: azakai, Assigned: bbouvier)

Tracking

Keywords: crash, regression
Version: unspecified
Target Milestone: mozilla27
Points: ---

Firefox Tracking Flags

(firefox26 verified, firefox27 verified)


Attachments

(1 attachment, 1 obsolete attachment)

This bug was filed from the Socorro interface and is report bp-ba67ca30-9cde-4de7-8f3c-d3bfb2130921.
=============================================================

I get this crash signature consistently as follows:

1. Load http://kripken.github.io/boon/inception/
2. Press 'play demo'
3. Start the game (press high resolution)
4. Wait 5-10 seconds until it crashes
I see this in Aurora as well. It did not happen in Aurora 9-16, but on 9-20 (after the latest merge) the problem does show up.

I sometimes see a hang of the browser instead of a crash. To investigate that, I tried to run the browser in gdb, but then both Nightly and Aurora segfault in gdb during startup of the demo, which is earlier than when this bug happens. The stack trace I see in that case is:

#0  0x00007ffff37d8def in js::jit::CompactBufferReader::readFixedUint32_t() () from /home/alon/Downloads/firefox/libxul.so
#1  0x00007ffff37d9289 in js::jit::Assembler::TraceJumpRelocations(JSTracer*, js::jit::IonCode*, js::jit::CompactBufferReader&) ()
   from /home/alon/Downloads/firefox/libxul.so
#2  0x00007ffff375aeed in js::jit::IonCode::trace(JSTracer*) () from /home/alon/Downloads/firefox/libxul.so
#3  0x00007ffff36207df in js::GCMarker::processMarkStackOther(js::SliceBudget&, unsigned long, unsigned long) ()
   from /home/alon/Downloads/firefox/libxul.so
#4  0x00007ffff36213fe in js::GCMarker::drainMarkStack(js::SliceBudget&) () from /home/alon/Downloads/firefox/libxul.so
#5  0x00007ffff36bb7bd in IncrementalCollectSlice(JSRuntime*, long, JS::gcreason::Reason, js::JSGCInvocationKind) ()
   from /home/alon/Downloads/firefox/libxul.so
#6  0x00007ffff36bc95d in GCCycle(JSRuntime*, bool, long, js::JSGCInvocationKind, JS::gcreason::Reason) ()
   from /home/alon/Downloads/firefox/libxul.so
#7  0x00007ffff36bcc63 in Collect(JSRuntime*, bool, long, js::JSGCInvocationKind, JS::gcreason::Reason) ()
   from /home/alon/Downloads/firefox/libxul.so
#8  0x00007ffff27f5545 in mozilla::dom::workers::WorkerPrivate::GarbageCollectInternal(JSContext*, bool, bool) ()
   from /home/alon/Downloads/firefox/libxul.so
#9  0x00007ffff27f55c7 in (anonymous namespace)::GarbageCollectRunnable::WorkerRun(JSContext*, mozilla::dom::workers::WorkerPrivate*) ()
   from /home/alon/Downloads/firefox/libxul.so
#10 0x00007ffff27f08f0 in mozilla::dom::workers::WorkerRunnable::Run() () from /home/alon/Downloads/firefox/libxul.so
#11 0x00007ffff27f7c3b in mozilla::dom::workers::WorkerPrivate::DoRunLoop(JSContext*) () from /home/alon/Downloads/firefox/libxul.so
#12 0x00007ffff27e660e in (anonymous namespace)::WorkerThreadRunnable::Run() () from /home/alon/Downloads/firefox/libxul.so
#13 0x00007ffff302e76b in nsThread::ProcessNextEvent(bool, bool*) () from /home/alon/Downloads/firefox/libxul.so
#14 0x00007ffff2ff2aae in NS_ProcessNextEvent(nsIThread*, bool) () from /home/alon/Downloads/firefox/libxul.so
#15 0x00007ffff302f4f0 in nsThread::ThreadFunc(void*) () from /home/alon/Downloads/firefox/libxul.so
#16 0x00007ffff67edbac in _pt_root () from /home/alon/Downloads/firefox/libnspr4.so
#17 0x00007ffff7bc4e9a in start_thread (arg=0x7fff9efff700) at pthread_create.c:308
#18 0x00007ffff6cd0ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
#19 0x0000000000000000 in ?? ()
Keywords: regression
I bisected on nightlies. The first bad nightly is 9-12 which is last week.
Alon, by any chance do you have the cset range for the 9-12 nightly?

The segfaults in gdb during startup are, I believe, innocuous, just caused by the operation callback.

I can reproduce a problem on http://kripken.github.io/boon/inception/ every time, but it's a hang, not a crash.  Once the process is hung, breaking in gdb shows the main thread always with the stack:

#0  0x00007f0b9b2b5ca4 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007f0b9a0e45f0 in PR_WaitCondVar (cvar=0x7f0b79045d40, timeout=4294967295) at /moz/mi/nsprpub/pr/src/pthreads/ptsynch.c:385
#2  0x00007f0b972efcfd in wait (this=0x7f0b7a0e7150, which=<optimized out>, millis=<optimized out>) at /moz/mi/js/src/jsworkers.cpp:435
#3  js::AutoPauseWorkersForGC::AutoPauseWorkersForGC (this=0x7fff3e8150b0, rt=<optimized out>) at /moz/mi/js/src/jsworkers.cpp:960
#4  0x00007f0b97252224 in AutoTraceSession (heapState=js::MajorCollecting, rt=0x7f0b8629c000, this=0x7fff3e8150a0) at /moz/mi/js/src/jsgc.cpp:4067
#5  AutoGCSession (rt=0x7f0b8629c000, this=0x7fff3e8150a0) at /moz/mi/js/src/jsgc.cpp:4082
#6  GCCycle (rt=rt@entry=0x7f0b8629c000, incremental=incremental@entry=true, budget=budget@entry=10000, gckind=gckind@entry=js::GC_NORMAL, reason=reason@entry=JS::gcreason::TOO_MUCH_MALLOC)
    at /moz/mi/js/src/jsgc.cpp:4484
#7  0x00007f0b97252830 in Collect (rt=rt@entry=0x7f0b8629c000, incremental=incremental@entry=true, budget=10000, gckind=gckind@entry=js::GC_NORMAL, reason=JS::gcreason::TOO_MUCH_MALLOC)
    at /moz/mi/js/src/jsgc.cpp:4648
#8  0x00007f0b97252b3f in js::GCSlice (rt=rt@entry=0x7f0b8629c000, gckind=gckind@entry=js::GC_NORMAL, reason=<optimized out>, millis=millis@entry=0) at /moz/mi/js/src/jsgc.cpp:4685
#9  0x00007f0b9721b062 in js_InvokeOperationCallback (cx=0x7f0b67199df0) at /moz/mi/js/src/jscntxt.cpp:993
#10 js_HandleExecutionInterrupt (cx=0x7f0b67199df0) at /moz/mi/js/src/jscntxt.cpp:1021

and one analysis thread is always parked at:

#0  0x00007f0b973b871d in js::jit::LIRGenerator::visitToInt32 (this=0x7f0b6f7eaf10, convert=0x7f0b58d5a7c8) at /moz/mi/js/src/jit/Lowering.cpp:1640
#1  0x00007f0b973b10c1 in js::jit::LIRGenerator::visitInstruction (this=0x7f0b6f7eaf10, ins=0x7f0b58d5a7c8) at /moz/mi/js/src/jit/Lowering.cpp:3175
#2  0x00007f0b973b1348 in js::jit::LIRGenerator::visitBlock (this=0x7f0b6f7eaf10, block=0x7f0b594b5c10) at /moz/mi/js/src/jit/Lowering.cpp:3267
#3  0x00007f0b973bcaf3 in js::jit::LIRGenerator::generate (this=this@entry=0x7f0b6f7eaf10) at /moz/mi/js/src/jit/Lowering.cpp:3343
#4  0x00007f0b9733c0dd in js::jit::GenerateLIR (mir=mir@entry=0x7f0a33fe10c8) at /moz/mi/js/src/jit/Ion.cpp:1410
#5  0x00007f0b9733cbcf in js::jit::CompileBackEnd (mir=0x7f0a33fe10c8, maybeMasm=maybeMasm@entry=0x0) at /moz/mi/js/src/jit/Ion.cpp:1513
#6  0x00007f0b972f0b01 in handleIonWorkload (state=..., this=0x7f0b7a0ebda8) at /moz/mi/js/src/jsworkers.cpp:697
#7  js::WorkerThread::threadLoop (this=0x7f0b7a0ebda8) at /moz/mi/js/src/jsworkers.cpp:920
#8  0x00007f0b9a0ea014 in _pt_root (arg=0x7f0b7a0e8690) at /moz/mi/nsprpub/pr/src/pthreads/ptthread.c:204
#9  0x00007f0b9b2b1f8e in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#10 0x00007f0b9a7cfe1d in clone () from /lib/x86_64-linux-gnu/libc.so.6

I see 200% CPU utilization, so I think this isn't just a simple deadlock.
Actually, all the threading stuff was a red herring, I think; when I disable parallel compilation/parsing, the problem still reproduces (now on the main thread).

Stepping through the instructions, I see an infinite loop where the switch(opd->type()) in LIRGenerator::visitToInt32 jumps back to just above the switch. Printf'ing opd->type() shows MIRType_Float32, which is indeed absent from the switch, so I think this is just another missing float32 case.
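For context, Math.fround rounds a double to the nearest float32-representable value, and when the engine's float32 optimization keeps track of such values, they can reach paths like visitToInt32 in float32 form. A minimal sketch of the rounding behavior (illustrative, not from the bug's test case):

```javascript
// Math.fround(x) returns the closest float32-representable value to x.
// Small integers round-trip exactly; most fractions do not.
console.log(Math.fround(3));            // 3 (exactly representable)
console.log(Math.fround(0.1));          // 0.10000000149011612 (nearest float32)
console.log(Math.fround(0.1) === 0.1);  // false
```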
I see a hang some of the time too. It either crashes or hangs.

Changeset range is http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=9e9f74116749&tochange=a4e9c9c9dbf9. It does have some commits mentioning float32 in there (but I don't think I see the main float32 landing).

Btw, how do I disable the operation callback from tripping the debugger?
(gdb) source $(PATH_TO_MOZ)/js/src/shell/js-gdb.gdb

This will install a catch handler that automatically continues if the handler is AsmJSFaultHandler. I just tried it and got an error that the symbol __GI___sigaction isn't resolved, so I had to change js/src/shell/js-gdb.gdb to call __sigaction instead (filed bug 919564 to fix this), but then it worked.
I'll investigate on this today.
Status: NEW → ASSIGNED
Posted patch fix + test case (obsolete) — Splinter Review
ToInt32 can be called with a Float32 as an input, for instance on 'here is a string'[Math.fround(3)]. This patch is a workaround that converts the Float32 to a Double before it gets converted to an integer.

That's a temporary solution, the longer term solution is in bug 919838.
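The trigger pattern can be sketched in plain JavaScript:

```javascript
// Math.fround(3) yields a number the compiler may track as float32;
// using it as a string index forces a ToInt32 coercion, the path that
// lacked a Float32 case before this patch.
const s = 'here is a string';
const idx = Math.fround(3);
console.log(s[idx]);  // 'e' (same as s[3])
```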
Assignee: general → bbouvier
Attachment #808919 - Flags: review?(sstangl)
Bug 920114 is the same issue as this I think, and has the results of running in a debug build.
Duplicate of this bug: 920114
Comment on attachment 808919 [details] [diff] [review]
fix + test case

Review of attachment 808919 [details] [diff] [review]:
-----------------------------------------------------------------

The issue is caused by LIRGenerator::visitToInt32() not having a case for MIRType_Float32, which caused control to flow into MOZ_ASSUME_UNREACHABLE() instead of returning an error. This fix seems fine, but it's possible that we have missed some more MIR instructions that cannot handle Float32 inputs and would behave similarly. There are ways to prevent this class of bug in the future, but it seems unlikely to recur, and debug builds should weed such cases out quickly.
Attachment #808919 - Flags: review?(sstangl) → review+
With this patch applied, I still abort, but with an assertion about "consumer->isConsistentFloat32Use(), at js/src/jit/IonAnalysis.cpp:922", as noted in bug 920114 comment 3.
https://hg.mozilla.org/mozilla-central/rev/94c5919f12c1
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
This bug reached Aurora, so I think we need to fix it there too.
Comment on attachment 808919 [details] [diff] [review]
fix + test case

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 888109
User impact if declined: crashes / infinite hangs on different web sites (e.g. on the URL mentioned above, but also on Google Maps, etc.)
Testing completed (on m-c, etc.): testing completed on m-c, all tests pass
Risk to taking this patch (and alternatives if risky): very low, if not no risk
String or IDL/UUID changes made by this patch: N/A
Attachment #808919 - Flags: approval-mozilla-aurora?
Carrying forward r+ from sstangl.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 888109
User impact if declined: crashes / infinite hangs on different web sites (e.g. on the URL mentioned above, but also on Google Maps, etc.)
Testing completed (on m-c, etc.): testing completed on m-c, all tests pass
Risk to taking this patch (and alternatives if risky): very low, if not no risk
String or IDL/UUID changes made by this patch: N/A
Attachment #808919 - Attachment is obsolete: true
Attachment #808919 - Flags: approval-mozilla-aurora?
Attachment #810056 - Flags: review+
Attachment #810056 - Flags: approval-mozilla-aurora?
I can still reproduce a hang (an infinite loop in LIRGenerator::visitToInt32 where opd->type() == MIRType_Float32) on the demo with a fresh mozilla-inbound tip opt build on Linux64. If I sit still it doesn't happen; I have to continually jump around. Is there another place where a float32 can flow into visitToInt32?
Luke: Did you have the patch for bug 915903 applied, too? IIUC that bug could cause similar issues. (and its patch only made it to mozilla-central in the last 12 hours, so maybe you didn't have its patch yet?)  With that, I was able to play Boon for a few minutes at least.
Unfortunately, yes, I specifically checked to see bug 915903 applied.  It seems to take longer to hit the hang than it used to, so I suspect it is a different code path being compiled.
Thankfully, but strangely, the problem is fixed on the most recent Nightly.  So either there is a regression on tip or I was accidentally on a branch; probably the latter, given the time :)  I'll build again to confirm.
Yep, my mistake; works great on tip build.
This problem still happens for me on the very latest nightly. I do need to wait longer, 20-30 seconds or so, but then it hangs with full CPU use, just like before.
Argh, it seems to be non-deterministic, but yes, I was just able to repro it again. Again, it's visitToInt32 with opd->type() == MIRType_Float32.

If we can get this fixed before the next Nightly, let's do that. Otherwise, I think we need to turn off the float32 optimization for the time being.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I checked all the call sites by hand and it shouldn't happen anymore. Anyway, this workaround is being implemented more properly in bug 919838, which is about to land once its ARM support gets reviewed.

This will (hopefully) solve all Float32 ToInt32 coercion issues.
Just to be sure I tried and bug 919838 does fix the hang for me (three extended).  Hopefully we can land this before tonight's Nightly.
Nightly contains bug 919838, I tried the demo several times, no hang.
Status: REOPENED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Attachment #810056 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Keywords: verifyme