Closed Bug 932143 Opened 11 years ago Closed 11 years ago

b2g x64 crash in js::jit::Assembler::TraceJumpRelocations during non-self.close() DOMWorker shutdown under GDB

Categories

(Core :: JavaScript Engine, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: asuth, Unassigned)

References

Details

(Keywords: crash, sec-other, Whiteboard: [b2g-crash])

While running the Firefox OS Gaia e-mail app unit tests using b2g-desktop I experience a deterministic (aka it reproduces 100% of the time for me in our first test) crash while our DOMWorker shuts down by calling JS_DestroyContext.  The crash goes away if the DOMWorker closes itself by calling "self.close()".  The crash still happens if I have its owning page call worker.teminate(), even if I use a setTimeout to ensure that the owning page stays alive until after the worker should have died.  (Originally the worker would not be explicitly closed and would die as a result of its parent page dying with the same crashy phenomenon.)  Hollowing out the unit test so that basically only the test framework is active still causes the crash.

It's worth noting that this happens for me with both Mozilla built nightly b2g-desktop (--enable-application=b2g) builds, locally built gcc-4.7.3 builds, and locally built gcc-4.8.1 builds.  It did *not* crash for me with a locally built gcc-4.8.1 build with "--enable-profiling" and "--enable-optimize=-O1".  That could suggest compiler bugs or some sort of timing dependent thing.

The activities of the main thread or other threads do not seem to be involved.  In all crashes, all threads but the main thread are in standard wait-states.  In the cases where the main thread did not wait around for the worker to die, it would be doing something JSy.  When I added the timeout, the main thread was in an idle state waiting for its timeout to fire so it could proceed with cleanup/unit tests.

The hg hash I'm building against is a80dce1126db, corresponding to b2g x64 nightly BuildID=20131027040200.


I realize the DOMWorker dependency of all this suggests this might actually be a DOMWorker problem, but I figured I would file this here first since I would expect a GC hazard segfault involving workers to look more blatant.  Please feel free to turf as appropriate.

Please let me know if you'd like me to run under a -DDEBUG build or collect allocation logs, etc.  Thanks!

== Segfault:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2aaad8b00700 (LWP 32189)]
js::jit::Assembler::TraceJumpRelocations (trc=trc@entry=0x2aaabd0c92e8, code=code@entry=0x2aaad9e3a850, reader=...) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/jit/x64/Assembler-x64.cpp:252
252	    while (iter.read()) {

#0  js::jit::Assembler::TraceJumpRelocations (trc=trc@entry=0x2aaabd0c92e8, code=code@entry=0x2aaad9e3a850, reader=...) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/jit/x64/Assembler-x64.cpp:252
#1  0x00002aaaadf3e193 in js::jit::IonCode::trace (this=this@entry=0x2aaad9e3a850, trc=trc@entry=0x2aaabd0c92e8) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/jit/Ion.cpp:654
#2  0x00002aaaadd28ef3 in MarkChildren (code=0x2aaad9e3a850, trc=0x2aaabd0c92e8) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/gc/Marking.cpp:1194
#3  js::GCMarker::processMarkStackOther (this=this@entry=0x2aaabd0c92e8, budget=..., tag=<optimized out>, addr=46913288382544) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/gc/Marking.cpp:1365
#4  0x00002aaaadd29590 in processMarkStackTop (budget=..., this=0x2aaabd0c92e8) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/gc/Marking.cpp:1432
#5  js::GCMarker::drainMarkStack (this=this@entry=0x2aaabd0c92e8, budget=...) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/gc/Marking.cpp:1533
#6  0x00002aaaadd8e3a7 in MarkGrayReferences<js::gc::GCZoneGroupIter, js::CompartmentsIterT<js::gc::GCZoneGroupIter> > (rt=0x2aaabd0c9000) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/jsgc.cpp:3012
#7  MarkGrayReferencesInCurrentGroup (rt=0x2aaabd0c9000) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/jsgc.cpp:3025
#8  EndMarkingZoneGroup (rt=rt@entry=0x2aaabd0c9000) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/jsgc.cpp:3680
#9  0x00002aaaadd97aa9 in BeginSweepPhase (lastGC=false, rt=0x2aaabd0c9000) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/jsgc.cpp:3850
#10 IncrementalCollectSlice (rt=rt@entry=0x2aaabd0c9000, budget=budget@entry=0, reason=reason@entry=JS::gcreason::DESTROY_CONTEXT, gckind=gckind@entry=js::GC_NORMAL) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/jsgc.cpp:4410
#11 0x00002aaaadd98c68 in GCCycle (rt=rt@entry=0x2aaabd0c9000, incremental=incremental@entry=false, budget=budget@entry=0, gckind=gckind@entry=js::GC_NORMAL, reason=reason@entry=JS::gcreason::DESTROY_CONTEXT) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/jsgc.cpp:4555
#12 0x00002aaaadd990f5 in Collect (rt=rt@entry=0x2aaabd0c9000, incremental=incremental@entry=false, budget=budget@entry=0, gckind=gckind@entry=js::GC_NORMAL, reason=reason@entry=JS::gcreason::DESTROY_CONTEXT) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/jsgc.cpp:4699
#13 0x00002aaaadd9939d in Collect (reason=JS::gcreason::DESTROY_CONTEXT, gckind=js::GC_NORMAL, budget=0, incremental=false, rt=rt@entry=0x2aaabd0c9000) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/jsgc.cpp:4725
#14 js::GC (rt=rt@entry=0x2aaabd0c9108, gckind=gckind@entry=js::GC_NORMAL, reason=reason@entry=JS::gcreason::DESTROY_CONTEXT) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/jsgc.cpp:4724
#15 0x00002aaaadd65f37 in js::DestroyContext (cx=0x2aaad7fefc50, mode=(unknown: 3202783480)) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/jscntxt.cpp:267
#16 0x00002aaaadd3931a in JS_DestroyContext (cx=<optimized out>) at /home/visbrero/rev_control/fgit/mozilla-central/js/src/jsapi.cpp:858
#17 0x00002aaaacfbc493 in (anonymous namespace)::WorkerThreadRunnable::Run (this=<optimized out>) at /home/visbrero/rev_control/fgit/mozilla-central/dom/workers/RuntimeService.cpp:946
#18 0x00002aaaad7346f0 in nsThread::ProcessNextEvent (this=0x2aaad79ec6a0, mayWait=<optimized out>, result=0x2aaad8affe2f) at /home/visbrero/rev_control/fgit/mozilla-central/xpcom/threads/nsThread.cpp:622
#19 0x00002aaaad7064bf in NS_ProcessNextEvent (thread=<optimized out>, thread@entry=0x2aaad79ec6a0, mayWait=mayWait@entry=true) at /home/visbrero/rev_control/fgit/mozilla-central/xpcom/glue/nsThreadUtils.cpp:251
#20 0x00002aaaad734d99 in nsThread::ThreadFunc (arg=0x2aaad79ec6a0) at /home/visbrero/rev_control/fgit/mozilla-central/xpcom/threads/nsThread.cpp:250
#21 0x00002aaaabf92cc2 in _pt_root (arg=0x2aaabee697a0) at /home/visbrero/rev_control/fgit/mozilla-central/nsprpub/pr/src/pthreads/ptthread.c:204
#22 0x00002aaaaacd7f6e in start_thread (arg=0x2aaad8b00700) at pthread_create.c:311
#23 0x00002aaaab7f39cd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Keywords: crash
Whiteboard: [b2g-crash]
See Bug 914511.
Group: core-security
(In reply to Nicolas B. Pierron [:nbp] from comment #1)
> See Bug 914511.

Can someone cc me?  I've got MoCo bits and QC bits, but no security group bits.
Depends on: 914511
This is the exact same signature jandem and I were seeing yesterday.  The source of the problem was that, in xpcshell, breakpad is installing handlers after the JSRuntime had installed its signal handlers (bug 931861).  With bug 864220, these handlers are necessary to handle the intentional SIGSEGV caused by the watchdog thread, which mprotects all the Ion code (including the bit touched by TraceJumpRelocations).

One way to test this is to put:
    struct sigaction cur;
    sigaction(SIGSEGV, nullptr, &cur);
    JS_ASSERT(cur.sa_sigaction == &AsmJSFaultHandler);
at the head of js::TriggerOperationCallbackForAsmJSCode in js/src/jit/AsmJSSignalHandlers.cpp and see if it hits.
See also bug 914511 for another bug with the same signature. Unfortunately only reproduces on Gary's machine :(
I don't think this is security sensitive: it's a reliable crash; no corrupted memory our unintended reads/writes or anything.
(In reply to Luke Wagner [:luke] from comment #6)
> This is the exact same signature jandem and I were seeing yesterday.  The
> source of the problem was that, in xpcshell, breakpad is installing handlers
> after the JSRuntime had installed its signal handlers (bug 931861).  With
> bug 864220, these handlers are necessary to handle the intentional SIGSEGV
> caused by the watchdog thread, which mprotects all the Ion code (including
> the bit touched by TraceJumpRelocations).

When we protect the memory, we prevent reading it also?  I thought that we only flipped the execution bits of it to prevent the execution of any code.
Nope, PROT_NONE.
Okay, so it looks like Luke's comment 6 about signal handlers is dead on.  This appears to be gdb 7.6.1-ubuntu from Ubuntu 13.10 seeing the signals that it should be ignoring or not being told about.  I am able to "cont"inue when I see this error and then everything starts working again.

Applying the hunk in comment 6 does not seem to have any affect on this, presumably because the code is unable to defeat what gdb is doing.


So super specifically
- Runing under gdb, I see this sigsegv at the conclusion of the first unit test in the batch when the first worker is getting shut down.  Cont works and keeps working until I get to the next crash:

- Do not run under gdb, see the crash occur at a much later time, specifically the time that corresponds to that other crash I saw, which is bug 932162.  (And if I do self.close() I manage to avoid the gdb-crasher, allowing me to get to the real crasher I cared about, bug 932162.)


So I think what happened is I ran into the bug 932162 crasher, then I decided to run things under gdb.  And then I got this gdb psych-out.

I feel like I had seen a blog-post on planet mozilla some time ago about gdb getting tricked by the JIT, but then something about it getting fixed.  Unfortunately, although I did google a lot, I had real trouble finding the blog post and things like https://developer.mozilla.org/en-US/docs/SpiderMonkey/Hacking_Tips don't seem to make mention of this.  (In the olden days, I would have used google reader to search my starred posts, but I just have newsblur now and none of my historical starred posts readily accessible.)

In any event, my apologies for any noise I generated, though if we can do something to shut gdb up about this automagically, I think it would be great, because this is not the last time you will hear something like this, I fear...
Summary: b2g x64 crash in js::jit::Assembler::TraceJumpRelocations during non-self.close() DOMWorker shutdown → b2g x64 crash in js::jit::Assembler::TraceJumpRelocations during non-self.close() DOMWorker shutdown under GDB
(In reply to Andrew Sutherland (:asuth) from comment #11)
> I feel like I had seen a blog-post on planet mozilla some time ago about gdb
> getting tricked by the JIT, but then something about it getting fixed. 
> Unfortunately, although I did google a lot, I had real trouble finding the
> blog post and things like
> https://developer.mozilla.org/en-US/docs/SpiderMonkey/Hacking_Tips don't
> seem to make mention of this.  (In the olden days, I would have used google
> reader to search my starred posts, but I just have newsblur now and none of
> my historical starred posts readily accessible.)

See this commit and bug 906860:

https://hg.mozilla.org/integration/mozilla-inbound/rev/5f764ef54601
This was addressed by bug 933807 which added an environment variable to make the fake-out go away and "mach" automation to automatically set that environment variable for the good of all gdb users everywhere.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME
Group: core-security → core-security-release
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.