Closed Bug 1045291 Opened 10 years ago Closed 8 years ago

B2G: Segmentation fault around js/src/jit/CompactBuffer.h:57

Categories

(Core :: JavaScript Engine: JIT, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: gwagner, Unassigned)

References

Details

STR: start B2G on nexus 4 with --enable-debug gecko.

Program received signal SIGSEGV, Segmentation fault.
0xb5a74642 in readByte (this=<optimized out>) at ../../../js/src/jit/CompactBuffer.h:57
57	        return *buffer_++;
(gdb) bt
#0  0xb5a74642 in readByte (this=<optimized out>) at ../../../js/src/jit/CompactBuffer.h:57
#1  readVariableLength (this=<optimized out>) at ../../../js/src/jit/CompactBuffer.h:40
#2  readUnsigned (this=<optimized out>) at ../../../js/src/jit/CompactBuffer.h:72
#3  read (this=<synthetic pointer>) at ../../../js/src/jit/arm/Assembler-arm.cpp:625
#4  js::jit::Assembler::TraceJumpRelocations (trc=0xaffe5908, code=0xaedec1a0, reader=<optimized out>)
    at ../../../js/src/jit/arm/Assembler-arm.cpp:788
#5  0xb5974f4a in js::jit::JitCode::trace (this=0xaedec1a0, trc=0xaffe5908) at ../../../js/src/jit/Ion.cpp:747
#6  0xb58f33f0 in MarkChildren (code=0xaedec1a0, trc=0xaffe5908) at ../../../js/src/gc/Marking.cpp:1422
#7  js::GCMarker::processMarkStackOther (this=0xaffe5908, tag=<optimized out>, addr=2933834144)
    at ../../../js/src/gc/Marking.cpp:1597
#8  0xb58f8a1e in js::GCMarker::drainMarkStack (this=0xaffe5908, budget=...) at ../../../js/src/gc/Marking.cpp:1749
#9  0xb5ab8836 in js::gc::GCRuntime::drainMarkStack (this=0xaffe51d8, sliceBudget=..., phase=<optimized out>)
    at ../../../js/src/jsgc.cpp:4331
#10 0xb5aec138 in js::gc::GCRuntime::incrementalCollectSlice (this=0xaffe51d8, budget=<optimized out>, 
    reason=JS::gcreason::DOM_WORKER, gckind=js::GC_SHRINK) at ../../../js/src/jsgc.cpp:4839
#11 0xb5aec856 in js::gc::GCRuntime::gcCycle (this=0xaffe51d8, incremental=<optimized out>, budget=0, gckind=js::GC_SHRINK, 
    reason=JS::gcreason::DOM_WORKER) at ../../../js/src/jsgc.cpp:5047
#12 0xb5aeca6a in js::gc::GCRuntime::collect (this=0xaffe51d8, incremental=<optimized out>, budget=0, gckind=js::GC_SHRINK, 
    reason=JS::gcreason::DOM_WORKER) at ../../../js/src/jsgc.cpp:5174
#13 0xb5aef2c8 in GC (reason=<optimized out>, gckind=js::GC_SHRINK, rt=<optimized out>) at ../../../js/src/jsgc.cpp:5202
#14 JS::ShrinkingGC (rt=<optimized out>, reason=<optimized out>) at ../../../js/src/jsfriendapi.cpp:206
#15 0xb4ef92ee in mozilla::dom::workers::WorkerPrivate::GarbageCollectInternal (this=0xb13a0800, aCx=0xb0c95d20, 
    aShrinking=<optimized out>, aCollectChildren=<optimized out>) at ../../../dom/workers/WorkerPrivate.cpp:5613
#16 0xb4ef933c in (anonymous namespace)::GarbageCollectRunnable::WorkerRun (this=<optimized out>, aCx=<optimized out>, 
    aWorkerPrivate=<optimized out>) at ../../../dom/workers/WorkerPrivate.cpp:1681
#17 0xb4f01eac in mozilla::dom::workers::WorkerRunnable::Run (this=0xaff21cc0)
    at ../../../dom/workers/WorkerRunnable.cpp:312
#18 0xb4efd336 in mozilla::dom::workers::WorkerPrivate::ProcessAllControlRunnablesLocked (this=0xb13a0800)
    at ../../../dom/workers/WorkerPrivate.cpp:4500
#19 0xb4f00cb2 in mozilla::dom::workers::WorkerPrivate::DoRunLoop (this=0xb13a0800, aCx=0xb0c95d20)
    at ../../../dom/workers/WorkerPrivate.cpp:3996
#20 0xb4ee8380 in (anonymous namespace)::WorkerThreadPrimaryRunnable::Run (this=0xb2ad1e20)
    at ../../../dom/workers/RuntimeService.cpp:2737
#21 0xb44db402 in nsThread::ProcessNextEvent (this=0xb13363e0, aMayWait=<optimized out>, aResult=0xb0bbfe27)
    at ../../../xpcom/threads/nsThread.cpp:770
#22 0xb44ef34c in NS_ProcessNextEvent (aThread=0xb13363e0, aMayWait=<optimized out>)
    at /Volumes/2mac/moz/ib2g/xpcom/glue/nsThreadUtils.cpp:265
---Type <return> to continue, or q <return> to quit---
#23 0xb469f52e in mozilla::ipc::MessagePumpForNonMainThreads::Run (this=0xb0c39970, aDelegate=0xb1332de0)
    at ../../../ipc/glue/MessagePump.cpp:326
#24 0xb468d62e in MessageLoop::RunInternal (this=0xb1332de0) at ../../../ipc/chromium/src/base/message_loop.cc:229
#25 0xb468d646 in RunHandler (this=0xb1332de0) at ../../../ipc/chromium/src/base/message_loop.cc:222
#26 MessageLoop::Run (this=0xb1332de0) at ../../../ipc/chromium/src/base/message_loop.cc:196
#27 0xb44dc3ec in nsThread::ThreadFunc (aArg=0xb13363e0) at ../../../xpcom/threads/nsThread.cpp:347
#28 0xb68e6d58 in _pt_root (arg=0xb2b63180) at ../../../../../nsprpub/pr/src/pthreads/ptthread.c:212
#29 0xb6ea1a5c in __thread_entry (func=0xb68e6cb1 <_pt_root>, arg=0xb2b63180, tls=0xb0bbff00)
    at bionic/libc/bionic/pthread_create.cpp:92
#30 0xb6ea1bd8 in pthread_create (thread_out=0xbe957be4, attr=<optimized out>, start_routine=0x78, arg=0xb2b63180)
    at bionic/libc/bionic/pthread_create.cpp:201
Flags: needinfo?(jdemooij)
Gregor, what happens if you do a "continue" in gdb?

The reason I am asking is that this is a common stack trace that we obtain when we are mprotect-ing the code and reading pointers out of mprotect-ed code it during a GC. (luke?)

If you still hit this identical stack after the continue, then this might be a problem of the content which is read out of the code. (mjrosenb, dougc?)
Flags: needinfo?(jdemooij) → needinfo?(anygregor)
Yes segfaults under TraceJumpRelocations are usually expected (we use a signal-based mechanism to interrupt JIT code, unfortunately gdb catches them as well).

You can use the JS_NO_SIGNALS=1 (or JS_DISABLE_SLOW_SCRIPT_SIGNALS=1) environment variable to disable them.
(In reply to Nicolas B. Pierron [:nbp] from comment #1)
> The reason I am asking is that this is a common stack trace that we obtain
> when we are mprotect-ing the code and reading pointers out of mprotect-ed
> code it during a GC. (luke?)

nbp: for future reference, I'm not the one that added this mechanism for Ion (which is the case that people see everywhere); that'd be bhackett.
(In reply to Nicolas B. Pierron [:nbp] from comment #1)
> Gregor, what happens if you do a "continue" in gdb?

Continue works. It happens multiple times just during startup and it's very annoying if we have do debug anything. How can we fix this? Should we set JS_NO_SIGNALS=1 to default?
Flags: needinfo?(anygregor)
(In reply to Gregor Wagner [:gwagner] from comment #4)
> (In reply to Nicolas B. Pierron [:nbp] from comment #1)
> > Gregor, what happens if you do a "continue" in gdb?
> 
> Continue works. It happens multiple times just during startup and it's very
> annoying if we have do debug anything. How can we fix this? Should we set
> JS_NO_SIGNALS=1 to default?

JS_NO_SIGNALS is a work-around, not a solution.  bbouvier was working on a bug to get this handled properly on Gecko's part.  JS_NO_SIGNALS is like disabling the interruption callback, which is not a desired behaviour except when this is blocking you for reproducing a test case.
(In reply to Nicolas B. Pierron [:nbp] from comment #5)
> (In reply to Gregor Wagner [:gwagner] from comment #4)
> > (In reply to Nicolas B. Pierron [:nbp] from comment #1)
> > > Gregor, what happens if you do a "continue" in gdb?
> > 
> > Continue works. It happens multiple times just during startup and it's very
> > annoying if we have do debug anything. How can we fix this? Should we set
> > JS_NO_SIGNALS=1 to default?
> 
> JS_NO_SIGNALS is a work-around, not a solution.  bbouvier was working on a
> bug to get this handled properly on Gecko's part.  JS_NO_SIGNALS is like
> disabling the interruption callback, which is not a desired behaviour except
> when this is blocking you for reproducing a test case.

For what it's worth, using |./mach debug| will automatically set the env variable to enable the workaround. I don't know if there's an equivalent for starting b2g in debug mode, but if that's the case, it might be worth it to enable it by default, for debugging purposes. Beware: I am not saying to enable it by default in *all* cases, but only for debugging.

See also bug 1005699 which tries to make JS_NO_SIGNALS (and such) easier to find.
We have a run-gdb.sh script that stets up the debugging environment on the phone. Maybe we can put it in there? Dave, Mike, can you look into this?
Flags: needinfo?(mwu)
Flags: needinfo?(dhylands)
Is the use of SIGSEGV required? If not, we can use one of the user signals and configure gdb to ignore it.
Flags: needinfo?(mwu)
(In reply to Michael Wu [:mwu] from comment #8)
> Is the use of SIGSEGV required? If not, we can use one of the user signals
> and configure gdb to ignore it.

We are not sending the SIGSEGV, we are catching it.  We mprotect the pages to prevent infinite loops in AsmJS and IonMonkey.  When we attempt to access the code pages, the system send a SIGSEGV to us.
Can we have the SIGSEGV handler rethrow a different exception if it's not "expected".

Then you could setup gdb to pass along SIGSEGV and have whatever gets rethrown cause it to stop.

Regular segmentation faults (i.e. buggy code) would then be reported as this rethrown signal (rather than a segv).

Perhaps we only rethrow a different exception when gdb is detected? That way, in theory, we don't change any significant behaviour between running under gdb/non-gdb.

And when we rethrow, we could print something which indicates that this is really a segfault.
Flags: needinfo?(dhylands)
(In reply to Dave Hylands [:dhylands] from comment #10)
> Can we have the SIGSEGV handler rethrow a different exception if it's not
> "expected".
> 
> Then you could setup gdb to pass along SIGSEGV and have whatever gets
> rethrown cause it to stop.
> 
> Regular segmentation faults (i.e. buggy code) would then be reported as this
> rethrown signal (rather than a segv).

I see one issue with that, which is that we have multiple SEGV handler at the moment.  So if we do rethrow at the end of the AsmJS handler, then we would have to move all the SEGV handlers to this other signal.  I don't know if we have much control over the order in which the handlers are attached / processed.

> And when we rethrow, we could print something which indicates that this is
> really a segfault.

I don't think IO are signal safe.
(In reply to Dave Hylands [:dhylands] from comment #10)
> Can we have the SIGSEGV handler rethrow a different exception if it's not "expected".

If you mean raise() or similar, then that will stop the program in the SIGSEGV handler, and I don't know if gdb knows how to unwind out of the signal trampoline to get back to the original fault context on all platforms of interest (especially B2G, where we're typically using an older version of gdb).
(In reply to Nicolas B. Pierron [:nbp] from comment #11)
> I don't think IO are signal safe.

write() is async signal safe, if all you need is a fixed string.  See also: https://code.google.com/p/chromium/codesearch#chromium/src/base/logging.cc&q=RawLog&sq=package:chromium&l=769
(In reply to Nicolas B. Pierron [:nbp] from comment #5)
> JS_NO_SIGNALS is like
> disabling the interruption callback, which is not a desired behaviour except
> when this is blocking you for reproducing a test case.

Actually, this is not entirely true: this doesn't disable the interrupt callback, it just avoids using signals tricks for the interrupt callback. Instead, it generates manual checks at the right places, in both Ion and Odin. So, with or without JS_NO_SIGNALS=1, you will have slow script alerts still show up.

I think the easiest solution is to set the env variable in run-gdb.sh, maybe with a warning that if an issue cannot be reproduced with run-gdb.sh, it might be due to the interrupt callback signal tricks being deactivated.
See Also: → 1049357
Presumably, it's the b2g app which needs to run with JS_NO_SIGNALS=1

which we can do in run-gdb.sh if run-gdb.sh actually starts b2g, but not if it attaches to the running process.

Adding some support in run-gdb.sh for JS_NO_SIGNALS (and printing a warning if its set to 1) seems pretty straight forward.
(In reply to Dave Hylands [:dhylands] from comment #16)
> Presumably, it's the b2g app which needs to run with JS_NO_SIGNALS=1
> 
> which we can do in run-gdb.sh if run-gdb.sh actually starts b2g, but not if
> it attaches to the running process.
> 
> Adding some support in run-gdb.sh for JS_NO_SIGNALS (and printing a warning
> if its set to 1) seems pretty straight forward.

Dave, do you have time to fix this?
Fare thee well, Firefox OS.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.