Assertion failure: !(*__errno_location ()), at js/src/threading/posix/Mutex.cpp:72 - simulator problem

RESOLVED DUPLICATE of bug 1285074

Status

()

Core
JavaScript Engine
--
critical
RESOLVED DUPLICATE of bug 1285074
a year ago
11 months ago

People

(Reporter: decoder, Unassigned)

Tracking

(Blocks: 1 bug, {assertion, regression, testcase})

Trunk
mozilla51
ARM
Linux
assertion, regression, testcase
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox51 fixed)

Details

(Whiteboard: [jsbugmon:update,ignore])

Attachments

(1 attachment)

(Reporter)

Description

a year ago
The following testcase crashes on mozilla-central revision 6cf0089510fa (build with --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --disable-tests --enable-debug --without-intl-api --enable-optimize --target=i686-pc-linux-gnu --enable-simulator=arm, run with --fuzzing-safe --thread-count=2 --ion-eager --arm-sim-icache-checks):

timeout(5);
function f0(p0, p1, p2, p3, p4, p5, target, p7, p8, p9) {
    do {
        v0 = (p0 | p7);
    } while (v0);
}
f0(2204, 465, 7905, 3902, 4658, 4110, 5703, 2199, 2681, 5291);



Backtrace:

 received signal SIGSEGV, Segmentation fault.
0x08149dc6 in js::Mutex::lock (this=0xf7927988) at js/src/threading/posix/Mutex.cpp:71
#0  0x08149dc6 in js::Mutex::lock (this=0xf7927988) at js/src/threading/posix/Mutex.cpp:71
#1  0x0849ad2c in js::LockGuard<js::Mutex>::LockGuard (aLock=..., this=<synthetic pointer>) at js/src/threading/LockGuard.h:25
#2  js::jit::AutoLockSimulatorCache::AutoLockSimulatorCache (sim=0xf7927000, this=<synthetic pointer>) at js/src/jit/arm/Simulator-arm.cpp:369
#3  js::jit::Simulator::FlushICache (start_addr=0xf7bd6aac, size=4) at js/src/jit/arm/Simulator-arm.cpp:1072
#4  0x082557a2 in js::jit::ExecutableAllocator::cacheFlush (size=4, code=0xf7bd6aac) at js/src/jit/ExecutableAllocator.h:245
#5  js::jit::AutoFlushICache::flush (start=4156385964, len=4) at js/src/jit/Ion.cpp:3384
#6  0x0842d01b in js::jit::Assembler::RetargetNearBranch (i=0xf7bd6aac, offset=336, cond=js::jit::Assembler::NotEqual, final=true) at js/src/jit/arm/Assembler-arm.cpp:3028
#7  0x0842d27e in js::jit::PatchJump (jump_=..., label=..., reprotect=<optimized out>) at js/src/jit/arm/Assembler-arm.cpp:638
#8  0x08250a59 in js::jit::PatchBackedge (target=js::jit::JitRuntime::BackedgeInterruptCheck, label=..., jump_=...) at js/src/jit/arm/Assembler-arm.h:1174
#9  js::jit::JitRuntime::patchIonBackedges (this=0xf790a1e0, rt=0xf79480e8, target=js::jit::JitRuntime::BackedgeInterruptCheck) at js/src/jit/Ion.cpp:415
#10 0x083b02ea in RedirectIonBackedgesToInterruptCheck (rt=0xf79480e8) at js/src/asmjs/WasmSignalHandlers.cpp:1225
#11 RedirectJitCodeToInterruptCheck (context=0xffffb4cc, rt=0xf79480e8) at js/src/asmjs/WasmSignalHandlers.cpp:1234
#12 JitInterruptHandler (signum=26, info=0xffffb44c, context=0xffffb4cc) at js/src/asmjs/WasmSignalHandlers.cpp:1273
#13 <signal handler called>
#14 0x08496ab2 in js::jit::CheckICacheLocked (instr=0xf7bd6aac, i_cache=...) at js/src/jit/arm/Simulator-arm.cpp:1034
#15 js::jit::Simulator::instructionDecode (this=0xf7927000, instr=0xf7bd6aac) at js/src/jit/arm/Simulator-arm.cpp:4518
[...]
#43 main (argc=6, argv=0xffffcc14, envp=0xffffcc30) at js/src/shell/js.cpp:7516
eax	0x0	0
ebx	0x8befff4	146735092
ecx	0xf7d9c864	-136722332
edx	0x0	0
esi	0x23	35
edi	0xf7927000	-141398016
ebp	0xffffb2e8	4294947560
esp	0xffffb2e0	4294947552
eip	0x8149dc6 <js::Mutex::lock()+134>
=> 0x8149dc6 <js::Mutex::lock()+134>:	movl   $0x0,0x0
   0x8149dd0 <js::Mutex::lock()+144>:	ud2    


Not sure what this assertion is actually about, marking s-s until triaged by JS team.

Updated

a year ago
Whiteboard: [jsbugmon:update,bisect] → [jsbugmon:update]
Comment hidden (obsolete)

Updated

a year ago
Whiteboard: [jsbugmon:update] → [jsbugmon:update,ignore]

Comment 2

a year ago
JSBugMon: The testcase found in this bug no longer reproduces (tried revision 0502bd9e025e).
Dan, could you look into this? It seems like it probably isn't a real security issue because it is inside the simulator but it would be nice to confirm that.
Flags: needinfo?(sunfish)
The bug appears limited to --arm-sim-icache-checks mode. Without that flag, the code doesn't attempt to do any of the relevant locking, so it shouldn't hit this.

Summary: In --arm-sim-icache-checks mode, the main thread takes a lock, a signal arrives, and the signal handler tries to take the same lock, and an EDEADLK occurs.

More detail: The main thread enters Simulator::instructionDecode as normal, and in --arm-sim-icache-checks mode takes a lock and calls CheckICacheLocked. In the middle of that, the watchdog thread sends the main thread a SIGVTALRM, and then the signal handler on the main thread patches up some instructions, and in --arm-sim-icache-checks mode tries to take the same lock so that it can call Simulator::FlushICache. Since the lock is already held, by the same thread, this triggers an EDEADLK error. (It also happens to set errno, which is what actually triggers the assert, but that's just a coincidence.)
Flags: needinfo?(sunfish)
I'll unhide this then because it sounds like a simulator bug. Thanks for looking.
Group: javascript-core-security
Is this actually a regression, or just stars aligning for it to happen?
Summary: Assertion failure: !(*__errno_location ()), at js/src/threading/posix/Mutex.cpp:72 → Assertion failure: !(*__errno_location ()), at js/src/threading/posix/Mutex.cpp:72 - simulator problem
status-firefox51: affected → fix-optional

Comment 7

a year ago
I found this today while fuzzing. I see the comment from Fuzzing Team says the test case no longer reproduces. Is it just that it was tried without the "--arm-sim-icache-checks" flag, or does it really no longer reproduce? Since I just found this today with the latest mozilla-central, I could submit my test case if you need a test case that currently causes this issue. It shows up as being a Q10 in FuzzManager.
(In reply to joseph.bisch from comment #7)
> It shows up as being a Q10 in FuzzManager.

If you ran jsfunfuzz via the funfuzz repo on this, Q10 means that it was intermittent (reproduced once, but subsequent attempts at reduction failed to reproduce)
timeout(0.01);
do {
    x = 999;
} while (x);

Here's a reduced testcase that will reproduce with m-c rev 394f02edb7cb with --fuzzing-safe --thread-count=2 --ion-eager --arm-sim-icache-checks with the following build:

LD=ld CROSS_COMPILE=1 CC="clang -Qunused-arguments -msse2 -mfpmath=sse -arch i386" RANLIB=ranlib CXX="clang++ -Qunused-arguments -msse2 -mfpmath=sse -arch i386" AS=$CC AR=ar STRIP="strip -x -S" HOST_CC="clang -Qunused-arguments -msse2 -mfpmath=sse" AUTOCONF=/usr/local/Cellar/autoconf213/2.13/bin/autoconf213 HOST_CXX="clang++ -Qunused-arguments -msse2 -mfpmath=sse" sh /Users/skywalker/trees/mozilla-central/js/src/configure --target=i386-apple-darwin14.5.0 --enable-simulator=arm --disable-jemalloc --enable-debug --enable-more-deterministic --with-ccache --enable-gczeal --enable-debug-symbols --disable-tests

Note that the testcase is intermittent.

Dan, how should we proceed?
Flags: needinfo?(sunfish)
autoBisect shows this is probably related to the following changeset:

The first bad revision is:
changeset:   https://hg.mozilla.org/mozilla-central/rev/73cc7590bc39
user:        Nick Fitzgerald
date:        Thu Jun 30 13:13:30 2016 -0700
summary:     Bug 1283278 - Use js::Mutex instead of PRLock in js/src/jit/arm/Simulator-arm.{h,cpp}; r=terrence

Also, Nick, this points to bug 1283278, is this a likely regressor?
Blocks: 1283278
Flags: needinfo?(nfitzgerald)
Our new threading primitives have checks and assertions against deadlocks that we did not used to have. This is hitting those assertions, so it was a dormant bug, but it is also hitting an assertion regarding errno that we can't rely on to hold since no one else ever checks/resets errno.

I can remove the errno assertion, but mostly what we are seeing here is an existing deadlock in the arm simulator that we have new insight into because of the new assertions and checks.

Leaving ni? to remind myself to remove the errno assertion.
Removing the errno assertion may be all we need to do here. Ion is modifying the instruction stream in response to an asynchronous signal, which is something the checking code does not appear to be expecting, but it's deliberate.
Flags: needinfo?(sunfish)
Created attachment 8789472 [details] [diff] [review]
Don't assert that errno hasn't been set

In general, tons of code in mozilla-central doesn't check and/or unset errno, so
we can't rely on it being unset here.
Attachment #8789472 - Flags: review?(terrence)
Flags: needinfo?(nfitzgerald)
Comment on attachment 8789472 [details] [diff] [review]
Don't assert that errno hasn't been set

Review of attachment 8789472 [details] [diff] [review]:
-----------------------------------------------------------------

Yup, sorry about that. I'm a bit shocked that it took this long to find a crashing test case.
Attachment #8789472 - Flags: review?(terrence) → review+

Comment 15

a year ago
Pushed by nfitzgerald@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7463c762e719
Don't assert that errno hasn't been set; r=terrence

Comment 16

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/7463c762e719
Status: NEW → RESOLVED
Last Resolved: a year ago
status-firefox51: fix-optional → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
That patch didn't fix the simulator's deadlock, it will still deadlock and we will still crash upon detection of the deadlock.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Dan Gohman [:sunfish] from comment #12)
> Removing the errno assertion may be all we need to do here. Ion is modifying
> the instruction stream in response to an asynchronous signal, which is
> something the checking code does not appear to be expecting, but it's
> deliberate.

Calling into pthreads, or even taking any kind of lock, is well outside the bounds of what is signal safe.
In reply to Nick Fitzgerald [:fitzgen] [⏰PDT; UTC-7] from comment #18)
> Calling into pthreads, or even taking any kind of lock, is well outside the
> bounds of what is signal safe.

The code that's calling into pthreads and taking a lock is a debugging mode in the ARM simulator code. What IonMonkey is doing appears to be outside the bounds of what this debugging mode is expecting to handle.
(In reply to Dan Gohman [:sunfish] from comment #19)
> In reply to Nick Fitzgerald [:fitzgen] [⏰PDT; UTC-7] from comment #18)
> > Calling into pthreads, or even taking any kind of lock, is well outside the
> > bounds of what is signal safe.
> 
> The code that's calling into pthreads and taking a lock is a debugging mode
> in the ARM simulator code. What IonMonkey is doing appears to be outside the
> bounds of what this debugging mode is expecting to handle.

Then the debugging mode in the ARM simulator has a bug. I'm not sure what point you're trying to make?
:fitzgen, what's next here?
Flags: needinfo?(nfitzgerald)
Whoever owns and understands the ARM simulator fixes its debugging mode deadlock bug. That person is unfortunately not me.

<fitzgen> mrgiggles: who can review Simulator-arm.cpp
<mrgiggles> fitzgen: jandem x 22, bbouvier x 13, luke x 10
Flags: needinfo?(nfitzgerald)
Passing the ball to Sean, who wrote ARM stuff recently.
Flags: needinfo?(sstangl)
This is probably the same as Bug 1285074.
Shall we dupe this to that one?

Comment 26

11 months ago
Verified that the testcase no longer crashes. Duping.
Status: REOPENED → RESOLVED
Last Resolved: a year ago11 months ago
Flags: needinfo?(sstangl)
Resolution: --- → DUPLICATE
Duplicate of bug: 1285074
You need to log in before you can comment on or make changes to this bug.