Closed Bug 1073911 Opened 10 years ago Closed 10 years ago

Crash [@ js::jit::Simulator::FlushICache] or [@ js::jit::PatchJump]

Categories

(Core :: JavaScript Engine: JIT, defect)

x86
macOS
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla36
Tracking Status
firefox35 --- affected
firefox36 --- fixed
firefox37 --- fixed

People

(Reporter: gkw, Unassigned)

References

Details

(Keywords: crash, regression, testcase)

Crash Data

Attachments

(3 files)

Attached file debug and opt stacks
The upcoming attached testcase crashes js debug and opt shell on m-c changeset 6a63bcb6e0d3 with --ion-eager --no-threads --arm-sim-icache-checks at js::jit::Simulator::FlushICache with js::jit::PatchJump on the stack.

My configure flags are:

LD=ld CROSS_COMPILE=1 CC="clang -Qunused-arguments -msse2 -mfpmath=sse -arch i386" RANLIB=ranlib CXX="clang++ -Qunused-arguments -msse2 -mfpmath=sse -arch i386" AS=$CC AR=ar STRIP="strip -x -S" HOST_CC="clang -Qunused-arguments -msse2 -mfpmath=sse" HOST_CXX="clang++ -Qunused-arguments -msse2 -mfpmath=sse" sh /Users/fuzz3/trees/mozilla-central/js/src/configure --target=i386-apple-darwin9.2.0 --enable-macos-target=10.5 --enable-arm-simulator --disable-debug --enable-optimize --enable-nspr-build --enable-more-deterministic --with-ccache --enable-gczeal --enable-debug-symbols --disable-tests
Debug stack: (top few frames)

  * frame #0: 0x004895ad js-dbg-opt-32-dm-nsprBuild-armSim-darwin-6a63bcb6e0d3`js::jit::Simulator::FlushICache(void*, unsigned long) [inlined] mozilla::ThreadLocal<js::PerThreadData*>::get() const + 29 at Simulator-arm.cpp:4377
    frame #1: 0x00489590 js-dbg-opt-32-dm-nsprBuild-armSim-darwin-6a63bcb6e0d3`js::jit::Simulator::FlushICache(start_addr=0x04a7b738, size=<unavailable>) + 64 at Simulator-arm.cpp:1103
    frame #2: 0x002a1b98 js-dbg-opt-32-dm-nsprBuild-armSim-darwin-6a63bcb6e0d3`js::jit::AutoFlushICache::flush(unsigned long, unsigned long) [inlined] js::jit::ExecutableAllocator::cacheFlush(code=0x04a7b738, size=4) + 120 at ExecutableAllocator.h:399

===

Opt stack: (top few frames)

  * frame #0: 0x002de7b2 js-dbgDisabled-opt-32-dm-nsprBuild-armSim-darwin-6a63bcb6e0d3`js::jit::Simulator::FlushICache(void*, unsigned long) [inlined] mozilla::ThreadLocal<js::PerThreadData*>::get(this=<unavailable>) const + 16 at Simulator-arm.cpp:4377
    frame #1: 0x002de7a2 js-dbgDisabled-opt-32-dm-nsprBuild-armSim-darwin-6a63bcb6e0d3`js::jit::Simulator::FlushICache(start_addr=0x04482d8c, size=4) + 34 at Simulator-arm.cpp:1103
    frame #2: 0x002b04c6 js-dbgDisabled-opt-32-dm-nsprBuild-armSim-darwin-6a63bcb6e0d3`js::jit::PatchJump(js::jit::CodeLocationJump&, js::jit::CodeLocationLabel) [inlined] js::jit::Assembler::RetargetNearBranch(offset=-536870458, cond=<unavailable>, final=<unavailable>) + 57 at Assembler-arm.cpp:2521
autoBisect shows this is probably related to the following changeset:

The first bad revision is:
changeset:   https://hg.mozilla.org/mozilla-central/rev/8ecc1aa20f39
user:        Douglas Crosher
date:        Sat Jul 12 10:04:29 2014 +1000
summary:     Bug 964258 - IonMonkey: Use mprotect for interrupt check on ARM. r=bhackett

Marty/Douglas, is bug 964258 a likely regressor?
Blocks: 964258
Flags: needinfo?(mrosenberg)
Flags: needinfo?(dtc-moz)
Or, is this more likely related to bug 988789, as per similar crash bug 1041079?
Crash Signature: [@ js::jit::Simulator::FlushICache] [@ js::jit::PatchJump]
This is the exact same bug as 988789, but along a different codepath.
Flags: needinfo?(mrosenberg)
This should fix the issue, but I haven't actually tested it beyond making sure it compiles.  I'm going to be super-intermittent next week, so if this fixes the issue, can you r? dougc on it?
Flags: needinfo?(gary)
Comment on attachment 8496533 [details] [diff] [review]
fixTlsSimCrash-r0.patch

Nope, running:

~/Desktop/js-dbg-opt-32-dm-nsprBuild-armSim-darwin-mozilla-central-6a63bcb6e0d3-207532-1073911-c6-658f7290e90c-J5eT8V.noindex/js-dbg-opt-32-dm-nsprBuild-armSim-darwin-6a63bcb6e0d3-1073911-c6-658f7290e90c --ion-eager --no-threads --arm-sim-icache-checks orig67.js

still intermittently causes:

Assertion failure: memcmp(reinterpret_cast<void*>(instr), cache_page->cachedData(offset), SimInstruction::kInstrSize) == 0, at /Users/skywalker/trees/mozilla-central/js/src/jit/arm/Simulator-arm.cpp:1086
Bus error: 10

I don't have a stack because running it in lldb seems to make the issue go away.
Attachment #8496533 - Flags: feedback-
Flags: needinfo?(gary)
I can't imagine that the stack would be that interesting, but can you get a core dump on OSX, and extract a stack from that?  I am not all that familiar with when threads exist, and when threadlocalstorage exists, but I feel like either we should either have no tls, and have never executed in the simulator, and thus no cache to flush, or we have something in the cache because we've executed instructions, and there is tls available.  Either that, or this crash is totally unrelated to the tls patch I posted earlier.
Attached file stack for assertion
Wait, I tried hard enough and here is the stack:

Top 5 frames:

  * frame #0: 0x00488c85 js-dbg-opt-32-dm-nsprBuild-armSim-darwin-6a63bcb6e0d3-1073911-c6-658f7290e90c`js::jit::Simulator::instructionDecode(js::jit::SimInstruction*) [inlined] js::jit::AutoLockSimulatorRuntime::AutoLockSimulatorRuntime(srt=<unavailable>) + 15 at Simulator-arm.cpp:419
    frame #1: 0x00488c76 js-dbg-opt-32-dm-nsprBuild-armSim-darwin-6a63bcb6e0d3-1073911-c6-658f7290e90c`js::jit::Simulator::instructionDecode(js::jit::SimInstruction*) [inlined] js::jit::AutoLockSimulatorRuntime::AutoLockSimulatorRuntime(srt=<unavailable>) at Simulator-arm.cpp:423
    frame #2: 0x00488c76 js-dbg-opt-32-dm-nsprBuild-armSim-darwin-6a63bcb6e0d3-1073911-c6-658f7290e90c`js::jit::Simulator::instructionDecode(this=<unavailable>, instr=<unavailable>) + 1270 at Simulator-arm.cpp:4104
    frame #3: 0x0049eefa js-dbg-opt-32-dm-nsprBuild-armSim-darwin-6a63bcb6e0d3-1073911-c6-658f7290e90c`void js::jit::Simulator::execute<false>(this=0x02383c00) + 138 at Simulator-arm.cpp:4188
    frame #4: 0x0048fc37 js-dbg-opt-32-dm-nsprBuild-armSim-darwin-6a63bcb6e0d3-1073911-c6-658f7290e90c`js::jit::Simulator::callInternal(this=0x02383c00, entry=0x01e458d8) + 231 at Simulator-arm.cpp:4276
Flags: needinfo?(mrosenberg)
Comment on attachment 8496533 [details] [diff] [review]
fixTlsSimCrash-r0.patch

Review of attachment 8496533 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/arm/Simulator-arm.cpp
@@ +1120,5 @@
>          return;
> +    PerThreadData *pt = TlsPerThreadData.get();
> +    // If there isn't TlsPerThreadData yet, then we haven't started the
> +    // simulator, so there is nothing in the cache that needs to be
> +    // invalidated.

Thank you for the quick patch. It looks like the same issue as in bug 1041079, a MacOS signal handler issue rather than the simulator having not yet started. See the comment on AutoFlushICache::flush(uintptr_t start, size_t len) in Ion.cpp:

// Note this can be called without TLS PerThreadData defined so this case needs
// to be guarded against. E.g. when patching instructions from the exception
// handler on MacOS running the ARM simulator.

The backtrace shows that it has already got to AutoFlushICache::flush(start=78100280, len=4) + 108 at Ion.cpp:3125 so there was either no PerThreadData or no AutoFlushICache. So when it gets to Simulator::FlushICache() it has a similar problem. There might be something in the cache that need invalidating?
(In reply to Gary Kwong [:gkw] [:nth10sd] GMT-4 Sep 29 - Oct 3 from comment #9)
> Created attachment 8496543 [details]
> stack for assertion
> 
> Wait, I tried hard enough and here is the stack:
> 
> Top 5 frames:
> 
>   * frame #0: 0x00488c85
> js-dbg-opt-32-dm-nsprBuild-armSim-darwin-6a63bcb6e0d3-1073911-c6-
> 658f7290e90c`js::jit::Simulator::instructionDecode(js::jit::SimInstruction*)
> [inlined]
> js::jit::AutoLockSimulatorRuntime::
> AutoLockSimulatorRuntime(srt=<unavailable>) + 15 at Simulator-arm.cpp:419
>     frame #1: 0x00488c76
> js-dbg-opt-32-dm-nsprBuild-armSim-darwin-6a63bcb6e0d3-1073911-c6-
> 658f7290e90c`js::jit::Simulator::instructionDecode(js::jit::SimInstruction*)
> [inlined]
> js::jit::AutoLockSimulatorRuntime::
> AutoLockSimulatorRuntime(srt=<unavailable>) at Simulator-arm.cpp:423
>     frame #2: 0x00488c76
> js-dbg-opt-32-dm-nsprBuild-armSim-darwin-6a63bcb6e0d3-1073911-c6-
> 658f7290e90c`js::jit::Simulator::instructionDecode(this=<unavailable>,
> instr=<unavailable>) + 1270 at Simulator-arm.cpp:4104
>     frame #3: 0x0049eefa
> js-dbg-opt-32-dm-nsprBuild-armSim-darwin-6a63bcb6e0d3-1073911-c6-
> 658f7290e90c`void js::jit::Simulator::execute<false>(this=0x02383c00) + 138
> at Simulator-arm.cpp:4188
>     frame #4: 0x0048fc37
> js-dbg-opt-32-dm-nsprBuild-armSim-darwin-6a63bcb6e0d3-1073911-c6-
> 658f7290e90c`js::jit::Simulator::callInternal(this=0x02383c00,
> entry=0x01e458d8) + 231 at Simulator-arm.cpp:4276

I'm rather confused.  The backtrace here firmly looks like it is for an issue locking the simulator, but the message printed out is:
> Assertion failure: memcmp(reinterpret_cast<void*>(instr), cache_page->cachedData(offset), SimInstruction::kInstrSize) == 0, at
Flags: needinfo?(mrosenberg)
(In reply to Douglas Crosher [:dougc] from comment #10)

> The backtrace shows that it has already got to

Oh hey, there are backtraces *attached* to the bug report, not just in the comments.
(In reply to Marty Rosenberg [:mjrosenb] from comment #12)
> Oh hey, there are backtraces *attached* to the bug report, not just in the
> comments.

Yeah, I got feedback that it'll be nice to have both the full backtraces attached *and* part of the same backtrace in the comments to support different JS dev workflows. (so some people can view the backtrace in bugmail and not have to click an additional time)
Marty, what's next here?
Flags: needinfo?(mrosenberg)
Is this still a problem? I thought luke fixed this in bug 1091912 by reworking the code to avoid the use of mprotect to halt Ion/asm.js execution.
Flags: needinfo?(dtc-moz)
Douglas, you're right:

autoBisect shows this is probably related to the following changeset:

The first good revision is:
changeset:   https://hg.mozilla.org/mozilla-central/rev/7db30249d1d8
user:        Luke Wagner
date:        Tue Nov 11 08:36:52 2014 -0600
summary:     Bug 1091912 - stop using mprotect to halt Ion/asm.js execution (r=bhackett)

Marking RESOLVED FIXED by bug 1091912.
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: needinfo?(mrosenberg)
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: