Closed Bug 1117165 Opened 9 years ago Closed 9 years ago

Crash [@ js::jit::MDefinition::valueHash] or Assertion failure: producer_ != nullptr, at jit/MIR.h

Categories

(Core :: JavaScript Engine, defect)

x86_64
All
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla38
Tracking Status
firefox34 --- unaffected
firefox35 --- wontfix
firefox36 + fixed
firefox37 + fixed
firefox38 + fixed
firefox-esr31 --- unaffected

People

(Reporter: azakai, Assigned: sunfish)

References

Details

(5 keywords, Whiteboard: [jsbugmon:update,testComment=5,origRev=13fe5ad0364d])

Crash Data

Attachments

(5 files, 1 obsolete file)

Attached file a.out.js (obsolete) —
I found the attached testcase after 48 hours of fuzzing emscripten using LLVM 3.5. It segfaults in spidermonkey, but works ok with --no-asmjs (also works ok in node).

gdb says

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff67ed700 (LWP 31427)]
0x000000000066ff46 in addU32ToHash (data=<error reading variable: Cannot access memory at address 0x20>, hash=225)
    at /home/alon/Dev/mozilla-inbound/js/src/jit/MIR.cpp:211
211	    return data + (hash << 6) + (hash << 16) - hash;
(gdb) where
#0  0x000000000066ff46 in addU32ToHash (data=<error reading variable: Cannot access memory at address 0x20>, hash=225)
    at /home/alon/Dev/mozilla-inbound/js/src/jit/MIR.cpp:211
#1  js::jit::MDefinition::valueHash (this=0x18aef60) at /home/alon/Dev/mozilla-inbound/js/src/jit/MIR.cpp:219
#2  0x00000000006e18ec in hash (ins=0x18aef60) at /home/alon/Dev/mozilla-inbound/js/src/jit/ValueNumbering.cpp:46
#3  prepareHash (l=<optimized out>) at ../../dist/include/js/HashTable.h:1075
#4  lookupForAdd (l=<optimized out>, this=0x7ffff67ecd50) at ../../dist/include/js/HashTable.h:1561
#5  lookupForAdd (l=<optimized out>, this=0x7ffff67ecd50) at ../../dist/include/js/HashTable.h:372
#6  findLeaderForAdd (def=0x18aef60, this=0x7ffff67ecd50) at /home/alon/Dev/mozilla-inbound/js/src/jit/ValueNumbering.cpp:97
#7  js::jit::ValueNumberer::leader (this=this@entry=0x7ffff67ecd40, def=def@entry=0x18aef60)
    at /home/alon/Dev/mozilla-inbound/js/src/jit/ValueNumbering.cpp:625
#8  0x00000000006eea4b in js::jit::ValueNumberer::visitDefinition (this=0x7ffff67ecd40, def=0x18aef60)
    at /home/alon/Dev/mozilla-inbound/js/src/jit/ValueNumbering.cpp:773
#9  0x00000000006ef1c5 in js::jit::ValueNumberer::visitBlock (this=this@entry=0x7ffff67ecd40, block=block@entry=0x18b6b08, 
    dominatorRoot=dominatorRoot@entry=0x189c1f8) at /home/alon/Dev/mozilla-inbound/js/src/jit/ValueNumbering.cpp:933
#10 0x00000000006ef52e in js::jit::ValueNumberer::visitDominatorTree (this=this@entry=0x7ffff67ecd40, 
    dominatorRoot=dominatorRoot@entry=0x189c1f8) at /home/alon/Dev/mozilla-inbound/js/src/jit/ValueNumbering.cpp:978
#11 0x00000000006ef642 in js::jit::ValueNumberer::visitGraph (this=this@entry=0x7ffff67ecd40)
    at /home/alon/Dev/mozilla-inbound/js/src/jit/ValueNumbering.cpp:1016
#12 0x00000000006ef6cb in js::jit::ValueNumberer::run (this=this@entry=0x7ffff67ecd40, 
    updateAliasAnalysis=updateAliasAnalysis@entry=js::jit::ValueNumberer::DontUpdateAliasAnalysis)
    at /home/alon/Dev/mozilla-inbound/js/src/jit/ValueNumbering.cpp:1087
#13 0x0000000000654de2 in js::jit::OptimizeMIR (mir=0x189c158) at /home/alon/Dev/mozilla-inbound/js/src/jit/Ion.cpp:1495
#14 0x000000000081f99c in js::HelperThread::handleAsmJSWorkload (this=this@entry=0x1708048)
    at /home/alon/Dev/mozilla-inbound/js/src/vm/HelperThreads.cpp:1007
#15 0x00000000008208b8 in js::HelperThread::threadLoop (this=0x1708048) at /home/alon/Dev/mozilla-inbound/js/src/vm/HelperThreads.cpp:1369
#16 0x0000000000879379 in nspr::Thread::ThreadRoutine (arg=0x170cdb0) at /home/alon/Dev/mozilla-inbound/js/src/vm/PosixNSPR.cpp:45
#17 0x00007ffff7bc4182 in start_thread (arg=0x7ffff67ed700) at pthread_create.c:312
#18 0x00007ffff6cb3efd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

Marking security sensitive just to be safe.
Confirmed that I see the crash in the browser too, not just in the shell.
CC'ing sunfish because I'm not sure he has access yet.
In debug builds this asserts in MUse::producer:

MOZ_ASSERT(producer_ != nullptr);

We're under MAsmJSLoadHeap::congruentTo -> congruentIfOperandsEqual, getting the first operand.
[Tracking Requested - why for this release]:
Crash.

Gary, is it easy for you to run autoBisect on this?
Flags: needinfo?(gary)
Flags: needinfo?(gary)
(function(a, b, c) {
    "use asm";
    var m = new a.Int32Array(c);
    function f() {
        var x = 0;
        x = m[0] | 0;
        while (1) {
            if (0 ? x : 0) {
                x = 1;
            }
        }
    }
})()

asserts js debug shell on m-c rev 13fe5ad0364d with --fuzzing-safe --no-threads --ion-eager at Assertion failure: producer_ != nullptr, at jit/MIR.h

Debug configure options:

CC="clang -Qunused-arguments" CXX="clang++ -Qunused-arguments" AR=ar AUTOCONF=/usr/local/Cellar/autoconf213/2.13/bin/autoconf213 sh /Users/skywalker/trees/mozilla-central/js/src/configure --target=x86_64-apple-darwin12.5.0 --enable-debug --enable-optimize --enable-nspr-build --enable-more-deterministic --with-ccache --enable-gczeal --enable-debug-symbols --disable-tests

Running autoBisect now.
Severity: normal → critical
OS: Linux → All
Summary: Segfault in hash code when running an asm.js file → Assertion failure: producer_ != nullptr, at jit/MIR.h
Whiteboard: [jsbugmon:update,testComment=5,origRev=13fe5ad0364d]
Version: unspecified → Trunk
autoBisect shows this is probably related to the following changeset:

The first bad revision is:
changeset:   https://hg.mozilla.org/mozilla-central/rev/8f27a48a25d5
user:        Dan Gohman
date:        Wed Sep 17 10:27:25 2014 -0700
summary:     Bug 1029830 - IonMonkey: GVN: Replace UCE with GVN r=nbp

Dan, is bug 1029830 a likely regressor?

The testcase in comment 5 crashes opt shell at js::jit::MDefinition::valueHash.
Blocks: 1029830
Crash Signature: [@ js::jit::MDefinition::valueHash]
Flags: needinfo?(sunfish)
Keywords: crash
Summary: Assertion failure: producer_ != nullptr, at jit/MIR.h → Crash [@ js::jit::MDefinition::valueHash] or Assertion failure: producer_ != nullptr, at jit/MIR.h
Attached file stack
(lldb) bt 5
* thread #1: tid = 0xaf1ed, 0x0000000100297c94 js-dbgDisabled-opt-64-dm-nsprBuild-darwin-13fe5ad0364d`js::jit::MDefinition::valueHash() const [inlined] js::jit::MDefinition::addU32ToHash(this=0x0000000000000000, hash=<unavailable>) + 2 at MIR.cpp:211, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x20)
  * frame #0: 0x0000000100297c94 js-dbgDisabled-opt-64-dm-nsprBuild-darwin-13fe5ad0364d`js::jit::MDefinition::valueHash() const [inlined] js::jit::MDefinition::addU32ToHash(this=0x0000000000000000, hash=<unavailable>) + 2 at MIR.cpp:211
    frame #1: 0x0000000100297c92 js-dbgDisabled-opt-64-dm-nsprBuild-darwin-13fe5ad0364d`js::jit::MDefinition::valueHash(this=0x00000001018a4f78) const + 66 at MIR.cpp:219
    frame #2: 0x00000001002f1e3e js-dbgDisabled-opt-64-dm-nsprBuild-darwin-13fe5ad0364d`js::jit::ValueNumberer::leader(js::jit::MDefinition*) [inlined] js::jit::ValueNumberer::VisibleValues::ValueHasher::hash(ins=<unavailable>) + 78 at ValueNumbering.cpp:46
    frame #3: 0x00000001002f1e32 js-dbgDisabled-opt-64-dm-nsprBuild-darwin-13fe5ad0364d`js::jit::ValueNumberer::leader(js::jit::MDefinition*) [inlined] js::detail::HashTable<js::jit::MDefinition* const, js::HashSet<js::jit::MDefinition*, js::jit::ValueNumberer::VisibleValues::ValueHasher, js::jit::JitAllocPolicy>::SetOps, js::jit::JitAllocPolicy>::prepareHash(l=0x00000001018a4f78) at HashTable.h:1075
    frame #4: 0x00000001002f1e32 js-dbgDisabled-opt-64-dm-nsprBuild-darwin-13fe5ad0364d`js::jit::ValueNumberer::leader(js::jit::MDefinition*) [inlined] js::detail::HashTable<js::jit::MDefinition* const, js::HashSet<js::jit::MDefinition*, js::jit::ValueNumberer::VisibleValues::ValueHasher, js::jit::JitAllocPolicy>::SetOps, js::jit::JitAllocPolicy>::lookupForAdd(this=0x00007fff5fbfb840, l=0x00000001018a4f78) const at HashTable.h:1561
(lldb) dis -p
js-dbgDisabled-opt-64-dm-nsprBuild-darwin-13fe5ad0364d`js::jit::MDefinition::valueHash() const + 68 [inlined] js::jit::MDefinition::addU32ToHash(unsigned int, unsigned int) + 2 at MIR.cpp:219
js-dbgDisabled-opt-64-dm-nsprBuild-darwin-13fe5ad0364d`js::jit::MDefinition::valueHash() const + 66 at MIR.cpp:219:
-> 0x100297c94:  addl   0x20(%rax), %ebx
   0x100297c97:  incq   %r12
   0x100297c9a:  movl   %ebx, %ebp
   0x100297c9c:  negl   %ebp
(lldb) register read $rax
     rax = 0x0000000000000000
(lldb) register read $ebx
     ebx = 0x00e1375f
(lldb)

This might even be accessing 0x00e1375f?
Comment on attachment 8543337 [details]
a.out.js

See comment 5 for the smaller testcase, though we should also probably retest with this testcase.
Attachment #8543337 - Attachment is obsolete: true
The attached patch fixes the bug, both in the reduced testcase and in the original a.out.js. I'll request a review once I've convinced myself that this is indeed how I want to fix it. I can confirm that this bug is likely to date back to the patch identified in the bisection.

On the topic of security implications, I haven't thought deeply about this yet, but I don't currently see how this could be anything more severe than a null page dereference. This is because the bug entails the optimizer discarding an unused instruction, and then attempting to do additional optimizations on the discarded instruction. That crashes because discarded instructions have their operands nulled out (even in release builds). It doesn't even lead to live instructions having dangling operands, because the bug only happens for values which are unused to begin with.

In the disassembly above, "addl   0x20(%rax), %ebx" is loading the value from 0x20 + %rax and adding it to the value in %ebx, and storing the result in %ebx. The only memory access is the one at 0x20 + %rax, and %rax is null, so this is just a null page dereference.
Assignee: nobody → sunfish
Flags: needinfo?(sunfish)
Group: core-security
Keywords: csectype-dos
Attached file fc.out.js
Here's another testcase the fuzzer found last night. It produces the same assertion in a debug build, so it seems likely it's the same issue.
Attached file another testcase
Here's another testcase the fuzzer found over the weekend, with the same behavior.
Dan, can we land your patch? :)
Flags: needinfo?(sunfish)
Refreshed patch. Yes, I think this is the right fix now.
Flags: needinfo?(sunfish)
Attachment #8560499 - Flags: review?(nicolas.b.pierron)
Looks like we should be tracking 36+ at this point. If the fix can be verified on Nightly by Monday and Dan confirms that this is safe enough to uplift to Beta, we may be able to fix this in 36.
Attachment #8560499 - Flags: review?(nicolas.b.pierron) → review+
https://hg.mozilla.org/mozilla-central/rev/133c9c6c0c29
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Dan, could you fill the uplift request to aurora & beta? Thanks
Flags: needinfo?(sunfish)
Comment on attachment 8544205 [details] [diff] [review]
gvn-discarding-def-discards-simplified.patch

Approval Request Comment
[Feature/regressing bug #]: bug 1029830
[User impact if declined]: may hit null-pointer exceptions in rare circumstances
[Describe test coverage new/current, TreeHerder]: TreeHerder
[Risks and why]: pretty low risk; it's a simple patch and it's just restricting an existing optimization to keep it out of trouble
[String/UUID change made/needed]: none
Flags: needinfo?(sunfish)
Attachment #8544205 - Flags: approval-mozilla-beta?
Attachment #8544205 - Flags: approval-mozilla-aurora?
Attachment #8544205 - Flags: approval-mozilla-beta?
Attachment #8544205 - Flags: approval-mozilla-beta+
Attachment #8544205 - Flags: approval-mozilla-aurora?
Attachment #8544205 - Flags: approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: