OdinMonkey: Crash [@ memmove] or [@ js::AsmJSModule::clone]

RESOLVED FIXED in mozilla30

Status

()

defect
--
critical
RESOLVED FIXED
6 years ago
5 years ago

People

(Reporter: gkw, Assigned: luke)

Tracking

(Blocks 2 bugs, {crash, regression, testcase})

Trunk
mozilla30
x86_64
macOS
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox29 unaffected, firefox30 affected)

Details

(Whiteboard: [jsbugmon:update])

Attachments

(2 attachments, 1 obsolete attachment)

Posted file stack
Function("\
    g = (function(t,foreign){\
        \"use asm\";\
        var ff = foreign.ff;\
        function f() {\
            +ff()\
        }\
        return f\
    })(this, {\
        ff: arguments.callee\
    }, ArrayBuffer(4096))\
")()
function m(f) {
    for (var j = 0; j < 9999; ++j) {
        f();
    }
}
m(g);

crashes js debug shell on m-c changeset b89a9d7b4ca0 without any CLI arguments with memmove and js::AsmJSModule::clone on the stack.

Locking s-s because memmove seems to be on the stack.

My configure flags are:

CC="clang -Qunused-arguments" AR=ar CXX="clang++ -Qunused-arguments" sh ./configure --target=x86_64-apple-darwin12.5.0 --enable-optimize --enable-debug --enable-profiling --enable-gczeal --enable-debug-symbols --enable-methodjit --enable-type-inference --disable-tests --enable-exact-rooting --with-ccache --enable-threadsafe <other NSPR options>

=== Tinderbox Build Bisection Results by autoBisect ===

The "good" changeset has the timestamp "20140218112849" and the hash "22693bfef4d8".
The "bad" changeset has the timestamp "20140218113752" and the hash "2a34429afff1".

Likely regression window: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=22693bfef4d8&tochange=2a34429afff1

Only bug 973725 is that regression range - setting needinfo? from Luke.
Flags: needinfo?(luke)
Oh, wow, great catch by the fuzzers!  While cloning, we touch the executable code.  If background ion compilation completes at the same time as we're cloning *while running the asm.js we're cloning*, it'll mprotect the code with PROT_NONE so we'll fault while doing the copy.  If this happens, the AsmJSFaultHandler will conservatively let the fault crash (since the pc isn't inside asm.js code).  This isn't s-s, since we'll safely crash if this occurs.
Group: core-security, javascript-core-security
Flags: needinfo?(luke)
Posted patch fix-mprotect (obsolete) — Splinter Review
Simple fix: leave READ|WRITE, just take away EXECUTE.
Assignee: nobody → luke
Status: NEW → ASSIGNED
Attachment #8379407 - Flags: review?(benj)
Comment on attachment 8379407 [details] [diff] [review]
fix-mprotect

Review of attachment 8379407 [details] [diff] [review]:
-----------------------------------------------------------------

Wow, fuzzers are powerful.

::: js/src/jit/AsmJSSignalHandlers.cpp
@@ +1035,3 @@
>          MOZ_CRASH();
>  #else  // assume Unix
> +    if (mprotect(module.codeBase(), module.functionBytes(), PROT_READ|PROT_WRITE))

As the clone code just needs to read the cloned code, we could just use PAGE_READONLY and PROT_READ. Testing locally with the same test case seems to Just Work. Is there a reason why we also need write access? If I understand correctly, the memory being written by the memcpy call in the clone function cannot be mprotected at this point, as this occurs only if there is an asmjs activation on the stack. Am I missing something?
Attachment #8379407 - Flags: review?(benj) → review+
(In reply to Benjamin Bouvier [:bbouvier] from comment #3)
> to Just Work. Is there a reason why we also need write access?

I was thinking forward to the next fuzz bug :)  That is, when/if we add heap resizing, we'll need to do a bunch of patching (so writes) to an existing module which could be live on the stack and mprotected etc.
(In reply to Luke Wagner [:luke] from comment #4)
> I was thinking forward to the next fuzz bug :)  That is, when/if we add heap
> resizing, we'll need to do a bunch of patching (so writes) to an existing
> module which could be live on the stack and mprotected etc.

Ha, neat!
The problem is that if you remove only the executable bit, the simulator doesn't segfault anymore when we want to interrupt asm.js code. Not sure what's the best way to fix this...
Perhaps we can just not run this test on the ARM simulator.  What's the easiest way to do that?
Also likely caused a b2g emulator timeout:
https://tbpl.mozilla.org/php/getParsedLog.php?id=35060847&tree=Mozilla-Inbound&full=1

I expect it is likely the same thing as with the ARM simulator; noone wants to check execute permissions before executing every instruction.

(The b2g-win32 failure is confusing; it's not a timeout though, and it only fails with --no-fpu which disables Odin, so I'm guessing this is just brokenness in the Ion operation callback (which is hardly tested).)

So one thought is that we could take the operation callback lock when doing the clone so that we can't get this sort of interruption.  The only rub is that we need to also mprotect(PROT_READ|WRITE), once the lock is held, and then restore original permissions afterwards so we don't "lose" the trigger.
Posted patch fix-mprotectSplinter Review
Let's see how this one fares:
https://tbpl.mozilla.org/?tree=Try&rev=b90f18e6c482
Attachment #8379407 - Attachment is obsolete: true
Attachment #8379945 - Flags: review?(benj)
Try likes it.  SM(arm) didn't run (apparently try is still being configured), but I was able to repro the original hang locally and that new patch fixes it.
Comment on attachment 8379945 [details] [diff] [review]
fix-mprotect

Review of attachment 8379945 [details] [diff] [review]:
-----------------------------------------------------------------

Good. I thought it'd be nice to fix the simulators, but then I was like... whatever [rock riff].

::: js/src/jit/AsmJSModule.cpp
@@ +865,2 @@
>  bool
> +AsmJSModule::clone(JSContext *cx, ScopedJSDeletePtr<AsmJSModule> *moduleOut) const

Could we have an ExclusiveContext here?

::: js/src/jit/AsmJSModule.h
@@ +416,5 @@
>  
>      FunctionCountsVector                  functionCounts_;
>  
> +    // This field is accessed concurrently when triggering the operation
> +    // callback and access must be sychronized via the runtime's operation

nit: synchronized
Attachment #8379945 - Flags: review?(benj) → review+
(In reply to Benjamin Bouvier [:bbouvier] from comment #14)
Haha, nice.

> Could we have an ExclusiveContext here?

Unfortunately, ExclusiveContext doesn't expose a runtime() (because JSRuntime provides access to non-threadsafe stuff) which we need in several cases.  Since ExclusiveContext isn't actually necessary, I'd rather not add a special case (friend decl, new accessor, etc) to make this work.
https://hg.mozilla.org/mozilla-central/rev/4e7eba19b573
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
QA Whiteboard: [qa-]
You need to log in before you can comment on or make changes to this bug.