Closed Bug 858582 Opened 11 years ago Closed 11 years ago

Crash [@ js::EqualStrings] with ParallelArray

Categories

(Core :: JavaScript Engine, defect)

Platform: x86_64 Linux
Type: defect
Priority: Not set
Severity: critical

Tracking


Status: VERIFIED FIXED
Target Milestone: mozilla23
Tracking Status
firefox21 --- unaffected
firefox22 + fixed
firefox23 + verified
firefox-esr17 --- unaffected
b2g18 --- unaffected

People

(Reporter: decoder, Assigned: shu)

Details

(4 keywords, Whiteboard: [jsbugmon:][adv-main22-])

Crash Data

Attachments

(2 files)

Attached file Testcase for shell
The attached testcase crashes on mozilla-central revision 55f9e3e3dae7 (run with --ion-eager).
This test requires an optimized build with build options --enable-optimize --disable-debug --enable-gczeal --enable-valgrind. The crash looks like this:

Program received signal SIGSEGV, Segmentation fault.
js::EqualStrings (cx=0xc0d1b0, str1=<error reading variable: Cannot access memory at address 0x333333333333>, str2="u", result=0x7fffffff8b6f) at /srv/repos/mozilla-central/js/src/jsstr.cpp:4616
4616    }
(gdb) bt
#0  js::EqualStrings (cx=0xc0d1b0, str1=<error reading variable: Cannot access memory at address 0x333333333333>, str2="u", result=0x7fffffff8b6f) at /srv/repos/mozilla-central/js/src/jsstr.cpp:4616
#1  0x00000000004aec6e in js::LooselyEqual (cx=0xc0d1b0, lval=..., rval=..., result=0x7fffffff8b6f) at /srv/repos/mozilla-central/js/src/jsinterp.cpp:659
#2  0x000000000078a689 in js::ion::LooselyEqual<false> (cx=<optimized out>, lhs=..., rhs=..., res=0x7fffffff8b94) at /srv/repos/mozilla-central/js/src/ion/VMFunctions.cpp:189
#3  0x00007ffff7f91c84 in ?? ()
[...]
(gdb) x /i $pc
=> 0x52a79f <js::EqualStrings(JSContext*, JSString*, JSString*, bool*)+31>:     mov    (%rsi),%rdi
(gdb) info reg rsi
rsi            0x333333333333   56294995342131


Assuming at least sec-high based on the invalid read from 0x333333333333.
Keywords: sec-high
Whiteboard: [jsbugmon:update,bisect]
Whiteboard: [jsbugmon:update,bisect] → [jsbugmon:update]
JSBugMon: Bisection requested, result:
autoBisect shows this is probably related to the following changeset:

The first bad revision is:
changeset:   127434:e5978106c61a
parent:      127433:280e5ed3f0b7
parent:      125575:1d6fe70c79c5
user:        Jan de Mooij
date:        Thu Mar 21 11:23:12 2013 +0100
summary:     Merge from mozilla-central.

Not all ancestors of this changeset have been checked.
Use bisect --extend to continue the bisection from
the common ancestor, 0bba6ba92467.

This iteration took 125.995 seconds to run.

Oops! We didn't test rev 1d6fe70c79c5, a parent of the blamed revision! Let's do that now.
Rev 1d6fe70c79c5: Updating... Compiling... Testing... [Uninteresting] It didn't crash. (0.077 seconds)
good (not interesting) 
As expected, the parent's label is the opposite of the blamed rev's label.
Whiteboard: [jsbugmon:update] → [jsbugmon:update,ignore]
JSBugMon: The testcase found in this bug no longer reproduces (tried revision 50d25e083421).
Whiteboard: [jsbugmon:update,ignore] → [jsbugmon:bisectfix]
Whiteboard: [jsbugmon:bisectfix] → [jsbugmon:]
JSBugMon: Fix Bisection requested, result:
autoBisect shows this is probably related to the following changeset:

The first good revision is:
changeset:   129267:eb99b3a22c5c
user:        Kannan Vijayan
date:        Thu Apr 18 18:00:23 2013 -0400
summary:     Bug 861596 - Add Baseline stubs to handle GetProp(length) and GetElem(int32) operations on arguments objects. r=bhackett

This iteration took 125.466 seconds to run.
djvj, is the fix in comment 4 likely to fix this bug as well?
Doesn't seem related, no.  I think it's making the error go away incidentally, not fixing any pre-existing bug.
Some more info: I can reproduce on 49b7327cd49f only *with* --enable-optimize and *without* --enable-threadsafe. The presence of --enable-debug or --disable-debug doesn't seem to affect the crash.
Forgot to add, reproducible on both gcc and clang; both require --enable-optimize to crash.
Yet more info: this is somehow baseline related, as --ion-eager --no-baseline does not reproduce.
I think what's going on is that at the point 0.6 is observed, we're only invalidating one script, self-hosted:3575, instead of invalidating both self-hosted:3575 and self-hosted:2974 by propagating through the type constraints.

So I tried to debug this by setting INFERFLAGS=ops. But no! If INFERFLAGS=ops is set, we now magically invalidate both scripts.
Attached patch fix (Splinter Review)
While I'm still not 100% sure on what timing is needed to expose this bug, the fuzz test here hits a *very* specific GC timing to trigger it. I'll do my best to explain what happens.

First, some background on callsite cloning. There's a per-compartment table that keeps callsite clones. It maps keys in the form of triples (script, pc, function) to the clones. The idea here is that for any callsite-function pair, there should only be a single clone (no clones of clones).
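
As a rough, non-authoritative sketch of the shape such a table could take (the names `CallsiteCloneKey`, the hasher, and the stub types below are illustrative assumptions, not SpiderMonkey's actual declarations):

#include <cstdint>
#include <functional>
#include <unordered_map>

struct JSScriptStub {};    // stand-in for JSScript
struct JSFunctionStub {};  // stand-in for JSFunction

// Key: the callsite (script + bytecode offset) plus the original callee.
struct CallsiteCloneKey {
    JSScriptStub*   script;  // script containing the call
    uint32_t        offset;  // pc offset of the callsite
    JSFunctionStub* fun;     // the original (uncloned) callee

    bool operator==(const CallsiteCloneKey& other) const {
        return script == other.script && offset == other.offset && fun == other.fun;
    }
};

struct CallsiteCloneKeyHasher {
    size_t operator()(const CallsiteCloneKey& k) const {
        return std::hash<void*>()(k.script) ^ (size_t(k.offset) * 31) ^
               (std::hash<void*>()(k.fun) << 1);
    }
};

// Per-compartment map: each (script, pc, function) triple maps to at most one clone.
using CallsiteCloneTable =
    std::unordered_map<CallsiteCloneKey, JSFunctionStub*, CallsiteCloneKeyHasher>;

The important property is that lookups are keyed on the original function, which is what makes "no clones of clones" enforceable at all.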

Whether or not a function is cloned at a callsite is only a hint, so the cloned scripts must propagate their type information back to the original script to keep the TI info correct. They do this by installing subset constraints from the clone's argument, return, and |this| typesets back to the original's.
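
A minimal sketch of that wiring, using hypothetical `TypeSetStub`/`addSubsetConstraint` stand-ins rather than the real TI classes:

#include <vector>

struct TypeSetStub {
    // Subset constraints: any type added to this set must also flow into these targets.
    std::vector<TypeSetStub*> subsetTargets;

    void addSubsetConstraint(TypeSetStub* target) {
        subsetTargets.push_back(target);
    }
};

struct ScriptTypes {
    TypeSetStub thisTypes;
    TypeSetStub returnTypes;
    std::vector<TypeSetStub> argTypes;
};

// Hook the clone's typesets back to the original, so that even though the
// callsite executes the clone, the original's TI info stays correct.
void linkCloneToOriginal(ScriptTypes& clone, ScriptTypes& original) {
    clone.thisTypes.addSubsetConstraint(&original.thisTypes);
    clone.returnTypes.addSubsetConstraint(&original.returnTypes);
    for (size_t i = 0; i < clone.argTypes.size() && i < original.argTypes.size(); i++)
        clone.argTypes[i].addSubsetConstraint(&original.argTypes[i]);
}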

What this test exposes is the following. During eager compilation, we clone and compile ParallelArrayGet1. ParallelArrayMap, the caller of ParallelArrayGet1, hooks up a subset constraint on the clone's return typeset. For most of the execution of the test, the clone's return type is string, as the array we constructed the parallel array out of is mostly strings. However, note the |(.6  )| on line 55. When execution gets to the point of passing .6 into the kernel function on line 56, we trip a type guard. What's supposed to happen is that tripping the type guard on the return typeset propagates to the callsite, which invalidates the caller. Both get recompiled and we continue.

What actually happens is that a GC occurred between the cloning and the type monitor. Constraints were swept, but the scripts were not. During re-analysis of ParallelArrayMap, it hooks up the subset constraint again, to the existing clone that it finds in the per-compartment map. Due to a bug, though, we are actually executing a *clone of a clone* of ParallelArrayGet1. This usually would have gone unnoticed because, as mentioned above, changes to the clone's return typeset would propagate back to the original function, which eventually propagates back to the caller. But in this particular case the *constraints got swept*, and, more importantly, re-analysis only re-added the constraints for the *first clone* (we should've never made clones of clones to begin with).

Due to this, when we trip the type guard, we only invalidate ParallelArrayGet1. No constraints are in place to propagate the information back to ParallelArrayMap. We then resume in the (incorrectly) still-valid Ion code, try to box a float value into a string-tagged JS::Value, and crash.
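
For illustration only, here is a sketch of the "never clone a clone" invariant this implies; the `originalFunction()` back-pointer and the table layout are hypothetical stand-ins, not the code in attachment 741162:

#include <cstdint>
#include <map>
#include <tuple>

struct Script {};
struct Function {
    Function* original = nullptr;  // non-null iff this function is a callsite clone
    Function* originalFunction() { return original ? original : this; }
};

using CloneKey = std::tuple<Script*, uint32_t, Function*>;
static std::map<CloneKey, Function*> cloneTable;  // per-compartment in reality

Function* getOrCreateCallsiteClone(Script* script, uint32_t pcOffset, Function* callee) {
    // Always resolve back to the original before consulting the table, so that
    // re-analysis after a GC reuses the one existing clone instead of cloning
    // the clone it is currently executing.
    Function* target = callee->originalFunction();
    CloneKey key{script, pcOffset, target};

    auto it = cloneTable.find(key);
    if (it != cloneTable.end())
        return it->second;             // reuse the existing clone

    Function* clone = new Function();
    clone->original = target;          // remember the original, never chain clones
    cloneTable.emplace(key, clone);
    return clone;
}

With that invariant in place, the constraints re-added during re-analysis and the code actually executing always refer to the same (single) clone, so tripping the return-type guard invalidates the caller as intended.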
Assignee: general → shu
Attachment #741162 - Flags: review?(bhackett1024)
Attachment #741162 - Flags: review?(bhackett1024) → review+
https://hg.mozilla.org/mozilla-central/rev/e1255e1e6ad3
Status: NEW → RESOLVED
Closed: 11 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
Status: RESOLVED → VERIFIED
JSBugMon: This bug has been automatically verified fixed.
(In reply to Shu-yu Guo [:shu] from comment #9)
> Yet more info: this is somehow baseline related, as --ion-eager
> --no-baseline does not reproduce.

That may affect timing perhaps, but the original bug was found to occur in Firefox 22 (comment 2) and the patch does not appear to be in Baseline code. Don't we need an Aurora patch for this?

This bug should have requested sec-approval before landing so we could straighten out the affected branches and not potentially ship unpatched code.
(In reply to Daniel Veditz [:dveditz] from comment #15)
> (In reply to Shu-yu Guo [:shu] from comment #9)
> > Yet more info: this is somehow baseline related, as --ion-eager
> > --no-baseline does not reproduce.
> 
> That may affect timing perhaps, but the original bug was found to occur in
> Firefox 22 (comment 2) and the patch does not appear to be in Baseline code.
> Don't we need an Aurora patch for this?
> 
> This bug should have requested sec-approval before landing so we could
> straighten out the affected branches and not potentially ship unpatched code.

Yes, I believe you're right, this does need an Aurora patch. Requesting now.
Flags: needinfo?(shu)
Comment on attachment 741162 [details] [diff] [review]
fix

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 826148
User impact if declined: Possible exploit depending on GC timing
Testing completed (on m-c, etc.): m-c
Risk to taking this patch (and alternatives if risky): Low?
String or IDL/UUID changes made by this patch:
Attachment #741162 - Flags: approval-mozilla-aurora?
Attachment #741162 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Keywords: checkin-needed
Whiteboard: [jsbugmon:] → [jsbugmon:][adv-main22-]
Group: core-security