Closed Bug 858582 Opened 7 years ago Closed 7 years ago
Crash [@ js::Equal
Strings] with Parallel Array
The attached testcase crashes on mozilla-central revision 55f9e3e3dae7 (run with --ion-eager).
This test requires an optimized build with build options --enable-optimize --disable-debug --enable-gczeal --enable-valgrind. The crash looks like this: Program received signal SIGSEGV, Segmentation fault. js::EqualStrings (cx=0xc0d1b0, str1=<error reading variable: Cannot access memory at address 0x333333333333>, str2="u", result=0x7fffffff8b6f) at /srv/repos/mozilla-central/js/src/jsstr.cpp:4616 4616 } (gdb) bt #0 js::EqualStrings (cx=0xc0d1b0, str1=<error reading variable: Cannot access memory at address 0x333333333333>, str2="u", result=0x7fffffff8b6f) at /srv/repos/mozilla-central/js/src/jsstr.cpp:4616 #1 0x00000000004aec6e in js::LooselyEqual (cx=0xc0d1b0, lval=..., rval=..., result=0x7fffffff8b6f) at /srv/repos/mozilla-central/js/src/jsinterp.cpp:659 #2 0x000000000078a689 in js::ion::LooselyEqual<false> (cx=<optimized out>, lhs=..., rhs=..., res=0x7fffffff8b94) at /srv/repos/mozilla-central/js/src/ion/VMFunctions.cpp:189 #3 0x00007ffff7f91c84 in ?? () [...] (gdb) x /i $pc => 0x52a79f <js::EqualStrings(JSContext*, JSString*, JSString*, bool*)+31>: mov (%rsi),%rdi (gdb) info reg rsi rsi 0x333333333333 56294995342131 Assuming at least sec-high based on the invalid read from 0x333333333333.
Whiteboard: [jsbugmon:update,bisect] → [jsbugmon:update]
JSBugMon: Bisection requested, result: autoBisect shows this is probably related to the following changeset: The first bad revision is: changeset: 127434:e5978106c61a parent: 127433:280e5ed3f0b7 parent: 125575:1d6fe70c79c5 user: Jan de Mooij date: Thu Mar 21 11:23:12 2013 +0100 summary: Merge from mozilla-central. Not all ancestors of this changeset have been checked. Use bisect --extend to continue the bisection from the common ancestor, 0bba6ba92467. This iteration took 125.995 seconds to run. Oops! We didn't test rev 1d6fe70c79c5, a parent of the blamed revision! Let's do that now. Rev 1d6fe70c79c5: Updating... Compiling... Testing... [Uninteresting] It didn't crash. (0.077 seconds) good (not interesting) As expected, the parent's label is the opposite of the blamed rev's label.
Whiteboard: [jsbugmon:update] → [jsbugmon:update,ignore]
JSBugMon: The testcase found in this bug no longer reproduces (tried revision 50d25e083421).
Whiteboard: [jsbugmon:update,ignore] → [jsbugmon:bisectfix]
Whiteboard: [jsbugmon:bisectfix] → [jsbugmon:]
JSBugMon: Fix Bisection requested, result: autoBisect shows this is probably related to the following changeset: The first good revision is: changeset: 129267:eb99b3a22c5c user: Kannan Vijayan date: Thu Apr 18 18:00:23 2013 -0400 summary: Bug 861596 - Add Baseline stubs to handle GetProp(length) and GetElem(int32) operations on arguments objects. r=bhackett This iteration took 125.466 seconds to run.
djvj, is the fix in comment 4 likely to fix this bug as well?
Doesn't seem related, no. I think it's making the error go away incidentally, not fixing any pre-existing bug.
Some more info: I can reproduce on 49b7327cd49f only *with* --enable-optimize and *without* --enable-threadsafe. The presence of --enable-debug or --disable-debug doesn't seem to affect the crash.
Forgot to add, reproducible on both gcc and clang; both require --enable-optimize to crash.
Yet more info: this is somehow baseline related, as --ion-eager --no-baseline does not reproduce.
I think what's going on is that at the point 0.6 is observed, we're only invalidating one self-hosted:3575 instead of invalidating both self-hosted:3575 and self-hosted:2974 via propagating through the type constraints. So I tried to debug this by setting INFERFLAGS=ops. But no! If INFERFLAGS=ops is set, we now magically invalidate both scripts.
While I'm still not 100% sure on what timing is needed to expose this bug, the fuzz test here hits a *very* specific GC timing to trigger it. I'll do my best to explain what happens. First, some background on callsite cloning. There's a per-compartment table that keeps callsite clones. It maps keys in the form of triples (script, pc, function) to the clones. The idea here is that for any callsite-function pair, there should only be a single clone (no clones of clones). Whether or not a function is cloned at callsite is a hint, so the cloned scripts must propagate type information back to the original script to ensure correctness of TI info. How it does this is that it installs subset constraints from the clone's argument, return, and this types back to the original. What this test exposes is the following. During eager compilation, the we clone and compile ParallelArrayGet1. ParallelArrayMap, the caller of ParallelArrayGet1, hooks up a subset constrain For most of the execution of the test, the return type of it is string, as the array we constructed the parallel array out of is mostly strings. However, note the |(.6 )| on line 55. When execution gets to the point of passing .6 into the kernel function on line 56, we trip some kind of type guard. What's supposed to happen is that tripping the type guard of the return typeset propagates to the callsite, which invalidates the caller. Both get recompiled and we continue. What actually happens it that GC happened between the cloning and the type monitor. Constraints were swept, but the scripts were not. During re-analysis of ParallelArrayMap, it hooks up the subset constraint again to the existing clone that it finds in per-compartment map. Due to a bug, we are actually executing a *clone of a clone* of ParallelArrayGet1. This usually would've gone unnoticed because as mentioned above, changes to the return typeset of the clone would propagate back to the original function, which then eventually propagates back to the caller. But in this particular case, the *constraints got swept*, and more importantly, re-analysis only re-added the constraints for the *first clone* (we should've never made clones of clones to begin with). Due to this, when we trip the type guard, we only invalidate ParallelArrayGet1. No constraints are in place for it to propagate the information back to ParallelArrayMap. We then resume in the (incorrectly) still valid Ion code, and try to box a float value into a string-tagged JS::Value, and crash.
Assignee: general → shu
Attachment #741162 - Flags: review?(bhackett1024)
Attachment #741162 - Flags: review?(bhackett1024) → review+
JSBugMon: This bug has been automatically verified fixed.
(In reply to Shu-yu Guo [:shu] from comment #9) > Yet more info: this is somehow baseline related, as --ion-eager > --no-baseline does not reproduce. That may affect timing perhaps, but the original bug was found to occur in Firefox 22 (comment 2) and the patch does not appear to be in Baseline code. Don't we need an Aurora patch for this? This bug should have requested sec-approval before landing so we could straighten out the affected branches and not potentially ship unpatched code.
(In reply to Daniel Veditz [:dveditz] from comment #15) > (In reply to Shu-yu Guo [:shu] from comment #9) > > Yet more info: this is somehow baseline related, as --ion-eager > > --no-baseline does not reproduce. > > That may affect timing perhaps, but the original bug was found to occur in > Firefox 22 (comment 2) and the patch does not appear to be in Baseline code. > Don't we need an Aurora patch for this? > > This bug should have requested sec-approval before landing so we could > straighten out the affected branches and not potentially ship unpatched code. Yes, I believe you're right, this does need an Aurora patch. Requesting now.
Comment on attachment 741162 [details] [diff] [review] fix [Approval Request Comment] Bug caused by (feature/regressing bug #): 826148 User impact if declined: Possible exploit depending on GC timing Testing completed (on m-c, etc.): m-c Risk to taking this patch (and alternatives if risky): Low? String or IDL/UUID changes made by this patch:
Attachment #741162 - Flags: approval-mozilla-aurora?
Attachment #741162 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.